I've been doing a lot of thinking lately about globals and their place in
MediaWiki in the long term. I rewrote globals.txt to reflect the fact that
PHP does not love globals; in fact, the need for a "global" declaration to
bring globals into local scope puts it among the more global-hostile
languages.
In many cases, use of globals obscures data flow and makes classes less
flexible, inhibiting reuse. This is patently true of $wgTitle and
$wgArticle, whose existence encourages lazy programmers to write code that
fails in the common case where more than one of these objects exists. At
present, these two objects are used almost exclusively in the output phase,
so it would make sense to make them members of OutputPage or Skin instead of
globals.
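A minimal sketch of the member-based alternative, assuming heavily simplified, hypothetical stand-ins (TitleSketch, OutputPageSketch) rather than MediaWiki's real Title and OutputPage classes. The output object carries its own title, so two pages can be handled in one request without fighting over a single global.

```php
<?php
// Hypothetical, simplified stand-ins; NOT MediaWiki's real Title/OutputPage.
class TitleSketch {
    private $text;
    public function __construct($text) { $this->text = $text; }
    public function getText() { return $this->text; }
}

class OutputPageSketch {
    // The page being output is a member, instead of a global $wgTitle.
    private $title;
    public function __construct(TitleSketch $title) { $this->title = $title; }
    public function getTitle() { return $this->title; }
    // Escape the title text for safe inclusion in HTML.
    public function getHTMLTitle() {
        return htmlspecialchars($this->title->getText());
    }
}

// Two outputs coexist in one request; neither clobbers the other's title.
$main = new OutputPageSketch(new TitleSketch('Main Page'));
$talk = new OutputPageSketch(new TitleSketch('Talk:Main Page'));
echo $main->getHTMLTitle(), "\n"; // Main Page
echo $talk->getHTMLTitle(), "\n"; // Talk:Main Page
```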
The most extreme anti-global architecture would be one involving application
objects:
    $mw = new MediaWiki;
    $mw->executeWebRequest();
The application object could theoretically be passed to most class
constructors, providing a form of global context. That, however, would make
writing new classes a bit tedious. In my experience, it is easier to make
the application object a global and pull it in wherever it is needed. A
single global application object would have advantages when MediaWiki needs
to be embedded as a library, since it keeps the global scope cleaner, but it
is not really more flexible than what we're doing now.
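The two options can be contrasted in a short sketch. AppSketch, PageViewA, PageViewB and the global $appSketch are hypothetical names, not MediaWiki classes; the point is only the difference between passing the application object to every constructor and pulling in one global.

```php
<?php
// Hypothetical application object; not MediaWiki's actual MediaWiki class.
class AppSketch {
    private $config;
    public function __construct(array $config) { $this->config = $config; }
    public function getSetting($name) {
        return isset($this->config[$name]) ? $this->config[$name] : null;
    }
}

// Option 1: every constructor receives the application object.
class PageViewA {
    private $app;
    public function __construct(AppSketch $app) { $this->app = $app; }
    public function siteName() { return $this->app->getSetting('sitename'); }
}

// Option 2: one global application object, pulled in where needed.
// Only a single global variable ($appSketch) is required.
class PageViewB {
    public function siteName() {
        global $appSketch;
        return $appSketch->getSetting('sitename');
    }
}

$appSketch = new AppSketch(array('sitename' => 'ExampleWiki'));
$a = new PageViewA($appSketch);
$b = new PageViewB();
echo $a->siteName(), ' ', $b->siteName(), "\n"; // ExampleWiki ExampleWiki
```

Option 2 spares each new class an extra constructor parameter, at the cost of a hidden dependency on the global.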
After some thinking, I was forced to admit that there are some cases where
globals make sense, from a data flow perspective. The clearest example is
caching. A cache should have the widest possible scope. If you have two
application objects, you would want them to share the same caches if
possible. Indeed, it's better if different threads, processes and even
servers can share their caches.
There are, however, disadvantages to using global variables for this or any
other similar purpose. The problem is that the use of global variables
inhibits lazy initialisation: a global must be set up before any code reads
it, so the object is constructed whether or not the request needs it. The
familiar solution is to use an accessor function, and indeed this approach
has already been implemented in several places in MediaWiki. I would like to
make such accessor functions more pervasive.
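A minimal sketch of a lazily initialising accessor function, assuming a hypothetical wfGetCacheSketch() and ExpensiveCacheSketch class (neither exists in MediaWiki). The object is built on the first call, and the same instance is returned thereafter.

```php
<?php
// Hypothetical expensive object standing in for something like a cache.
class ExpensiveCacheSketch {
    public static $constructed = 0; // counts constructions, for demonstration
    private $data = array();
    public function __construct() { self::$constructed++; }
    public function get($key) { return isset($this->data[$key]) ? $this->data[$key] : null; }
    public function set($key, $value) { $this->data[$key] = $value; }
}

// Hypothetical accessor: constructs the object on first use only, so a
// request that never touches the cache never pays for building it.
function wfGetCacheSketch() {
    static $instance = null;
    if ($instance === null) {
        $instance = new ExpensiveCacheSketch();
    }
    return $instance;
}

echo ExpensiveCacheSketch::$constructed, "\n"; // 0
wfGetCacheSketch()->set('foo', 'bar');
echo wfGetCacheSketch()->get('foo'), "\n";      // bar
echo ExpensiveCacheSketch::$constructed, "\n"; // 1
```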
There is also the problem that the global namespace is somewhat crowded.
Using a global function for an accessor just moves this problem somewhere
else. The alternative is to use a static member function as the accessor.
This concept is well known, and where the static object is the only one ever
needed, the object is called a singleton. The PHP 5 manual recommends
calling the accessor function singleton(), and I'll go along with that
despite personally preferring getInstance().
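A minimal singleton sketch follows. LinkCacheSketch and its methods are hypothetical and do not reflect the real LinkCache interface; only the accessor name singleton() comes from the text.

```php
<?php
// Hypothetical singleton; NOT the real LinkCache class or its API.
class LinkCacheSketch {
    private static $instance = null;
    private $links = array();

    // Private constructor: the only way to get an instance is singleton().
    private function __construct() {}

    // Static accessor; getInstance() would work identically.
    public static function singleton() {
        if (self::$instance === null) {
            self::$instance = new self();
        }
        return self::$instance;
    }

    public function addLink($title, $id) { $this->links[$title] = $id; }
    public function getLinkId($title) {
        return isset($this->links[$title]) ? $this->links[$title] : 0;
    }
}

LinkCacheSketch::singleton()->addLink('Main Page', 1);
echo LinkCacheSketch::singleton()->getLinkId('Main Page'), "\n"; // 1
```

The accessor lives in the class's own namespace, so no global function or variable name is consumed, and the instance is still created lazily.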
The disadvantage of the singleton pattern is that it requires the class name
to be hard-coded throughout the code base, removing some flexibility. We
could get around that by having base classes construct derived classes, if
you don't mind the dependency implications: the base class then has to know
about each of its subclasses.
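One way this might look, as a sketch only: a hypothetical abstract CacheSketch whose singleton() chooses a subclass from a hypothetical setting, $wgExampleCacheType. Callers only name the base class, but the base class must name every subclass it can return.

```php
<?php
// Hypothetical base class that constructs one of its derived classes.
// $wgExampleCacheType is an assumed setting, for illustration only.
abstract class CacheSketch {
    private static $instance = null;

    public static function singleton() {
        global $wgExampleCacheType;
        if (self::$instance === null) {
            // The base class depends on every subclass it may return.
            switch ($wgExampleCacheType) {
                case 'array':
                    self::$instance = new ArrayCacheSketch();
                    break;
                default:
                    self::$instance = new NullCacheSketch();
            }
        }
        return self::$instance;
    }

    abstract public function get($key);
    abstract public function set($key, $value);
}

// In-process cache backed by a PHP array.
class ArrayCacheSketch extends CacheSketch {
    private $data = array();
    public function get($key) { return isset($this->data[$key]) ? $this->data[$key] : false; }
    public function set($key, $value) { $this->data[$key] = $value; }
}

// Cache that stores nothing; every lookup misses.
class NullCacheSketch extends CacheSketch {
    public function get($key) { return false; }
    public function set($key, $value) {}
}

$wgExampleCacheType = 'array';
CacheSketch::singleton()->set('k', 'v');
echo get_class(CacheSketch::singleton()), "\n"; // ArrayCacheSketch
```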
I'm currently working on converting $wgLinkCache to a singleton pattern, and
I also have a few other objects in my sights. But I still don't know exactly
how far we want to go with this. What do we want our long-term architecture
to be?
What should we do with the User class? $wgUser is used very heavily. If not
global, the scope of the object would have to be very wide. There are a few
applications for multiple user objects, but they don't really interfere with
the use of $wgUser elsewhere.
Another tricky case is configuration. There are about 300
configuration-related globals, and it might be nice to encapsulate them
purely from a namespace perspective. We already have a SiteConfiguration
object, and on Wikimedia sites, this object has a configuration array which
is extracted into the global namespace. Should we just use it directly
instead? The conversion cost would obviously be high.
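A sketch of reading settings from a configuration object directly, assuming a hypothetical ConfigSketch class (not the real SiteConfiguration API). It keeps an optional method that still copies every setting into the global scope, which is one way unconverted code could keep working during a gradual migration.

```php
<?php
// Hypothetical configuration holder; NOT the real SiteConfiguration API.
class ConfigSketch {
    private $settings;
    public function __construct(array $settings) { $this->settings = $settings; }

    // Direct access: one object instead of hundreds of global names.
    public function get($name, $default = null) {
        return array_key_exists($name, $this->settings) ? $this->settings[$name] : $default;
    }

    // Compatibility path: copy each setting into the global namespace.
    public function extractToGlobals() {
        foreach ($this->settings as $name => $value) {
            $GLOBALS[$name] = $value;
        }
    }
}

$conf = new ConfigSketch(array('wgSitename' => 'ExampleWiki', 'wgLanguageCode' => 'en'));
echo $conf->get('wgSitename'), "\n"; // ExampleWiki
$conf->extractToGlobals();
echo $wgSitename, "\n";              // ExampleWiki
```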
There might also be some need for encapsulating configuration from a data
flow perspective. setupGlobals() in dumpHTML.inc could perhaps be made a bit
more elegant.
Should objects such as $wgUser and $wgConf be members of an application
object? Should the application object be global? Some other heavily-used
globals are $wgLang, $wgContLang, $wgOut and $wgParser. What should we do
with them?
We need to be guided by our applications, and choose the simplest
architecture which supports all of them. Are we interested in:
* Embedding? Need to avoid namespace pollution.
* Per-wiki daemons to do background tasks? Need a means for periodically
refreshing configuration and caches.
* A daemon that responds to requests for multiple wikis? Needs multiple
language objects, and a caching system which discriminates between different
wikis.
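For the multi-wiki daemon, one plausible way for a cache to discriminate between wikis is to prefix every key with a wiki identifier. The sketch below assumes a hypothetical WikiScopedCacheSketch class and uses PHP's built-in ArrayObject as a stand-in for shared storage such as memcached.

```php
<?php
// Hypothetical wrapper that scopes cache keys to one wiki.
class WikiScopedCacheSketch {
    private $backend; // storage shared by every wiki in the process
    private $wikiId;

    public function __construct(ArrayObject $backend, $wikiId) {
        $this->backend = $backend;
        $this->wikiId = $wikiId;
    }

    // Prefix keys so identical keys from different wikis never collide.
    private function makeKey($key) {
        return $this->wikiId . ':' . $key;
    }

    public function get($key) {
        $k = $this->makeKey($key);
        return isset($this->backend[$k]) ? $this->backend[$k] : false;
    }

    public function set($key, $value) {
        $this->backend[$this->makeKey($key)] = $value;
    }
}

// One daemon, one shared backend, two wikis with separate entries.
$shared = new ArrayObject();
$enwiki = new WikiScopedCacheSketch($shared, 'enwiki');
$dewiki = new WikiScopedCacheSketch($shared, 'dewiki');
$enwiki->set('sitename', 'Wikipedia');
$dewiki->set('sitename', 'Wikipedia (de)');
echo $enwiki->get('sitename'), "\n"; // Wikipedia
echo count($shared), "\n";           // 2
```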
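I'm interested in daemon (or servlet) applications because of the efficiency
implications: a long-running process can keep configuration, language
objects and caches in memory instead of rebuilding them on every request.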
-- Tim Starling