erik_moeller(a)gmx.de (Erik Moeller) declared:
> Hi David,
> nice to see you here -- I enjoyed reading your Linux/OSS-related
> papers.
Thanks!!
> I have to say that disabling link checking on the live
> Wikipedia,
> even for a short time, is hardly acceptable.
Well, at least a few measurements to determine if it _IS_
a bottleneck would help. If the Wikipedia can be made to run
efficiently with link checking, then link checking should be left in.
I'm not against link checking, per se.
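
A rough sketch of the kind of measurement I mean, in Python. Everything
here is a placeholder: render() is a toy renderer, not the actual
Wikipedia code, and page_exists stands in for whatever database lookup
the real link check performs. The idea is only to time the same page
with and without the check.

```python
import re
import time

# Matches simple [[Title]] links only (no piped links); an assumption for this sketch.
LINK_RE = re.compile(r"\[\[([^\]|]+)\]\]")

def render(text, page_exists=None):
    """Toy renderer: turns [[Title]] into an HTML link.

    If page_exists is given, every link target is checked (the
    "link checking" step); otherwise all links are assumed valid.
    """
    def repl(match):
        title = match.group(1)
        if page_exists is not None and not page_exists(title):
            return '<a class="new" href="/edit/%s">%s</a>' % (title, title)
        return '<a href="/wiki/%s">%s</a>' % (title, title)
    return LINK_RE.sub(repl, text)

def time_calls(func, repeats=100):
    """Return the average wall-clock seconds per call of func()."""
    start = time.perf_counter()
    for _ in range(repeats):
        func()
    return (time.perf_counter() - start) / repeats

if __name__ == "__main__":
    known = {"Linux", "MySQL"}  # stand-in for the table of existing pages
    page = "See [[Linux]] and [[MySQL]] and [[Missing page]]. " * 200
    with_check = time_calls(lambda: render(page, lambda t: t in known))
    without = time_calls(lambda: render(page))
    print("with link checking:    %.6f s/page" % with_check)
    print("without link checking: %.6f s/page" % without)
```

On the real system the check would be a database query rather than a
set lookup, so the numbers from this toy mean nothing by themselves; the
point is the shape of the comparison.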
It's just that link checking is less important than
having a running Wikipedia. Faster hardware will
probably help, but software solutions are still worth pursuing.
A hardware speedup of 2x won't be enough if usage increases 20x.
Other alternatives include storing the "current" text - or even
ALL text - in the filesystem, using special filenames so that
MySQL doesn't need to be consulted (or only consulted a little)
for certain common queries. The filesystem could hold
either the original wiki text, or the HTML'ized versions, or both.
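
To make the "special filenames" idea concrete, here is a minimal sketch
in Python. The directory layout, the two-character hash bucket, and the
function names are all my own assumptions, not anything the current
software does. The title is escaped so that it is safe as a filename,
and "wiki" or "html" selects the original text or the rendered version.

```python
import hashlib
import os
import tempfile
from urllib.parse import quote

def page_path(root, title, kind="wiki"):
    """Map a page title to a file path (hypothetical layout).

    kind is "wiki" for the original wiki text or "html" for rendered output.
    """
    safe = quote(title, safe="")  # escapes "/" and other awkward characters
    # Spread files over 256 subdirectories so no single directory gets huge.
    bucket = hashlib.sha256(title.encode("utf-8")).hexdigest()[:2]
    return os.path.join(root, kind, bucket, safe + "." + kind)

def save_current(root, title, text, kind="wiki"):
    """Write the current text of a page, replacing any previous version."""
    path = page_path(root, title, kind)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(text)
    os.replace(tmp, path)  # atomic rename: readers never see a partial file

def load_current(root, title, kind="wiki"):
    """Return the stored text, or None if the page has no file yet."""
    try:
        with open(page_path(root, title, kind), encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        return None

if __name__ == "__main__":
    root = tempfile.mkdtemp()
    save_current(root, "AC/DC", "'''AC/DC''' is a band.")
    print(load_current(root, "AC/DC"))
    print(load_current(root, "No such page"))
```

With something like this, showing the current version of a page is one
file read, and the database is not touched at all for that request.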
MySQL is actually a pretty good SQL database. However,
it's optimized for serving structured data. The underlying OS
has received FAR more optimization work to find and retrieve
unstructured data, and it also has more information available to it,
so it can decide what to swap out or drop more efficiently.
As was noted elsewhere, if old text is stored in the filesystem,
it would reduce the memory usage of MySQL substantially.
If all text were stored in the filesystem, MySQL would then
be used primarily for metadata storage and the search index
(if search is on).
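
A small sketch of that split, again in Python and again only under my
own assumptions: sqlite3 stands in for MySQL so the example is
self-contained, and the table and column names are made up. The
database row holds only metadata plus a path; the text of each old
revision lives in its own file.

```python
import os
import sqlite3
import tempfile

def open_store(root):
    """Create the metadata table (hypothetical schema) and the text directory."""
    os.makedirs(os.path.join(root, "old"), exist_ok=True)
    db = sqlite3.connect(os.path.join(root, "meta.db"))
    db.execute("""CREATE TABLE IF NOT EXISTS revision (
        rev_id   INTEGER PRIMARY KEY AUTOINCREMENT,
        title    TEXT NOT NULL,
        saved_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        path     TEXT)""")
    return db

def add_revision(db, root, title, text):
    """Record metadata in the database and write the text to a file."""
    cur = db.execute("INSERT INTO revision (title) VALUES (?)", (title,))
    rev_id = cur.lastrowid
    path = os.path.join(root, "old", "%d.wiki" % rev_id)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    db.execute("UPDATE revision SET path = ? WHERE rev_id = ?", (path, rev_id))
    db.commit()
    return rev_id

def history(db, title):
    """Metadata-only query: lists revisions without reading any article text."""
    return db.execute(
        "SELECT rev_id, saved_at FROM revision WHERE title = ? ORDER BY rev_id",
        (title,)).fetchall()

def load_revision(db, rev_id):
    """Fetch the text of one revision from the filesystem, or None if unknown."""
    row = db.execute(
        "SELECT path FROM revision WHERE rev_id = ?", (rev_id,)).fetchone()
    if row is None:
        return None
    with open(row[0], encoding="utf-8") as f:
        return f.read()

if __name__ == "__main__":
    root = tempfile.mkdtemp()
    db = open_store(root)
    first = add_revision(db, root, "Linux", "First draft.")
    add_revision(db, root, "Linux", "Second draft.")
    print(history(db, "Linux"))
    print(load_revision(db, first))
```

The rows stay tiny no matter how long the articles get, which is where
the memory saving would come from.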
Anyway, I'm just typing ideas, hoping that some are useful.
My real goal is that I don't want the Wikipedia to be
a victim of its own success :-). A solution that works is,
by definition, a good solution :-).