(David A. Wheeler <dwheeler(a)dwheeler.com>):
Lee Daniel Crocker <lee(a)piclab.com> said:
I just did an ad-hoc benchmark on Piclab of the same installation
with and without link-checking, and on the limited set of pages
used by the test suite, the speedup was only about 3%. Of course,
all benchmarks on single servers may be less applicable to the
multiple-server installation we're going to have soon.
Was this a representative setup (e.g., with the full database)?
If so, that sounds like removing link-checks won't help
much - at least by itself and for single servers.
No, it wasn't representative at all--it was the test suite running
on a newly-installed system. The suite loads about 50 miscellaneous
pages of average length. But I did expect the speedup to be larger,
so it's at least a bit of info saying link-checking might not be as
big a problem as we expect.
The swapping overhead does suggest that excessive memory use
by MySQL is causing the performance hit.
Perhaps reducing the amount of data stored in MySQL would help
(by moving the text of cur and old into the filesystem and
OUT of MySQL entirely).
As another poster noted, the "old" text is
an especially large database, but presumably only a few rare
articles are actually read from "old".
Obviously reading an article from the filesystem will read the
data into memory too, but the underlying OS doesn't try to preload
the entire filesystem into memory, which MySQL appears to be
trying to do. Doing this will mean that archiving the encyclopedia
would have to archive both the MySQL metadata and the
article filesystem, but archiving a filesystem is a rather
well-understood problem :-).
Agreed. Especially with modern, efficient filesystems like Reiser
and UFS, I'm quite willing to believe that they are the most
efficient way to store named chunks of bytes. The software already
does this with images.
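
A minimal sketch of that split, to make the idea concrete. It is an
illustration only, not the wiki software's actual code: it assumes
Python, uses sqlite3 as a stand-in for MySQL, and the table name
"revision" and the functions open_store, save_revision and
load_current are all hypothetical. Only the small metadata row goes
into the database; the article text goes into a file.

```python
import hashlib
import sqlite3
from pathlib import Path


def open_store(db_path=":memory:"):
    """Create the (hypothetical) metadata table; only small rows live here."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS revision ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " title TEXT NOT NULL,"
        " text_path TEXT NOT NULL)"
    )
    return conn


def save_revision(conn, root, title, text):
    """Write the article text to a file and record only its path in the database."""
    data = text.encode("utf-8")
    digest = hashlib.sha1(data).hexdigest()
    # Fan out into subdirectories so no single directory grows huge.
    rel = Path(digest[:2]) / digest[2:]
    path = Path(root) / rel
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    cur = conn.execute(
        "INSERT INTO revision (title, text_path) VALUES (?, ?)",
        (title, rel.as_posix()),
    )
    conn.commit()
    return cur.lastrowid


def load_current(conn, root, title):
    """Return the newest revision's text, reading just that one file."""
    row = conn.execute(
        "SELECT text_path FROM revision WHERE title = ? ORDER BY id DESC LIMIT 1",
        (title,),
    ).fetchone()
    if row is None:
        return None
    return (Path(root) / row[0]).read_bytes().decode("utf-8")
```

Older revisions stay on disk and are only touched when someone asks
for them, so the database's working set is just the metadata. Naming
files by content hash is one choice among several; it means two
revisions with identical text share a single file.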
--
Lee Daniel Crocker <lee(a)piclab.com> <http://www.piclab.com/lee/>
"All inventions or works of authorship original to me, herein and past,
are placed irrevocably in the public domain, and may be used or modified
for any purpose, without permission, attribution, or notification."--LDC