Brion Vibber wrote:
> Brion Vibber wrote:
>> There's still some improvements to be made, but I've gone ahead and started a
>> dump run. If it explodes partway through, oh wells. :)
>
> I've aborted the dumps for the moment; there's something wrong with the text
> prefetch system that's supposed to avoid unnecessary (and slow!) load on the
> database servers during the dumps, and I'm having trouble debugging it at 4am.

The prefetching was actually working fine; the unexpected slowness was due to
failing to load the ICU library plugin for UTF-8 validation on the XML output.
(The PHP-based code I wrote for this is relatively slow on non-Latin text, and
the dumps are a worst-case scenario for that due to the amount of material
processed.)
Dumps are now continuing in Florida (up to bgwiktionary!) and will restart soon
in Korea.
-- brion vibber (brion @ pobox.com)