On 21/07/2009, at 6:48 PM, Daniel Schwen wrote:
>>> wouldn't it be faster than to actually create a static HTML dump
>>> the traditional way?
>>
>> The content is wiki-text. It has to be parsed to be turned into
>> HTML. There isn't a more traditional way, because there is no other way.
> Wouldn't it be possible to dump the parser cache instead of dumping
> XML and reparsing? All the parsing work is already done on the
> Wikimedia servers, so why do it again on a slow desktop system?
For a few reasons:
1/ There's no reason to expect that the contents of every page,
revision, et cetera, would be in the parser cache.
2/ Deleted or otherwise private revision content may remain in the
parser cache.
3/ There would be a lot of redundant content in the parser cache,
owing to people browsing with the same options.
4/ None of the useful article metadata is stored in the parser cache.
5/ The parser cache is stored in memcached, a hash-based system that
cannot simply be "dumped", let alone dumped while selectively
excluding everything else stored in memcached (including quite a bit
of private data).
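A minimal sketch of why point 5 holds: a memcached-style store only supports get/set by exact key, with no operation to enumerate keys, so there is nothing to iterate over when "dumping". The class below is a toy stand-in, and the parser-cache key shown is illustrative, not an exact MediaWiki key.

```python
class KeyValueCache:
    """Toy stand-in for a memcached-style cache (illustrative only)."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        self._store[key] = value

    def get(self, key):
        # Returns None on a miss, like a cache lookup.
        return self._store.get(key)

    # Deliberately no keys()/items(): as with memcached, you can only
    # retrieve a value if you already know its exact key.


cache = KeyValueCache()
cache.set("pcache:idhash:12345-0!canonical", "<p>parsed HTML...</p>")

# To read anything back, you must reconstruct the key yourself:
print(cache.get("pcache:idhash:12345-0!canonical"))
```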
It might, however, be sensible to generate the parsed HTML for every
page, save the files in a directory, and then zip it up.
Oh, wait...
I always thought it would be much more useful to generate the HTML of
action=render for every page, rather than generating action=view
output, with the markup for one specific skin, a million or so times;
that skin markup is a pain to strip out if you want to do anything
other than open the HTML in a browser.
(-:
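The action=render idea above can be sketched as a small script that builds the skin-free URL for each page title. The base URL and page titles here are assumptions for illustration; the network fetch is left commented out so the sketch stays self-contained.

```python
from urllib.parse import quote

# Assumed MediaWiki entry point; adjust for the wiki being dumped.
BASE_URL = "https://en.wikipedia.org/w/index.php"


def render_url(title):
    """URL returning just the parsed article HTML, without skin chrome."""
    return f"{BASE_URL}?title={quote(title)}&action=render"


for title in ["Main Page", "MediaWiki"]:
    print(render_url(title))
    # To actually save the skin-free HTML, one file per page:
    # import urllib.request
    # html = urllib.request.urlopen(render_url(title)).read()
    # open(title.replace(" ", "_") + ".html", "wb").write(html)
```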
Andrew Dunbar (hippietrail)
--
Andrew Garrett
agarrett(a)wikimedia.org
http://werdn.us/
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l