On 21/07/2009, at 6:48 PM, Daniel Schwen wrote:
wouldn't it be faster than to actually create a static HTML dump the traditional way?
The content is wiki-text. It has to be parsed to be turned into HTML. There isn't a more traditional way, because there is no other way.
Wouldn't it be possible to dump the parser cache instead of dumping XML and reparsing? All the parsing work is already done on the Wikimedia servers, so why do it again on a slow desktop system?
For a few reasons:
1/ There's no reason to expect that the contents of every page, revision, et cetera, would be in the parser cache.
2/ Deleted or otherwise private revision content may remain in the parser cache.
3/ There would be a lot of redundant content in the parser cache, owing to people browsing with the same options (see the sketch after this list).
4/ None of the useful article metadata is stored in the parser cache.
5/ The parser cache is stored in memcached, a hash-based store that cannot simply be "dumped", let alone dumped while selectively excluding all of the other things stored in memcached (including quite a bit of private data).
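
To make reason 3 concrete, here is a small, hypothetical sketch (not MediaWiki's actual key scheme) of how a parser cache keyed on both the page and a hash of each reader's rendering options ends up holding near-duplicate copies of the same rendered page, one per option set in use:

import hashlib

def parser_cache_key(page_id, options):
    # Hash the rendering options (language, thumbnail size, date format, ...)
    # so each distinct option set maps to its own cache slot.
    opt_hash = hashlib.md5(repr(sorted(options.items())).encode()).hexdigest()
    return "pcache:%d:%s" % (page_id, opt_hash)

# The same page ends up cached twice because two option sets are in use.
print(parser_cache_key(42, {"lang": "en", "thumbsize": 250}))
print(parser_cache_key(42, {"lang": "en", "thumbsize": 180}))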
It might, however, be sensible to generate parsed HTML for every page, save the resulting files in a directory, and then zip it up.
Oh, wait...
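
(For anyone who actually wants to build such a thing by hand, a minimal sketch of that workflow, assuming the public action=parse web API and the Python requests library; the endpoint and the title list are placeholders, and a real run would iterate over every page:)

import os
import zipfile
import requests

API = "https://en.wikipedia.org/w/api.php"
titles = ["Example", "Wiki"]  # placeholder; really every page title

os.makedirs("html_dump", exist_ok=True)
for title in titles:
    # Ask MediaWiki to parse the current revision of the page into HTML.
    r = requests.get(API, params={
        "action": "parse", "page": title, "prop": "text",
        "format": "json", "formatversion": "2",
    })
    html = r.json()["parse"]["text"]
    with open(os.path.join("html_dump", title + ".html"), "w", encoding="utf-8") as f:
        f.write(html)

# Zip up the directory of rendered pages.
with zipfile.ZipFile("html_dump.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for name in os.listdir("html_dump"):
        zf.write(os.path.join("html_dump", name))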
--
Andrew Garrett
agarrett@wikimedia.org
http://werdn.us/