Tim Starling wrote:
> On 03/12/11 08:58, Platonides wrote:
>> On 02/12/11 22:33, Khalida BEN SIDI AHMED wrote:
>>> I need an html dump of Wikipedia but the link
>>> http://static.wikipedia.org/
>>> does not work. I'd appreciate any explanation or suggestion.
>> Why do you need an html dump of Wikipedia?
> It's a huge task to set up MediaWiki in precisely the same way as it
> is on Wikimedia, to import an XML dump and to generate HTML. It takes
> a serious amount of hardware and software development resources.
> That's why I spent so much time making HTML dump scripts. It's just a
> pity that nobody cared enough about it to keep the project going.
This may be a stupid question as I don't understand the mechanics
particularly well, but... as far as I understand it, there's a Squid cache
layer that contains the HTML output of parsed and rendered wikitext pages.
This stored HTML is what most anonymous viewers receive when they access the
site. Why can't that be dumped into an output file rather than running
expensive and time-consuming HTML dump generation scripts?
In other words, it's not as though the HTML doesn't exist already. It's
served millions and millions of times each day. Why is it so painful to make
it available as a dump?
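For what it's worth, the naive version of what I'm describing is just a
crawl of the already-rendered article URLs. Here's a minimal sketch (the
base URL, the `page_url` helper, and the iteration over titles are my
assumptions, not anything the dump scripts actually do):

```python
# Hypothetical sketch: save the HTML that the cache layer already serves,
# rather than re-parsing wikitext. page_url() is an invented helper.
from urllib.parse import quote

def page_url(title, base="https://en.wikipedia.org/wiki"):
    """Build the canonical article URL whose rendered HTML is cached."""
    return f"{base}/{quote(title.replace(' ', '_'))}"

# A naive dumper would iterate over every title from the XML dump's
# page list and save each response body, e.g.:
#
#   import urllib.request
#   for title in titles:
#       html = urllib.request.urlopen(page_url(title)).read()
#       save(title, html)
```

Of course, at millions of pages this is exactly the kind of load the
operations people would object to, which may be part of the answer to my
own question.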
MZMcBride