On Wed, Jul 22, 2009 at 2:37 PM, Tei <oscar.vives(a)gmail.com> wrote:
On Wed, Jul 22, 2009 at 5:48 PM, Chengbin Zheng <chengbinzheng(a)gmail.com> wrote:
...
Yes, the "TomeRaider" version is exactly the version I want for static
HTML.
Just curious, is pages-articles.xml.bz2
<http://download.wikimedia.org/enwiki/20090713/enwiki-20090713-pages-article…>
like a "TomeRaider" version? If not, what's the difference?
And another curiosity, at
http://en.wikipedia.org/wiki/Wikipedia:TomeRaider_database, it says the
English Wikipedia database is only 3.3GB. Did they use compression? That
seems awfully small. Even if they did, that's an incredible compression
ratio, similar to 7-zip. I don't know how you can do that in an eBook
format.
NTFS compression only brings the size down 50%.
At one point, Brion compressed it to 242 MB.
http://www.mail-archive.com/wikitech-l@lists.wikimedia.org/msg00358.html
You may also read this:
http://en.wikipedia.org/wiki/Solid_compression
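To make the solid compression point concrete, here is a minimal Python sketch. It is only an illustration: the "articles" are made-up stub text, bz2 stands in for whatever compressor TomeRaider or 7-zip actually use, and the function names are hypothetical. It compares compressing every article on its own against compressing all of them as one stream.

```python
import bz2


def per_item_size(items):
    """Total size when every item is compressed on its own."""
    return sum(len(bz2.compress(item)) for item in items)


def solid_size(items):
    """Size when all items are concatenated and compressed as one stream."""
    return len(bz2.compress(b"".join(items)))


# Made-up stand-ins for wiki articles: short, highly similar stubs.
articles = [
    (
        "== Article %d ==\n"
        "This is a short stub article about topic %d. "
        "See also the main page and the list of related topics.\n" % (i, i)
    ).encode("utf-8")
    for i in range(1000)
]

print("per-item bytes:", per_item_size(articles))
print("solid bytes:   ", solid_size(articles))
```

The solid figure comes out much smaller because the compressor can reuse text shared between articles and pays its fixed per-stream overhead once instead of once per article. Real article text is far less repetitive than these stubs, so the gap on an actual dump would be smaller.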
--
--
ℱin del ℳensaje.
I have no doubt that you can compress it to 3.3GB. I'm just curious how
that's possible for an eBook format. Does the 3.3GB include the skin, the
proper Wikipedia formatting, etc.?
I'm assuming that the pages-articles.xml.bz2 XML dump includes something
other than the raw articles? What else is in it?
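One way to check what is in the dump without unpacking it is to stream it and tally page titles by their prefix. The following is a rough Python sketch under several assumptions: that the dump is a series of <page> elements each carrying a <title>, that a prefix before the first colon (for example "Template:") marks a non-article page, and that the tiny sample XML and its namespace URL below are a fair stand-in for the real file. The function name is hypothetical.

```python
import bz2
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_title_prefixes(stream):
    """Tally <title> elements by the text before the first colon."""
    counts = Counter()
    for _event, elem in ET.iterparse(stream, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace, if any
        if tag == "title":
            title = elem.text or ""
            prefix = title.split(":", 1)[0] if ":" in title else "(article)"
            counts[prefix] += 1
        elif tag == "page":
            elem.clear()  # drop the finished page's children to save memory
    return counts


# Made-up sample standing in for the real dump (assumed layout).
sample = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page><title>Example</title><revision><text>Body</text></revision></page>
  <page><title>Template:Infobox</title><revision><text>{{{1}}}</text></revision></page>
  <page><title>Wikipedia:About</title><revision><text>Meta</text></revision></page>
</mediawiki>"""

compressed = bz2.compress(sample)
with bz2.open(io.BytesIO(compressed), "rb") as stream:
    counts = count_title_prefixes(stream)
print(counts)
```

For the real file, pass the path of the downloaded dump to bz2.open instead of the in-memory buffer. Note that ordinary article titles containing a colon would be counted under their own prefix, so the tally is only a rough guide.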