[Xmldatadumps-admin-l] [Wikitech-l] 2010-03-11 01:10:08: enwiki Checksumming pages-meta-history.xml.bz2 :D

Erik Zachte erikzachte at infodisiac.com
Thu Mar 11 03:18:45 UTC 2010


I'm thrilled. Big thanks to Tim and Tomasz for pulling this off.
For the record, the 2008-10-03 dump existed for a short while only.
It evaporated before wikistats and many others could parse it,
so now we can finally catch up on a backlog of 3.5 (!) years.

Erik Zachte

> -----Original Message-----
> From: wikitech-l-bounces at lists.wikimedia.org
> [mailto:wikitech-l-bounces at lists.wikimedia.org] On Behalf Of Tomasz Finc
> Sent: Thursday, March 11, 2010 4:11
> To: Wikimedia developers; xmldatadumps-admin-l at lists.wikimedia.org;
> xmldatadumps at lists.wikimedia.org
> Subject: [Wikitech-l] 2010-03-11 01:10:08: enwiki Checksumming
> pages-meta-history.xml.bz2 :D
> 
> New full-history enwiki snapshot is hot off the presses!
> 
> It's currently being checksummed, which will take a while for 280GB+ of
> compressed data, but for those brave souls willing to test, please grab
> it from
> 
> http://download.wikipedia.org/enwiki/20100130/enwiki-20100130-pages-meta-history.xml.bz2
> 
> and give us feedback about its quality. This run took just over a month
> and gained a huge speedup after Tim's work on re-compressing ES. If we
> see no hiccups with this data snapshot, I'll start mirroring it to other
> locations (Internet Archive, Amazon Public Data Sets, etc.).
> 
> For those not familiar, the last successful run that we've seen of this
> data goes all the way back to 2008-10-03. That's over 1.5 years of
> people waiting to get access to these data bits.
> 
> I'm excited to say that we seem to have it :)
> 
> --tomasz
> 
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
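
For anyone testing the download, a minimal sketch of how a file of this size could be checked locally without loading it into memory. The message does not say which checksum algorithm is published, so `md5` is an assumption here, and the function names `file_checksum` and `bz2_stream_is_readable` are hypothetical helpers, not part of any dump tooling.

```python
import bz2
import hashlib


def file_checksum(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in fixed-size chunks.

    The default algorithm is an assumption; pass whichever one the
    published checksum file actually uses.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bz2_stream_is_readable(path, chunk_size=1 << 20):
    """Decompress the whole file once, discarding the output.

    Returns False if the bzip2 data is truncated or corrupt.
    """
    try:
        with bz2.open(path, "rb") as stream:
            while stream.read(chunk_size):
                pass
    except (OSError, EOFError):
        return False
    return True
```

Comparing the result of `file_checksum("enwiki-20100130-pages-meta-history.xml.bz2")` with the published value would confirm the transfer, while `bz2_stream_is_readable` is a slower, independent check that the archive decompresses end to end.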




