[Xmldatadumps-admin-l] [Wikitech-l] 2010-03-11 01:10:08: enwiki Checksumming pages-meta-history.xml.bz2 :D

Tomasz Finc tfinc at wikimedia.org
Mon Mar 29 21:46:47 UTC 2010


I love lzma compression.

enwiki-20100130-pages-meta-history.xml.bz2 280.3 GB

enwiki-20100130-pages-meta-history.xml.7z 31.9 GB

Download at http://tinyurl.com/yeelbse

Enjoy!

--tomasz

Tomasz Finc wrote:
> Tomasz Finc wrote:
>> New full-history enwiki snapshot is hot off the presses!
>>
>> It's currently being checksummed, which will take a while for 280GB+ of 
>> compressed data, but those brave souls willing to test can grab it 
>> from
>>
>> http://download.wikipedia.org/enwiki/20100130/enwiki-20100130-pages-meta-history.xml.bz2 
>>
>>
>> and give us feedback about its quality. This run took just over a month 
>> and gained a huge speedup after Tim's work on re-compressing ES. If we 
>> see no hiccups with this data snapshot, I'll start mirroring it to other 
>> locations (internet archive, amazon public data sets, etc).
>>
>> For those not familiar, the last successful run that we've seen of this 
>> data goes all the way back to 2008-10-03. That's over 1.5 years of 
>> people waiting to get access to these data bits.
>>
>> I'm excited to say that we seem to have it :)
>>
>> --tomasz
> 
> We now have an md5sum for enwiki-20100130-pages-meta-history.xml.bz2.
> 
> "65677bc275442c7579857cc26b355ded"
> 
> Please verify against it before filing issues.
> 
> --tomasz
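
For anyone who wants to check a download against the md5sum quoted above, here is a minimal sketch in Python. It is not part of the original thread; the function names md5_of_file and verify_dump are hypothetical, and the only value taken from the message is the checksum itself. The file is read in chunks so that a 280GB+ dump never has to fit in memory.

```python
import hashlib

# Checksum published in the message above for
# enwiki-20100130-pages-meta-history.xml.bz2.
EXPECTED_MD5 = "65677bc275442c7579857cc26b355ded"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_dump(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected hex string."""
    # Hypothetical helper: normalizes case and whitespace before comparing.
    return md5_of_file(path) == expected.strip().lower()
```

Calling verify_dump("enwiki-20100130-pages-meta-history.xml.bz2") on a complete download should return True; a False result suggests a truncated or corrupted transfer rather than a problem with the dump itself. Expect the run to take a long time on a file of this size.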
