[Xmldatadumps-admin-l] [Wikitech-l] 2010-03-11 01:10:08: enwiki Checksumming pages-meta-history.xml.bz2 :D

Anthony wikimail at inbox.org
Thu Apr 8 21:28:27 UTC 2010


I'd like to add that the md5 of the *uncompressed* file is
cd4eee6d3d745ce716db2931c160ee35. That's what I got from both the
decompressed 7z and the decompressed bz2. They matched, whew.
Uncompressing and md5ing the bz2 took well over a week.  Uncompressing and
md5ing the 7z took less than a day.
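
For anyone who wants to repeat the check on the bz2 without writing the
decompressed XML to disk, here is a minimal sketch in Python. It is not
the exact procedure I ran; the function name and chunk size are my own
choices for illustration, and it will be slow for the same reason bz2
decompression is slow.

```python
import bz2
import hashlib


def md5_of_bz2_contents(path, chunk_size=1 << 20):
    """Return the hex MD5 of the *decompressed* contents of a .bz2 file.

    The data is streamed in chunks, so the decompressed file never has to
    fit in memory or on disk. chunk_size is an arbitrary illustrative value.
    """
    digest = hashlib.md5()
    with bz2.open(path, "rb") as stream:
        # read() returns b"" at end of stream, which stops the iteration
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling md5_of_bz2_contents("enwiki-20100130-pages-meta-history.xml.bz2")
should give the value above if the download is intact.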

On Mon, Mar 29, 2010 at 8:16 PM, Tomasz Finc <tfinc at wikimedia.org> wrote:

> You can find all the md5sums at
>
> http://download.wikipedia.org/enwiki/20100130/enwiki-20100130-md5sums.txt
>
> --tomasz
>
> Anthony wrote:
>
>> Got an md5sum?
>>
>>
>> On Mon, Mar 29, 2010 at 5:46 PM, Tomasz Finc <tfinc at wikimedia.org> wrote:
>>
>>    I love lzma compression.
>>
>>    enwiki-20100130-pages-meta-history.xml.bz2 280.3 GB
>>
>>    enwiki-20100130-pages-meta-history.xml.7z 31.9 GB
>>
>>    Download at http://tinyurl.com/yeelbse
>>
>>    Enjoy!
>>
>>    --tomasz
>>
>>    Tomasz Finc wrote:
>>     > Tomasz Finc wrote:
>>     >> New full history en wiki snapshot is hot off the presses!
>>     >>
>>     >> It's currently being checksummed which will take a while for 280GB+ of
>>     >> compressed data but for those brave souls willing to test please grab it
>>     >> from
>>     >>
>>     >> http://download.wikipedia.org/enwiki/20100130/enwiki-20100130-pages-meta-history.xml.bz2
>>     >>
>>     >>
>>     >> and give us feedback about its quality. This run took just over a month
>>     >> and gained a huge speed up after Tim's work on re-compressing ES. If we
>>     >> see no hiccups with this data snapshot, I'll start mirroring it to other
>>     >> locations (Internet Archive, Amazon Public Data Sets, etc).
>>     >>
>>     >> For those not familiar, the last successful run that we've seen of this
>>     >> data goes all the way back to 2008-10-03. That's over 1.5 years of
>>     >> people waiting to get access to these data bits.
>>     >>
>>     >> I'm excited to say that we seem to have it :)
>>     >>
>>     >> --tomasz
>>     >
>>     > We now have an md5sum for enwiki-20100130-pages-meta-history.xml.bz2.
>>     >
>>     > "65677bc275442c7579857cc26b355ded"
>>     >
>>     > Please verify against it before filing issues.
>>     >
>>     > --tomasz
>>     >
>>     >
>>     > _______________________________________________
>>     > Wikitech-l mailing list
>>     > Wikitech-l at lists.wikimedia.org
>>     > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>
>>
>>    _______________________________________________
>>    Xmldatadumps-admin-l mailing list
>>    Xmldatadumps-admin-l at lists.wikimedia.org
>>    https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-admin-l
>>
>>
>>
>