[Xmldatadumps-admin-l] [Wikitech-l] 2010-03-11 01:10:08: enwiki Checksumming pages-meta-history.xml.bz2 :D
Anthony
wikimail at inbox.org
Thu Apr 8 21:28:27 UTC 2010
I'd like to add that the md5 of the *uncompressed* file is
cd4eee6d3d745ce716db2931c160ee35 . That's what I got from both the
uncompressed 7z and the uncompressed bz2. They matched, whew.
Uncompressing and md5ing the bz2 took well over a week. Uncompressing and
md5ing the 7z took less than a day.
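
For anyone who wants to repeat the check without writing the uncompressed XML to disk, here is a minimal sketch in Python. It assumes Python 3.3 or later (for `bz2.open`). The function names and the 1 MiB chunk size are illustrative choices, not anything the dump tooling uses. It covers only the bz2 file, since the standard library has no 7z reader.

```python
import bz2
import hashlib

CHUNK = 1 << 20  # 1 MiB per read; an arbitrary, illustrative size


def md5_of_file(path):
    """MD5 of the bytes as stored on disk, i.e. the compressed archive itself."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()


def md5_of_bz2_contents(path):
    """MD5 of the decompressed stream, computed without storing it anywhere."""
    digest = hashlib.md5()
    with bz2.open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()
```

The result of `md5_of_file` on the .bz2 would be compared with the value published in the md5sums file, and the result of `md5_of_bz2_contents` with the uncompressed md5 quoted above. Expect the second call to be slow on a file of this size.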
On Mon, Mar 29, 2010 at 8:16 PM, Tomasz Finc <tfinc at wikimedia.org> wrote:
> You can find all the md5sums at
>
> http://download.wikipedia.org/enwiki/20100130/enwiki-20100130-md5sums.txt
>
> --tomasz
>
> Anthony wrote:
>
>> Got an md5sum?
>>
>>
>> On Mon, Mar 29, 2010 at 5:46 PM, Tomasz Finc <tfinc at wikimedia.org> wrote:
>>
>> I love lzma compression.
>>
>> enwiki-20100130-pages-meta-history.xml.bz2 280.3 GB
>>
>> enwiki-20100130-pages-meta-history.xml.7z 31.9 GB
>>
>> Download at http://tinyurl.com/yeelbse
>>
>> Enjoy!
>>
>> --tomasz
>>
>> Tomasz Finc wrote:
>> > Tomasz Finc wrote:
>> >> New full history en wiki snapshot is hot off the presses!
>> >>
>> >> It's currently being checksummed, which will take a while for 280GB+ of
>> >> compressed data, but for those brave souls willing to test, please grab it
>> >> from
>> >>
>> >> http://download.wikipedia.org/enwiki/20100130/enwiki-20100130-pages-meta-history.xml.bz2
>> >>
>> >> and give us feedback about its quality. This run took just over a month
>> >> and gained a huge speed-up after Tim's work on re-compressing ES. If we
>> >> see no hiccups with this data snapshot, I'll start mirroring it to other
>> >> locations (Internet Archive, Amazon Public Data Sets, etc.).
>> >>
>> >> For those not familiar, the last successful run that we've seen of this
>> >> data goes all the way back to 2008-10-03. That's over 1.5 years of
>> >> people waiting to get access to these data bits.
>> >>
>> >> I'm excited to say that we seem to have it :)
>> >>
>> >> --tomasz
>> >
>> > We now have an md5sum for enwiki-20100130-pages-meta-history.xml.bz2.
>> >
>> > "65677bc275442c7579857cc26b355ded"
>> >
>> > Please verify against it before filing issues.
>> >
>> > --tomasz