[Xmldatadumps-admin-l] [Wikitech-l] 2010-03-11 01:10:08: enwiki Checksumming pages-meta-history.xml.bz2 :D

Brian J Mingus Brian.Mingus at Colorado.EDU
Thu Mar 11 05:48:35 UTC 2010


On Wed, Mar 10, 2010 at 10:43 PM, Tomasz Finc <tfinc at wikimedia.org> wrote:

> Brian J Mingus wrote:
>
>>
>> On Wed, Mar 10, 2010 at 8:54 PM, Tomasz Finc <tfinc at wikimedia.org> wrote:
>>
>>    Yup, that's the one. If you have a fast upload pipe then I'm more than
>>    happy to set up space for it. Otherwise it should be arriving in our
>>    snail mail after a couple of days.
>>
>>    -tomasz
>>
>>
>> Anyone may download the file from me here:
>>
>> http://grey.colorado.edu/enwiki-20080103-pages-meta-history.xml.7z
>>
>> The md5sum is:
>>
>> 20a201afc05a4e5f2f6c3b9b7afa225c
>>  enwiki-20080103-pages-meta-history.xml.7z
>>
>> The file size is:
>>
>> 18522193111 (~18 gigabytes)
>>
>> I'm sure you will find my pipe fat enough. ;-)
>>
>
> That seems way too tiny to be the real thing.
>
> --tomasz
>

7zip has a very impressive compression ratio. From download.wikimedia.org:


   - These dumps can be *very* large, uncompressing up to 100 times the
   archive download size. Suitable for archival and statistical use, most
   mirror sites won't want or need this.


That notice has not changed since I downloaded this file, so the uncompressed
size could be well over a terabyte. I'm not sure how long it will take to
unpack, but I have just started it. I wonder what drives your intuition?
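
For anyone fetching the file, here is a minimal Python sketch of how the posted
size and md5sum could be checked, and of what the "up to 100 times" notice
implies as a worst-case unpacked size. The function names and the local path
are illustrative assumptions, not part of any Wikimedia tooling, and the ratio
of 100 is the notice's stated upper bound, not a measurement of this archive.

```python
import hashlib
import os

# Values quoted from the message above.
EXPECTED_MD5 = "20a201afc05a4e5f2f6c3b9b7afa225c"
EXPECTED_SIZE = 18522193111  # bytes

# Upper bound taken from the download.wikimedia.org notice ("up to 100 times").
MAX_RATIO = 100


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks to keep memory use flat."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """True if both the byte count and the MD5 match the posted values."""
    if os.path.getsize(path) != expected_size:
        return False  # cheap check first; skips hashing a truncated file
    return md5_of_file(path) == expected_md5


def worst_case_unpacked_bytes(archive_bytes=EXPECTED_SIZE, ratio=MAX_RATIO):
    """Upper bound on the unpacked size if the notice's ratio holds."""
    return archive_bytes * ratio


if __name__ == "__main__":
    bound = worst_case_unpacked_bytes()
    print(f"worst case: {bound} bytes (~{bound / 1e12:.2f} TB)")
    # prints: worst case: 1852219311100 bytes (~1.85 TB)
```

Calling check_download("enwiki-20080103-pages-meta-history.xml.7z") on a local
copy would confirm the transfer before starting a long unpack. The size is
compared first because it is nearly free, while hashing 18 GB takes a while.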