[Xmldatadumps-admin-l] [Wikitech-l] 2010-03-11 01:10:08: enwiki Checksumming pages-meta-history.xml.bz2 :D

Tomasz Finc tfinc at wikimedia.org
Thu Mar 11 04:24:17 UTC 2010


Newer snapshots supersede their older brethren. So the 20100130 snapshot already
includes all of the old content of 20080103, barring any format changes.

--tomasz

Kevin Webb wrote:
> Also, does the 20080103 dump combined with the latest 20100130 dump
> provide a complete edit history of Wikipedia? I'm unclear about
> whether the 20080103 dump was cumulative or if there was some other
> previous cut off point.
> 
> Is it correct to assume that future dumps will begin post 2010-01-30?
> 
> Thanks!
> kpw
> 
> On Wed, Mar 10, 2010 at 10:55 PM, Kevin Webb <kpwebb at gmail.com> wrote:
>> It's in EC2 so I could get it to you in about 20 mins. Just hit me
>> with an email off-list with the desired destination...
>>
>> kpw
>>
>> On Wed, Mar 10, 2010 at 10:54 PM, Tomasz Finc <tfinc at wikimedia.org> wrote:
>>> Yup, that's the one. If you have a fast upload pipe then I'm more than happy
>>> to set up space for it. Otherwise it should be arriving in our snail mail
>>> after a couple of days.
>>>
>>> -tomasz
>>>
>>> Kevin Webb wrote:
>>>> Many thanks to everyone involved.
>>>>
>>>> Also, in case it's of use to anyone I have a copy of the
>>>> enwiki-20080103-pages-meta-history.xml dump in 7z form. Is that the
>>>> backup that's being referred to or is it in fact 20081003?
>>>>
>>>> kpw
>>>>
>>>> On Wed, Mar 10, 2010 at 10:20 PM, Tomasz Finc <tfinc at wikimedia.org> wrote:
>>>>> Thankfully due to an awesome volunteer we'll be able to get that 2008
>>>>> snapshot in our archive. I'll mail out when it shows up in our snail
>>>>> mail.
>>>>>
>>>>> --tomasz
>>>>>
>>>>> Erik Zachte wrote:
>>>>>> I'm thrilled. Big thanks to Tim and Tomasz for pulling this off.
>>>>>> For the record the 2008-10-03 dump existed for a short while only.
>>>>>> It evaporated before wikistats and many others could parse it,
>>>>>> so now we can finally catch up from 3.5 (!) years backlog.
>>>>>>
>>>>>> Erik Zachte
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: wikitech-l-bounces at lists.wikimedia.org
>>>>>>> [mailto:wikitech-l-bounces at lists.wikimedia.org] On Behalf Of Tomasz Finc
>>>>>>> Sent: Thursday, March 11, 2010 4:11
>>>>>>> To: Wikimedia developers; xmldatadumps-admin-l at lists.wikimedia.org;
>>>>>>> xmldatadumps at lists.wikimedia.org
>>>>>>> Subject: [Wikitech-l] 2010-03-11 01:10:08: enwiki Checksumming pages-meta-history.xml.bz2 :D
>>>>>>>
>>>>>>> New full history en wiki snapshot is hot off the presses!
>>>>>>>
>>>>>>> It's currently being checksummed, which will take a while for 280GB+ of
>>>>>>> compressed data, but for those brave souls willing to test please grab it
>>>>>>> from
>>>>>>>
>>>>>>> http://download.wikipedia.org/enwiki/20100130/enwiki-20100130-pages-meta-history.xml.bz2
>>>>>>>
>>>>>>> and give us feedback about its quality. This run took just over a month
>>>>>>> and gained a huge speedup after Tim's work on re-compressing ES. If we
>>>>>>> see no hiccups with this data snapshot, I'll start mirroring it to other
>>>>>>> locations (Internet Archive, Amazon Public Data Sets, etc.).
>>>>>>>
>>>>>>> For those not familiar, the last successful run that we've seen of this
>>>>>>> data goes all the way back to 2008-10-03. That's over 1.5 years of
>>>>>>> people waiting to get access to these data bits.
>>>>>>>
>>>>>>> I'm excited to say that we seem to have it :)
>>>>>>>
>>>>>>> --tomasz
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Wikitech-l mailing list
>>>>>>> Wikitech-l at lists.wikimedia.org
>>>>>>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>>>>>
>>>>>> _______________________________________________
>>>>>> Xmldatadumps-admin-l mailing list
>>>>>> Xmldatadumps-admin-l at lists.wikimedia.org
>>>>>> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-admin-l
>>>>>
>>>



