[Xmldatadumps-admin-l] [Wikitech-l] 2010-03-11 01:10:08: enwiki Checksumming pages-meta-history.xml.bz2 :D

Kevin Webb kpwebb at gmail.com
Thu Mar 11 04:03:00 UTC 2010


Also, does the 20080103 dump combined with the latest 20100130 dump
provide a complete edit history of Wikipedia? I'm unclear about
whether the 20080103 dump was cumulative or if there was some other
previous cut off point.

Is it correct to assume that future dumps will begin post 2010-01-30?

Thanks!
kpw

On Wed, Mar 10, 2010 at 10:55 PM, Kevin Webb <kpwebb at gmail.com> wrote:
> It's in EC2 so I could get it to you in about 20 mins. Just hit me
> with an email off-list with the desired destination...
>
> kpw
>
> On Wed, Mar 10, 2010 at 10:54 PM, Tomasz Finc <tfinc at wikimedia.org> wrote:
>> Yup, that's the one. If you have a fast upload pipe then I'm more than happy
>> to set up space for it. Otherwise it should be arriving in our snail mail
>> after a couple of days.
>>
>> -tomasz
>>
>> Kevin Webb wrote:
>>>
>>> Many thanks to everyone involved.
>>>
>>> Also, in case it's of use to anyone I have a copy of the
>>> enwiki-20080103-pages-meta-history.xml dump in 7z form. Is that the
>>> backup that's being referred to or is it in fact 20081003?
>>>
>>> kpw
>>>
>>> On Wed, Mar 10, 2010 at 10:20 PM, Tomasz Finc <tfinc at wikimedia.org> wrote:
>>>>
>>>> Thankfully due to an awesome volunteer we'll be able to get that 2008
>>>> snapshot in our archive. I'll mail out when it shows up in our snail
>>>> mail.
>>>>
>>>> --tomasz
>>>>
>>>> Erik Zachte wrote:
>>>>>
>>>>> I'm thrilled. Big thanks to Tim and Tomasz for pulling this off.
>>>>> For the record the 2008-10-03 dump existed for a short while only.
>>>>> It evaporated before wikistats and many others could parse it,
>>>>> so now we can finally catch up from 3.5 (!) years backlog.
>>>>>
>>>>> Erik Zachte
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: wikitech-l-bounces at lists.wikimedia.org
>>>>>> [mailto:wikitech-l-bounces at lists.wikimedia.org] On Behalf Of Tomasz Finc
>>>>>> Sent: Thursday, March 11, 2010 4:11
>>>>>> To: Wikimedia developers; xmldatadumps-admin-l at lists.wikimedia.org;
>>>>>> xmldatadumps at lists.wikimedia.org
>>>>>> Subject: [Wikitech-l] 2010-03-11 01:10:08: enwiki Checksumming
>>>>>> pages-meta-history.xml.bz2 :D
>>>>>>
>>>>>> New full history en wiki snapshot is hot off the presses!
>>>>>>
>>>>>> It's currently being checksummed, which will take a while for 280GB+ of
>>>>>> compressed data, but for those brave souls willing to test, please grab
>>>>>> it from
>>>>>>
>>>>>> http://download.wikipedia.org/enwiki/20100130/enwiki-20100130-pages-meta-history.xml.bz2
>>>>>>
>>>>>> and give us feedback about its quality. This run took just over a month
>>>>>> and gained a huge speed up after Tim's work on re-compressing ES. If we
>>>>>> see no hiccups with this data snapshot, I'll start mirroring it to
>>>>>> other locations (internet archive, amazon public data sets, etc).
>>>>>>
>>>>>> For those not familiar, the last successful run that we've seen of this
>>>>>> data goes all the way back to 2008-10-03. That's over 1.5 years of
>>>>>> people waiting to get access to these data bits.
>>>>>>
>>>>>> I'm excited to say that we seem to have it :)
>>>>>>
>>>>>> --tomasz
>>>>>>
>>>>>> _______________________________________________
>>>>>> Wikitech-l mailing list
>>>>>> Wikitech-l at lists.wikimedia.org
>>>>>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Xmldatadumps-admin-l mailing list
>>>>> Xmldatadumps-admin-l at lists.wikimedia.org
>>>>> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-admin-l
>>>>
>>
>>
>
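
For anyone testing the 20100130 file mentioned in the thread above, a file of
280GB+ cannot be checksummed by reading it into memory in one go. The sketch
below reads the file in fixed-size chunks instead. It is only an illustration:
the function name file_checksum, the default of MD5, and the 1 MiB chunk size
are assumptions made for this example and are not taken from the thread or
from the dump tooling.

```python
import hashlib


def file_checksum(path, algorithm="md5", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in fixed-size chunks.

    Hypothetical helper: the name, the MD5 default and the chunk size
    are illustrative choices, not part of any Wikimedia tool.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"",
        # so only one chunk is held in memory at a time.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import sys

    # Usage: python checksum.py enwiki-20100130-pages-meta-history.xml.bz2
    print(file_checksum(sys.argv[1]))
```

The printed digest would then be compared by hand against whatever checksum
is published alongside the dump once the checksumming run finishes; if that
checksum uses a different algorithm, pass its name as the second argument.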


