[Labs-l] Dumps project storage

Ryan Lane rlane at wikimedia.org
Wed May 22 16:54:17 UTC 2013


Thanks for the info Hydriz!

On Wednesday, May 22, 2013, Hydriz Scholz wrote:

> Yes, it has been discussed in the past that we will be reducing the
> amount of resources used for the dumps project. I am currently coming up
> with a few scripts and libraries to make this process of
> uploading/downloading much less resource intensive and a hell of a lot
> more efficient. So Ryan, you don't have to worry too much about this one :)
>
> However, the Wikimedia Commons grab is something that was undertaken by a
> team not directly related to Wikimedia. We download from upload.wm.o (yes),
> but at a rather slow speed to avoid overloading the servers. It was stopped
> quite a while ago as part of trying to optimize bandwidth and resource
> usage.
>
> I am not exactly sure what Nemo wished to do in the original request, but
> I believe the team is still discussing better ways to handle this (like
> using the mirrors).
>
> So, don't worry about the resource usage: we are currently still only
> testing, so we are not using many precious resources.
>
>
> On Wed, May 22, 2013 at 1:36 PM, Ryan Lane <rlane32 at gmail.com> wrote:
>
>> On Tue, May 21, 2013 at 10:28 PM, Federico Leva (Nemo) <
>> nemowiki at gmail.com> wrote:
>>
>>> Ryan Lane, 21/05/2013 22:27:
>>>
>>>> It's not that I'm opposed to it, but it's a massive waste of resources
>>>> to download from something in the network to a network fileserver, then
>>>> to upload it to archive.org.
>>>>
>>>>
>>>> Why is it necessary to write hundreds of GB to the fileserver before
>>>> they are uploaded?
>>>>
>>>
>>> Sorry, I don't understand the question. Consider the request withdrawn,
>>> thanks for answering.
>>>
>>>
>> I'd like to make sure your need is handled, but I'd like to understand
>> the need too. We've had quite a bit of discussion with Hydriz in the past
>> about this project. It's resource intensive for us, so we try to make sure
>> it's being done efficiently. We made the dumps available at /public/data so
>> that it wouldn't be necessary to download them from download.wm.o, then
>> upload them to archive.org (it's possible to upload them directly from
>> the read-only dumps filesystem).
>>
>> What I'm trying to understand is what is being written to /data/project
>> and why it's larger than 200GB. Based on what I've been told so far, the
>> project uploads dumps to archive.org. This is the first I'm hearing
>> about uploading commons images. Are you downloading large amounts of images
>> from upload.wm.o, writing them to /data/project, uploading them to
>> archive.org, then deleting them from /data/project?
>>
>> - Ryan
>>
>> _______________________________________________
>> Labs-l mailing list
>> Labs-l at lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>
>>
>
>
> --
> Regards,
> Hydriz
>
> Be social, follow/add me:
> Facebook: http://tinyurl.com/hydrizfb
> Google+: http://tinyurl.com/hydrizgl
> Twitter: @hydrizwiki
>