[Foundation-l] Wikimedia mirrors

Aude aude.wiki at gmail.com
Thu Sep 16 10:14:48 UTC 2010


On Sep 16, 2010, at 4:16 AM, George Herbert <george.herbert at gmail.com> wrote:

> On Thu, Sep 16, 2010 at 12:58 AM, John Vandenberg <jayvdb at gmail.com> wrote:
>> On Thu, Sep 16, 2010 at 5:40 PM, Federico Leva (Nemo)
>> <nemowiki at gmail.com> wrote:
>>> John Vandenberg, 16/09/2010 03:00:
>>>> English, French, German, Italian, Polish, Portuguese, Swedish and
>>>> Chinese Wikipedia all appear to have some mirrors, but are any of
>>>> them reliable enough to be used for disaster recovery?
>>>
>>> Obviously not, at least Italian ones.
>>>
>>>> The smaller projects are easier to back up, as they are smaller. I am
>>>> sure that with a little effort and coordination, chapters,
>>>> universities and similar organisations would be willing to routinely
>>>> back up a subset of projects, and combined we would have multiple
>>>> current backups of all projects.
>>>
>>> I agree. Now we have only this:
>>> http://www.balkaninsight.com/en/main/news/21606/
>>
>> Kudos to Milos & Wikimedia Serbia!!
>>
>>> How many TB are needed? I don't know what the average is, but e.g.
>>> right now my university should have about 50 TB of free disk space
>>> (which is not so much, after all).
>>
>> The key would be to allow the mirrors to delete their mirror when they
>> need to use their excess storage capability.  If they let us know in
>> advance that they are reclaiming the space, another organisation with
>> excess storage capability can take over.
>
>
> I appreciate all the enthusiasm in this thread, but (speaking for myself as
> an individual, and an IT consultant who does things like business
> continuity and disaster recovery planning consulting among other
> infrastructure work) this is a core operational competency role that
> the Foundation needs to ensure is handled in house as part of the
> routine IT operations.  And, as I understand it now, it is, though I
> have only had high level discussions with some of the Foundation staff
> about this and not seen the server configs myself so I can't
> personally attest to the status.
>
> Database and file backups need to be in (at least) 2 locations, and my
> understanding is that there are complete redundant copies at the
> Amsterdam datacenter now, and that the new main datacenter in Virginia
> will continue this.
>
> If a third location is needed, the current HQ in San Francisco is
> plenty far enough away from the other 2 locations to provide excellent
> DR capability.  If there's need for a datacenter / fast net access
> redundant copy in SF or the Bay Area, a rack or few U of a shared rack
> would be enough for a fileserver, and that's available at multiple
> excellently connected locations in the Bay Area.

Having multiple backups (including the private user data and
deleted-content tables) within WMF at various data centers is no doubt
crucial, and depending on third parties for that would be a terrible
mistake.

But up-to-date distributed copies (without private data, but with full
history and images) outside WMF are also very important.  Why can't we
do both? I highly doubt anything bad will happen to WMF, but despite
best intentions and efforts, you never know (zombies take over? rogue
sysadmin?). Distributed backups beyond WMF help ensure Wikipedia goes
on without reliance on WMF.

> Disaster Recovery is not something the Foundation should attempt to
> crowdsource.  I recommend it be left to professionals whose job it is
> and who have prior experience in the field.  If you haven't watched
> major services drop, datacenters burn down, software environments melt
> down, and spent years working to ensure that those don't happen again,
> you really don't have a good feel for the type and magnitude of the
> risks and the sorts of tools to employ to try and mitigate them.

Surely there are third parties with such experience who are interested
in this.  The Internet Archive? Bibliotheca Alexandrina? The Library of
Congress?  Surely Google has, or should have, a copy? What about a
public dataset on Amazon's cloud services (I thought there was
something)? Universities are also good candidates, some of them with
very large data centers (e.g. San Diego State University), etc.

> If there's interest in an offline discussion on IT disasters and
> disaster recovery and reliability engineering, I can do that, but it
> should be offline from Foundation-L...

Maybe not foundation-l :) but I am cool with some degree of  
transparency & open discussion on a list or some communications  
channel dedicated to the topic.

I'm not involved in creating dumps, but wouldn't it be possible to
offer daily or weekly diffs of enwiki and other wikis, and have
utilities to apply those diffs to the last full dump?  Having regular
dumps + regular diffs (weekly, daily, and even minutely) + Swiss army
knife utilities for handling diffs and dumps is something that
OpenStreetMap has managed to excel at, and it makes me very happy :) to
know people have up-to-date copies distributed in various places.  I
feel sad to know this is not the case with Wikipedia :(
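
To make the "full dump + diffs" idea concrete, here is a rough sketch
in Python. Everything in it is an assumption for illustration: the
"dump" is just a dict of page title to latest text, and a "diff" is an
ordered list of create/modify/delete changes. It is not the real dump
format or any existing tool, and the function names are made up.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical formats, for illustration only: a "dump" maps a page title
# to its latest text, and a "diff" is an ordered list of changes of the
# form (action, title, text).
Dump = Dict[str, str]
Change = Tuple[str, str, str]


def apply_diff(dump: Dump, diff: Iterable[Change]) -> Dump:
    """Return a new dump with one diff applied; the input is not modified."""
    updated = dict(dump)
    for action, title, text in diff:
        if action in ("create", "modify"):
            updated[title] = text
        elif action == "delete":
            updated.pop(title, None)  # deleting a missing page is a no-op
        else:
            raise ValueError(f"unknown action: {action!r}")
    return updated


def catch_up(full_dump: Dump, diffs: List[List[Change]]) -> Dump:
    """Apply diffs oldest-first to bring a full dump up to date."""
    current = full_dump
    for diff in diffs:
        current = apply_diff(current, diff)
    return current
```

A mirror would fetch the last full dump once, then call something like
catch_up() with whatever daily or weekly diffs have been published
since. The diffs have to be applied in order, oldest first.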

@aude

> -- 
> -george william herbert
> george.herbert at gmail.com


