[Labs-l] Production status of labs (was Re: Reboot of virt11 Friday Sept 6 at 20:00 UTC)

Maarten Dammers maarten at mdammers.nl
Sat Sep 7 22:10:53 UTC 2013


Hi Ryan,

On 6-9-2013 0:57, Ryan Lane wrote:
> On Fri, Sep 6, 2013 at 6:27 AM, Platonides <platonides at gmail.com> wrote:
>
>     On 06/09/13 00:01, Ryan Lane wrote:
>     > Outside of tools (and deployment-prep, which is rather ephemeral) we
>     > don't consider any project "semi-production" and the failure model is
>     > meant to be handled at the instance level. (...)
>
>     Well, I think it's on tools:
>     http://tools.wmflabs.org/heritage/api/...
>
>
>
>         That said, relying on labs for something like this is legitimately
>         insane. Have you talked with Wikimedia Foundation about getting
>         production level support for WLM? That's what you actually need.
>
>         What will you do if the node hosting your instance completely
>         dies? Is your work puppetized? Can you just bring up a new instance
>         to replace it? Are you doing backups?
>
>
>     I think it's just a clone of the project at the toolserver, and
>     the code is under version control. It would be nice to have it
>     puppetized, though.
>
>     IMHO a hostname like api.wikilovesmonuments.org should have been used,
>     for independence from toolserver, tools, wmflabs instances...
>
>
> Using a different hostname doesn't really do much for independence 
> from anything unless you're also going to host the infrastructure as well.
>
> My point still stands whether this is on tools or not. If something is 
> important enough that it shouldn't have downtime it shouldn't be on 
> Labs, even in the tools project. It should have production-level support.
>
> Labs is not funded, staffed, or architected to handle production-level 
> services. Tools was created in a way that will work around host-level 
> failures in the Labs infrastructure, but if the network node dies, 
> tools will still go down. There's a number of other SPOFs in the 
> architecture that we're willing to accept for a semi-production 
> environment that would not exist in a fully production environment.
>
> We are putting effort into eliminating the SPOFs where feasible, but 
> we'll never recommend Labs for services that must be up, since that's 
> what production is for...
You are aware that the Toolserver is being killed off and toollabs is 
supposed to replace it? Toolserver has always been production, just not 
with so many 9's of availability. Are you aware of how important the 
tool* projects are for our projects? Tool* is not a stupid sandbox; no, 
our projects depend on tool* being available regardless of the 
production status you think it has. Our communities don't have an 
alternative.

So question to the WMF: Are you going to treat toollabs seriously like a 
production environment or not?
I understand it won't have the same availability requirement as our main 
cluster, but you can at least apply proper production practices like 
having a maintenance window, change management, etc.

Maarten