[Labs-l] Labs reliability

Jeremy Baron jeremy at tuxmachine.com
Sun Aug 31 05:25:11 UTC 2014


On Sun, Aug 31, 2014 at 5:15 AM, Pine W <wiki.pine at gmail.com> wrote:
> I have heard that Labs is an experimental environment, and service outages
> and storage erasures are to be expected from time to time. What is the
> recommended alternative to Labs for services that need good reliability like
> bots?

Can you tell us more about the use case you have in mind?

Is 48 hours of downtime per year too much? 24 hours?

There is some redundancy built into both Labs in general and Tool Labs
in particular. I would say typical services (including bots) should be
configured either so that multiple copies are running (on different
hosts), so that the failure of a single host doesn't break the
service, *or* so that the service is restarted automatically if it
dies or is not currently running.

If you've encountered a problem, the first step is to reconfigure your
service to do one of those things.
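
To make the "restart it automatically if it dies" option concrete,
here is a minimal sketch in Python. It is only an illustration of the
general idea, not how Tool Labs or bigbrother actually does it; the
supervise() function, its parameters, and the crashing "bot" command
are all made up for the example.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=5, delay=1.0):
    """Run cmd, starting it again each time it exits.

    Hypothetical helper for illustration only. The command is started
    once and then restarted up to max_restarts times. Returns the list
    of exit codes observed, one per run.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        proc = subprocess.Popen(cmd)
        exit_codes.append(proc.wait())  # blocks until the process exits
        if attempt < max_restarts:
            time.sleep(delay)  # short pause so a crash loop doesn't spin
    return exit_codes


if __name__ == "__main__":
    # Stand-in for a bot that dies immediately with exit status 3.
    crashing_bot = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(supervise(crashing_bot, max_restarts=2, delay=0.1))  # [3, 3, 3]
```

A watchdog like this only covers the case where the process dies. It
runs on the same host as the service, so it does not help if the host
itself fails; for that you still need copies on different hosts.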

Documentation here is somewhat lacking. For example,
https://wikitech.wikimedia.org/ doesn't mention "bigbrother" at all.

-Jeremy


