[Engineering] [Wikitech-l] Gerrit was down today

Chad Horohoe chorohoe at wikimedia.org
Fri Oct 7 00:28:05 UTC 2016


This is actually how we have Apache configured to respond to Gerrit being
unavailable - that error page is served with a 503 when Gerrit is really
down.

Today I hacked it to always show that page, so even when it was "up" people
wouldn't be hitting it. We were still debugging and restarting things, so I
didn't want to give false hopes or end up with half-completed transactions.

I think this can all be improved with some Apache config tweaks.
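A minimal sketch of that idea follows. It is written as a Python WSGI wrapper rather than as the actual Apache configuration, which is not shown in this thread. When Gerrit is unreachable, or when an operator forces the error page as was done today, the wrapper serves the error page with a 503 at the requested URL instead of redirecting, so a reload keeps the original address. The names make_gate, backend_up and forced, the page body, and the Retry-After value are all hypothetical.

```python
from http import HTTPStatus

# Hypothetical error page body; not the page Wikimedia actually serves.
ERROR_PAGE = b"<html><body><h1>Gerrit is temporarily unavailable</h1></body></html>"


def make_gate(backend_app, backend_up, forced):
    """Wrap a WSGI app so outages are answered in place with a 503.

    backend_up: hypothetical callable, True if the backend is responding.
    forced: hypothetical callable, True if an operator forces the error page.
    No Location header is sent, so the browser keeps the requested URL.
    """

    def app(environ, start_response):
        if forced() or not backend_up():
            status = HTTPStatus.SERVICE_UNAVAILABLE
            start_response(
                f"{status.value} {status.phrase}",
                [
                    ("Content-Type", "text/html; charset=utf-8"),
                    ("Content-Length", str(len(ERROR_PAGE))),
                    ("Retry-After", "300"),  # arbitrary retry hint, in seconds
                ],
            )
            return [ERROR_PAGE]
        # Normal case: hand the request to the real application.
        return backend_app(environ, start_response)

    return app


# Usage: a stand-in for Gerrit, with the error page forced on.
def fake_gerrit(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"change list"]


captured = {}


def start_response(status, headers, exc_info=None):
    # Record what the gate sent so it can be inspected.
    captured["status"] = status
    captured["headers"] = dict(headers)


gate = make_gate(fake_gerrit, backend_up=lambda: True, forced=lambda: True)
body = b"".join(gate({"PATH_INFO": "/r/"}, start_response))
print(captured["status"])  # 503 Service Unavailable
```

The address bar stays intact because the page is served in place rather than through a 302 to a separate error URL, which is the property Gergo asks for below. In Apache, an ErrorDocument for 503 that points at a local path is served internally in a similar way. If it points at a full http:// URL instead, the client is sent a redirect.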

-Chad

On Thu, Oct 6, 2016 at 4:14 PM Amir Ladsgroup <ladsgroup at gmail.com> wrote:

> It was bothering me, but I'm guessing this is one of the many flaws of
> Gerrit itself and probably not easily fixable (other people are more
> qualified to comment). Still, I want to suggest speeding up the move to
> Differential, which is much better at handling such downtimes, among other
> benefits.
>
> Best
>
> On Fri, Oct 7, 2016, 2:26 AM Gergo Tisza <gtisza at wikimedia.org> wrote:
>
> Thanks a lot for the quick recovery!
>
> Would it be possible to use something other than a redirect next time when
> traffic needs to be blocked? An Apache deny rule or a 404 would work, but a
> redirect means that reloading the page (or reopening the browser) will
> cause the URL to be lost with little hope of recovery (browsers don't
> record redirects in the history). That can be very annoying when one uses
> tabs as bookmarks (bad habit as it is).
>
> On Thu, Oct 6, 2016 at 3:33 PM, Chad Horohoe <chorohoe at wikimedia.org> wrote:
>
> > Hi!
> >
> > Sorry for the extended downtime! From what we can tell, the machine that
> > Gerrit is running on (lead) is having some hardware issues that are making
> > the CPU misbehave. We've worked around it for now, so things should be up
> > (and Zuul is processing CI events just fine).
> >
> > However, since it appears to be a hardware problem, we're planning to
> > migrate off of lead to a new machine (cobalt). The public IP addresses will
> > not be changing. The plan right now is to do this migration tomorrow with a
> > scheduled downtime at 17:00 UTC (10:00 PDT).
> >
> > We'll be keeping a close eye on things in the meantime, so if things
> > deteriorate again we can start the migration sooner.
> >
> > (and yeah, a wikitech incident report will follow, though I'm a little
> > burnt out right now)
> >
> > Thanks again for bearing with us!
> >
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
>

