[Engineering] Gerrit was down today

Gergo Tisza gtisza at wikimedia.org
Thu Oct 6 22:56:38 UTC 2016


Thanks a lot for the quick recovery!

Would it be possible to use something other than a redirect next time when
traffic needs to be blocked? An Apache deny rule or a 404 would work, but a
redirect means that reloading the page (or reopening the browser) will
cause the URL to be lost with little hope of recovery (browsers don't
record redirects in the history). That can be very annoying when one uses
tabs as bookmarks (bad habit as it is).
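
To make the suggestion concrete, here is a rough sketch of "answer in place
instead of redirecting", using only the Python standard library. It is not
how the Gerrit frontend is actually configured: the handler name, the
message text and the sample path are all made up for illustration, and the
same idea would normally be expressed as an Apache rule instead.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# 404 is one of the options suggested above; an Apache deny rule would
# normally answer with 403 instead. Either way no Location header is sent.
BLOCK_STATUS = 404


class BlockedHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: reject every GET without redirecting."""

    def do_GET(self):
        body = b"Service temporarily unavailable, please retry later.\n"
        # The error is returned for the requested URL itself, so the
        # browser tab keeps that URL and a later reload retries it.
        self.send_response(BLOCK_STATUS)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_status(url):
    """Return the HTTP status code of a GET request to url."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def demo():
    """Start the blocking server on a free local port and query it once."""
    server = HTTPServer(("127.0.0.1", 0), BlockedHandler)
    worker = threading.Thread(target=server.serve_forever, daemon=True)
    worker.start()
    try:
        # Made-up path, standing in for a deep link someone has open.
        url = f"http://127.0.0.1:{server.server_port}/some/deep/link"
        return fetch_status(url)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The point is only that the response carries an error status for the
original URL, so nothing in the tab gets replaced.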

On Thu, Oct 6, 2016 at 3:33 PM, Chad Horohoe <chorohoe at wikimedia.org> wrote:

> Hi!
>
> Sorry for the extended downtime! From what we can tell, it appears as
> though the machine that Gerrit is running on (lead) is having some
> hardware issues that are making the CPU misbehave. We've worked around
> it for now, so things should be up (and Zuul is processing CI events
> just fine).
>
> However, since it appears it's a hardware problem, we're planning to
> migrate off of lead to a new machine (cobalt). The public IP addresses
> will not be changing. The plan right now is to do this migration
> tomorrow with a scheduled downtime at 17:00 UTC (10:00 PDT).
>
> We'll be keeping a close eye on things in the meantime, so if things
> deteriorate again we can start the migration sooner.
>
> (and yeah, wikitech incident report to follow, I'm a little burnt out
> right now though)
>
> Thanks again for bearing with us!
>