[Engineering] The train will resume tomorrow (was Re: All wikis reverted to wmf.8 last night due to T119736)

Ori Livneh ori at wikimedia.org
Tue Jul 12 23:56:11 UTC 2016


On Tue, Jul 12, 2016 at 4:07 PM, Greg Grossmeier <greg at wikimedia.org> wrote:

> <quote name="Greg Grossmeier" date="2016-07-12" time="09:24:38 -0700">
> > https://phabricator.wikimedia.org/T119736 - "Could not find local user
> > data for {Username}@{wiki}"
> >
> > There was an order of magnitude increase in the rate of those errors
> > that started on July 7th.
> >
> > Investigation and remediation is on-going.
>
> Investigation and remediation is mostly complete[0] and the vast
> majority of cases have been addressed. There are still users who will
> experience this error for the next ~1 day.[1]
>

Is it actually fixed? It doesn't look like it, from the logs.

Since midnight UTC on July 7, 3,195 distinct users have tried and failed to
log in a combined total of 25,047 times, or an average of approximately
eight times per user. The six days that have passed since then were
business as usual for Wikimedia Engineering.

Our failure to react to this swiftly and comprehensively is appalling and
embarrassing. It represents a failure of process at multiple levels and a
lack of accountability.

I think we need to have a serious discussion about what happened, and think
very hard about the changes we would need to make to our processes and
organizational structure to prevent a recurrence.

I think we should also reach out to the users that were affected and
apologize.

