Hi all,
Christian -- thanks for following up on this.
I've filed a ticket[1] for this and marked it as a production issue. Kevin --
please triage tomorrow in standup. We can own the actual incident report,
but we'll need some help from Ori in understanding how to perform
the post-mortem.
The current status for EventLogging support is that Ori, the Analytics
team, the Operations team and the Platform teams are discussing the
handover of EventLogging. The Analytics team will own EventLogging as soon
as we can, but we need to get consensus on the details.
I've written up our discussions on this wiki page[2]. Please feel free to
add/discuss. We've had some preliminary discussions with Andrew Otto but
need to follow up with Rob and Ori.
-Toby
[1]
On Thu, Apr 3, 2014 at 6:27 AM, Christian Aistleitner <
christian(a)quelltextlich.at> wrote:
Hi Toby,
and zooooooooom ... there goes another week without us even deciding
whether or not we feel responsible for doing the incident documentation
and follow-up work. :-D
I feel somewhat embarrassed that after two weeks, and after the ping
on mailing lists, we still have not managed to tell Greg at least
whether or not we'll work on it.
So, unless you chime in or push back before then, I'll be bold: I'll
consider our lip service around EventLogging a commitment and
start working on it on Monday (2014-04-07).
Best regards,
Christian
On Thu, Mar 27, 2014 at 06:58:27PM +0100, Christian Aistleitner wrote:
Hi Analytics Dev team,
On Thu, Mar 20, 2014 at 01:20:54PM -0700, Greg Grossmeier wrote:
> <quote name="Ori Livneh" date="2014-03-20" time="03:52:01 -0700">
> > [ At about 2014-03-18 00:04 UTC, db1047 stopped accepting incoming
> > connections. At some point during the subsequent hour, MariaDB had either
> > crashed or been manually restarted. Sean noticed that the database was
> > choking on some queries from the researchers and notified the wmfresearch
> > list.
> Can someone from Analytics own this post-mortem and put it on the wiki:
> https://wikitech.wikimedia.org/wiki/Incident_documentation
> Please add specific next steps (with bug#, RT#s, or gerrit urls), even
> (especially) things you haven't done yet and are just "nice to have".
It's been a week, and I cannot find the post-mortem Greg requested at
the above URL :-/
Neither did I see a response from our team to Greg's email.
I lost track of our EventLogging responsibilities during the recent
back and forth. So:
Toby, are we actually grabbing Greg's item or are we pushing back on
it?
Best regards,
Christian
P.S.: Toby, if we're grabbing it: I totally lack knowledge about both
EventLogging and the incident itself. So, be prepared for a doubly slow
start if I get to work on it.
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Gruendbergstrasze 65a Email: christian(a)quelltextlich.at
4040 Linz, Austria Phone: +43 732 / 26 95 63
Fax: +43 732 / 26 95 63
Homepage:
http://quelltextlich.at/
---------------------------------------------------------------