Re: [Wikitech-l] 📈 Wikimedia production errors help

15 Sep 2020

On 9/15/20 9:43 AM, Alex Ezell wrote:

...
> Do we use levels for any of these error log outputs? That is, are they
> classified on output as High, Medium, Low, Info, or something like that?

To an extent, yes.  We have separate channels for PHP errors and 
exceptions, for example, and although I don't think we currently 
differentiate in Logstash, we could plausibly draw a further 
distinction between PHP error levels.  Intuitively, a low number of PHP 
notices probably indicates something of lower severity than a high 
number of fatals, and so forth.

Teasing out more detail about reported error severity could be a useful 
exercise, but I'm not sure it would yield much more meaningful signals 
about production health than we currently have.  Serious problems can 
manifest as trivial-seeming notices, some issues start out that way and 
cascade over time, and in general any form of recurring logspam needs 
human evaluation before we can say much more than "this is a problem".

...
> Or do we have to triage each of them as we examine them?

Yeah.  There are doubtless a lot of ways to improve the tooling we use 
for that process, but right now I think it would be most helpful if we 
just had more eyes _routinely_ on the logs and the workboard.  (See 
Tyler's earlier and much more detailed/thoughtful response to this thread.)

-- 
Brennen Bearnes
Release Engineering
