Regarding your last point, it looks like an email is sent in this case, to
the following users:

    define contactgroup {
        contactgroup_name analytics
        members           dvanliere,ezachte,dtaraborelli,otto,milimetric
    }

and the role includes this group in the contacts:

    nrpe::monitor_service { 'eventlogging':
        ensure => 'present',
        description => 'Check status of defined EventLogging jobs',
        nrpe_command => '/usr/lib/nagios/plugins/check_eventlogging_jobs',
        require => File['/usr/lib/nagios/plugins/check_eventlogging_jobs'],
        contact_group => 'admins,analytics',
    }

If someone is missing from this list, or the check needs to be added to
another service, I'll be glad to do it.
Matanya
On 2014-03-20 13:50, Dan Andreescu wrote:
> Thank you for the detailed write-up, Ori.
>
> > We have to fix this. The level of maintenance that EventLogging gets
> > is not proportional to its usage across the organization. Analytics,
> > I really need you to step up your involvement.
> >
> > It was not long ago that EventLogging was running reliably for months
> > at a time. What has changed is not system load, but the owner seat
> > becoming vacant, leading to a gradual deterioration of the quality of
> > monitoring and auditing practices.
>
> Indeed, the owner seat is vacant. According to a recent discussion on
> the analytics list, we did not yet consider ourselves the proper owners
> of EventLogging. Our sprint planning is today; I'll bring it up and note
> its importance in light of this downtime.
>
> > Sean proposed moving the EventLogging database to m2, so that it runs
> > on separate hardware from the research databases. I think he's right.
> > I filed <https://rt.wikimedia.org/Ticket/Display.html?id=7081> to
> > request the migration.
>
> Thank you, I support isolation.
>
> > Finally, I think EventLogging Icinga alerts should have a higher
> > profile, and possibly page someone. Issues can usually be debugged
> > using the eventloggingctl tool on Vanadium and by inspecting the log
> > files on vanadium:/var/log/upstart/eventlogging-*.
>
> I think this is the key reason the failure was ignored, so I agree here.
> We should at the very least forward these alerts as an email to
> analytics devs. I have no idea how to do that; if anyone would like to
> help, that'd be great.
_______________________________________________
Ops mailing list
Ops(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/ops