Hi all,
I just caused another small webrequest log data loss. I merged a puppet
change that was supposed to have no effect, but it dropped an important
firewall rule dealing with IPSec. This kept all varnishkafkas in remote
datacenters from producing to Kafka between 21:54 and 22:15 UTC today.
I have documented this here:
https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest#Changes_and_k…
Apologies to all!
-Andrew Otto
---------- Forwarded message ----------
From: Marcel Ruiz Forns <mforns(a)wikimedia.org>
Date: Wed, Dec 16, 2015 at 10:29 AM
Subject: [Analytics] [Outage] Small data loss in raw_webrequest on
2015-12-15
To: "A mailing list for the Analytics Team at WMF and everybody who has an
interest in Wikipedia and analytics." <analytics(a)lists.wikimedia.org>
Hi Analytics,
Yesterday, Dec 15, over the course of 1 hour (17:00 to 18:00 UTC), there was
an irrecoverable raw_webrequest data loss of ~30%: 25.6% (misc), 19.5%
(mobile), 19.1% (text), 39.1% (upload). This represents around 1% of the
data for that day.
The loss was due to the enabling of IPSec, which encrypts varnishkafka
traffic between caches in remote datacenters and the Kafka brokers in
eqiad. For about 40 minutes, no webrequest logs from remote datacenters
were successfully produced to Kafka.
Here's the outage note:
https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest#Changes_and_k…
Sorry for the inconvenience.
--
*Marcel Ruiz Forns*
Analytics Developer
Wikimedia Foundation
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics