[Labs-l] replication lag

Jaime Crespo jcrespo at wikimedia.org
Tue Nov 24 12:30:22 UTC 2015


The crashes had one positive side effect: they forced a restart/upgrade,
which fixed a bug with replication filters that was blocking <
https://phabricator.wikimedia.org/T71463>

If you log in to the replica labs dbs and query the heartbeat_p.heartbeat
table, you may find a surprise now.
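
As a rough illustration of how a heartbeat table can be turned into a lag
figure, here is a minimal Python sketch. It assumes each row carries the
timestamp of the last heartbeat that reached the replica; the column names
(shard, last_updated) and the sample row are hypothetical and are not taken
from any documentation, so check the actual table before relying on them.

```python
from datetime import datetime, timezone
from typing import Optional


def lag_seconds(last_heartbeat: datetime, now: Optional[datetime] = None) -> float:
    """Return the seconds elapsed since the last replicated heartbeat.

    Both datetimes must be timezone-aware. Negative values (clock skew)
    are clamped to zero.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return max(0.0, (now - last_heartbeat).total_seconds())


# Hypothetical row, shaped like a result from the heartbeat table.
row = {
    "shard": "s3",
    "last_updated": datetime(2015, 11, 24, 12, 30, 0, tzinfo=timezone.utc),
}
now = datetime(2015, 11, 24, 12, 30, 22, tzinfo=timezone.utc)
print(row["shard"], lag_seconds(row["last_updated"], now))  # s3 22.0
```

A timestamp-based measure like this reflects how stale the replicated data
actually is, rather than an estimate inferred from other signals.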

Consider this beta/testing only, with no documentation yet. I will explain
later why this is a huge improvement over the heuristics we had until now.

On Mon, Nov 23, 2015 at 5:56 PM, Magnus Manske <magnusmanske at googlemail.com>
wrote:

> Thanks for the update, and for the work of course!
>
> On Mon, Nov 23, 2015 at 4:54 PM Jaime Crespo <jcrespo at wikimedia.org>
> wrote:
>
>> Labsdb databases are still suffering from corruption. I am trying to
>> repair the tables affected. Updates will be at: <
>> https://phabricator.wikimedia.org/T119315> Expect some downtime in order
>> to fix the issues.
>>
>> On Mon, Nov 23, 2015 at 2:01 PM, Magnus Manske <
>> magnusmanske at googlemail.com> wrote:
>>
>>> Ah, didn't see this, created a bug report:
>>> https://phabricator.wikimedia.org/T119382
>>>
>>> On Mon, Nov 23, 2015 at 12:30 PM Yetkin Sakal <superyetkin at yahoo.com>
>>> wrote:
>>>
>>>> That is good news for us all, jcrespo.
>>>>
>>>> Cheers!
>>>>
>>>>
>>>>
>>>> On Monday, November 23, 2015 2:00 PM, Jaime Crespo <
>>>> jcrespo at wikimedia.org> wrote:
>>>>
>>>>
>>>> It seems that 2 out of 3 labs replica servers had crashed, creating
>>>> corruption in the process. Replica lag is now going down.
>>>>
>>>> Lag measurement (with 1-second resolution) is coming soon and will be
>>>> available as a table, but I hit a bug and it is not being displayed
>>>> correctly, so it is not yet public.
>>>> I will announce it as soon as it is fixed.
>>>>
>>>>
>>>> On Sun, Nov 22, 2015 at 10:06 PM, Yetkin Sakal <superyetkin at yahoo.com>
>>>> wrote:
>>>>
>>>> Thanks.
>>>>
>>>>
>>>>
>>>> On Sunday, November 22, 2015 10:52 PM, John <phoenixoverride at gmail.com>
>>>> wrote:
>>>>
>>>>
>>>> I have http://tools.wmflabs.org/betacommand-dev/cgi-bin/replag which
>>>> gives a rough estimate, plus we have
>>>> https://phabricator.wikimedia.org/T119315
>>>>
>>>> On Sun, Nov 22, 2015 at 3:26 PM, Yetkin Sakal <superyetkin at yahoo.com>
>>>> wrote:
>>>>
>>>> Is there anywhere I can keep track of the status pertaining to the
>>>> replication lag on s3? My tools are showing data from nearly a day ago.
>>>>
>>>> _______________________________________________
>>>> Labs-l mailing list
>>>> Labs-l at lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>>>
>>>> --
>>>> Jaime Crespo
>>>> <http://wikimedia.org>
>>>>
>>>>
>>>
>>
>>
>> --
>> Jaime Crespo
>> <http://wikimedia.org>
>>
>


-- 
Jaime Crespo
<http://wikimedia.org>