as labsdb1011
is starting to lag again on s4. There were some heavy queries there... let's
see how it goes over the weekend.
Manuel.
On Fri, Oct 2, 2020 at 8:00 AM Manuel Arostegui <marostegui(a)wikimedia.org>
wrote:
Hello,
Both hosts are back in sync.
Manuel.
On Thu, Oct 1, 2020 at 7:19 AM Manuel Arostegui <marostegui(a)wikimedia.org>
wrote:
> Hello,
>
> Labsdb1011 has recovered, I have repooled it.
> Labsdb1010 is lagging a bit behind, but I am going to repool it with its
> normal weight and keep the query killer at 1800 seconds until it fully
> recovers from helping labsdb1011.
>
> Manuel.
>
> On Wed, Sep 30, 2020 at 7:27 AM Manuel Arostegui <
> marostegui(a)wikimedia.org> wrote:
>
>> Hello,
>>
>> This is a heads up about the current situation with s4 (commons) and
>> labsdb.
>>
>> There's been more activity lately on s4, and that has made labsdb1011
>> (analytics role) start lagging behind.
>>
>> https://grafana.wikimedia.org/d/000000273/mysql?viewPanel=6&orgId=1&…
>>
>> I tried to reduce its weight a couple of days ago, to help it
>> recover:
>>
>> https://gerrit.wikimedia.org/r/c/operations/puppet/+/630392
>> https://gerrit.wikimedia.org/r/c/operations/puppet/+/630531
>> https://gerrit.wikimedia.org/r/c/operations/puppet/+/630770
>>
>> The last change has (as sort of expected) made labsdb1010 lag:
>>
>> https://grafana.wikimedia.org/d/000000273/mysql?viewPanel=6&orgId=1&…
>>
>> I am going to decrease the pt-kill query time from 3600 to 1800 seconds to
>> see if that helps labsdb1010 hold the fort a bit.
>>
>> There's not much else we can do at the moment, but just keep all these
>> issues in mind if people complain about lag on s4 (commons) on the
>> analytics role.
>> The web role is doing fine (labsdb1009 isn't lagging).
>>
>> Manuel.
>>
>