Hello!
It is time for s4. I will be doing it tomorrow on the sanitarium master.
There will be a delay of around 3h, as the logging table is quite big and
takes around 2-3h to ALTER.
Manuel.
On Thu, Dec 7, 2017 at 8:38 AM, Manuel Arostegui <
marostegui(a)wikimedia.org> wrote:
Hello,
I will be running this schema change on s2 on Monday. Expect delay on
the s2 replicas.
Manuel.
On Wed, Nov 29, 2017 at 1:53 PM, Manuel Arostegui <
marostegui(a)wikimedia.org> wrote:
> Hey Cloud Team,
>
> I am now running this schema change on s3, for all the wikis (around
> 900). I have throttled it a bit and it has been running for an hour without
> any significant delay on the new replicas.
> labsdb1003 is delayed a bit, but it normally is lately, so I don't
> think it is related to this change.
> This should take another 15h or so to finish completely.
>
> Cheers
> Manuel.
>
> On Wed, Nov 15, 2017 at 6:45 PM, Manuel Arostegui <
> manuel(a)wikimedia.org> wrote:
>
>>
>>
>> On Wed, Nov 15, 2017 at 6:39 PM, Bryan Davis <bd808(a)wikimedia.org>
>> wrote:
>>
>>> On Wed, Nov 15, 2017 at 9:48 AM, Manuel Arostegui <
>>> manuel(a)wikimedia.org> wrote:
>>> > Hello Cloud Admins!
>>> >
>>> > As part of https://phabricator.wikimedia.org/T174569 we have to alter
>>> > some big tables.
>>> > One of them is logging, which on wikidata, for instance, takes around
>>> > 8h, and that is the shard I am currently working on.
>>> >
>>> > Because of the nature of the change (some columns being added) and ROW
>>> > based replication (what we use in sanitariums) this change needs to be
>>> > done with replication (from sanitarium, or their masters, to the labs
>>> > servers).
>>> >
>>> > This will obviously generate lag and if not done that way, it will
>>> > break replication till the column is added on the labs hosts, and this
>>> > is less desirable than replication lag.
>>> >
>>> > I am planning to run the alter probably tomorrow or Monday (I will
>>> > notify when I start it) for the sanitarium host in s5, that means that
>>> > there will be lag on the labs servers, for a few hours, on the s5
>>> > instance (which will also affect s1 and s3 because we are using the
>>> > same replication thread for those shards too - which is a FIXME we have
>>> > pending).
>>> >
>>> > s2, s4, s6 and s7 will remain unaffected as they have their own
>>> > replication thread.
>>> >
>>> > Should you have any questions, let me know!
>>>
>>> Should we send a message to cloud-announce about this, or just be
>>> ready to tell people that the lag is a known issue due to production
>>> schema changes?
>>>
>>>
>> Don't think it is necessary to send an announcement about it, it is
>> just maintenance. I would suggest you just point people to that
>> task so they can know when other shards will be done too :-)
>>
>> Manuel.
>>
>
>