On Wed, Nov 15, 2017 at 6:39 PM, Bryan Davis <bd808(a)wikimedia.org> wrote:
On Wed, Nov 15, 2017 at 9:48 AM, Manuel Arostegui <manuel(a)wikimedia.org> wrote:
> Hello Cloud Admins!
>
> As part of https://phabricator.wikimedia.org/T174569 we have to alter some
> big tables.
> One of them is logging, which on wikidata, for instance, takes around 8h.
> That is the shard I am currently working on.
>
> Because of the nature of the change (some columns being added) and ROW based
> replication (what we use in sanitariums) this change needs to be done with
> replication (from sanitarium, or their masters, to the labs servers).
>
> This will obviously generate lag, but if it is not done that way it will
> break replication until the column is added on the labs hosts, which is
> less desirable than replication lag.
>
> I am planning to run the alter probably tomorrow or Monday (I will notify
> when I start it) for the sanitarium host in s5, that means that there will
> be lag on the labs servers, for a few hours, on the s5 instance (which will
> also affect s1 and s3 because we are using the same replication thread for
> those shards too - which is a FIXME we have pending).
>
> s2, s4, s6 and s7 will remain unaffected as they have their own replication
> thread.
>
> Should you have any questions, let me know!

Should we send a message to cloud-announce about this, or just be
ready to tell people that the lag is a known issue due to production
schema changes?

I don't think it is necessary to send an announcement about it; it is just
maintenance. I would suggest you just point people to that task so
they can know when other shards will be done too :-)
Manuel.