Hey Cloud Team,
I am now running this schema change on s3, for all the wikis (around
900). I have throttled it a bit, and it has been running for an hour without
any significant delay on the new replicas.
labsdb1003 is delayed a bit, but it normally is lately, so I don't think
it is related to this change.
This should take another 15h or so to finish completely.
Cheers
Manuel.
On Wed, Nov 15, 2017 at 6:45 PM, Manuel Arostegui <manuel(a)wikimedia.org>
wrote:
On Wed, Nov 15, 2017 at 6:39 PM, Bryan Davis <bd808(a)wikimedia.org>
wrote:
> On Wed, Nov 15, 2017 at 9:48 AM, Manuel Arostegui <
> manuel(a)wikimedia.org> wrote:
> > Hello Cloud Admins!
> >
> > As part of https://phabricator.wikimedia.org/T174569 we have to alter
> > some big tables.
> > One of them is logging, which, for instance, in wikidata takes around 8h.
> > Which is the shard I am currently working on.
> >
> > Because of the nature of the change (some columns being added) and ROW
> > based replication (what we use in sanitariums) this change needs to be
> > done with replication (from sanitarium, or their masters, to the labs
> > servers).
> >
> > This will obviously generate lag and if not done that way, it will break
> > replication till the column is added on the labs hosts, and this is less
> > desirable than replication lag.
> >
> > I am planning to run the alter probably tomorrow or Monday (I will notify
> > when I start it) for the sanitarium host in s5, that means that there will
> > be lag on the labs servers, for a few hours, on the s5 instance (which will
> > also affect s1 and s3 because we are using the same replication thread for
> > those shards too - which is a FIXME we have pending).
> >
> > s2, s4, s6 and s7 will remain unaffected as they have their own replication
> > thread.
> >
> > Should you have any questions, let me know!
>
> Should we send a message to cloud-announce about this, or just be
> ready to tell people that the lag is a known issue due to production
> schema changes?
>
>
I don't think it is necessary to send an announcement about it; it is
just maintenance. I would suggest you just point people to that
task so they can know when other shards will be done too :-)
Manuel.