Re: [Cloud-admin] Upcoming: Delay in new wiki replicas on s5

12 Dec 2017

Hello!

It is time for s4. I will be doing it tomorrow on the sanitarium master.
There will be around 3h of delay, as the logging table is quite big and takes
around 2-3h to ALTER.

Manuel.

On Thu, Dec 7, 2017 at 8:38 AM, Manuel Arostegui <marostegui(a)wikimedia.org> wrote:

...
 Hello,

 I will be running this schema change on s2 on Monday. Expect delay on the
 s2 replicas.

 Manuel.

 On Wed, Nov 29, 2017 at 1:53 PM, Manuel Arostegui <marostegui(a)wikimedia.org> wrote:

 Hey Cloud Team,

 I am now running these schema changes on s3, for all the wikis (around
 900). I have throttled it a bit and it has been running for an hour without
 any significant delay on the new replicas.
 labsdb1003 is delayed a bit, but it usually is lately, so I don't think
 it is related to this change.
 This should take another 15h or so to finish completely.

 Cheers
 Manuel.

 On Wed, Nov 15, 2017 at 6:45 PM, Manuel Arostegui <manuel(a)wikimedia.org> wrote:

 On Wed, Nov 15, 2017 at 6:39 PM, Bryan Davis <bd808(a)wikimedia.org> wrote:

 On Wed, Nov 15, 2017 at 9:48 AM, Manuel Arostegui <manuel(a)wikimedia.org> wrote:
 > Hello Cloud Admins!
 >
 > As part of https://phabricator.wikimedia.org/T174569 we have to alter some
 > big tables.
 > One of them is logging, which takes around 8h to alter on wikidata, for
 > instance, which is the shard I am currently working on.
 >
 > Because of the nature of the change (some columns being added) and ROW-based
 > replication (which we use on the sanitariums), this change needs to be
 > applied through replication (from the sanitariums, or their masters, to the
 > labs servers).
 >
 > This will obviously generate lag. If it is not done that way, replication
 > will break until the columns are added on the labs hosts, which is less
 > desirable than replication lag.
 >
 > I am planning to run the alter tomorrow or Monday (I will notify you when
 > I start it) on the sanitarium host in s5. That means there will be lag on
 > the labs servers for a few hours on the s5 instance (which will also affect
 > s1 and s3, because we are using the same replication thread for those
 > shards too; that is a FIXME we have pending).
 >
 > s2, s4, s6 and s7 will remain unaffected as they have their own
 > replication thread.
 >
 > Should you have any questions, let me know!

 Should we send a message to cloud-announce about this, or just be
 ready to tell people that the lag is a known issue due to production
 schema changes?

 I don't think it is necessary to send an announcement about it; it is just
 maintenance. I would suggest you just point people to that task so they can
 see when the other shards will be done too :-)

 Manuel.
