[Labs-admin] Labsdb migration (was: Report from ops meeting)

Manuel Arostegui marostegui at wikimedia.org
Wed Feb 1 07:27:27 UTC 2017


On Tue, Jan 31, 2017 at 11:07 PM, Yuvi Panda <yuvipanda at gmail.com> wrote:

> I've drafted https://etherpad.wikimedia.org/p/toolsdb-upgrade. Things
> to get from DBA:
>
> 1. Time - is 5PM UTC (which will be 9AM PST) ok, or do you want it
> to be earlier? We can probably make it earlier if chase or andrew or
> bd808 (who are on an earlier TZ) can be around, or if the DBAs are ok
> with doing this without us being around. Worst case I can wake up
> really early :)
>

5PM UTC works for me!


> 2. Total duration - I've said '6h' as a conservative estimate. Too
> much / too little?
>
>

We'd need to:
- copy the data somewhere (it is 1.6T, so I guess it will take around
1h40m-1h45m, let's make it 2h)
- reimage (let's say 15-20 minutes?)
- copy the data back (again, I guess we can make it 2h to be on the safe
side)

I would say 6h is enough, but I have no idea how many unexpected things we
might run into here. From my chats with Yuvi, many :-)
So 6h sounds reasonable to me!
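
As a rough sanity check of the numbers above, here is a minimal sketch of the
arithmetic. It assumes "1.6T" means 1.6 TB (decimal), uses the padded 2h copy
estimates and the upper 20-minute reimage figure, and the names in it are
made up for illustration only; it is not a measurement of the real hosts.

```python
from datetime import timedelta

# Assumption: "1.6T" is read as 1.6 TB in decimal units.
DATA_BYTES = 1.6e12

# Padded per-step estimates from the list above (upper bounds).
steps = {
    "copy data away": timedelta(hours=2),
    "reimage": timedelta(minutes=20),
    "copy data back": timedelta(hours=2),
}

# Proposed maintenance window.
window = timedelta(hours=6)

# Total planned time and the slack left for surprises.
planned = sum(steps.values(), timedelta())
slack = window - planned


def required_throughput_mb_s(num_bytes, duration):
    """Average MB/s needed to move num_bytes within duration."""
    return num_bytes / duration.total_seconds() / 1e6


print("planned:", planned)
print("slack:", slack)
print("MB/s needed per copy:",
      round(required_throughput_mb_s(DATA_BYTES, steps["copy data away"])))
```

Under those assumptions the planned steps add up to 4h20m, which leaves 1h40m
of the 6h window for unexpected problems.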

Just my opinion!
Thanks!!

Manuel.



> Thanks!
>
>
>
> On Tue, Jan 31, 2017 at 12:59 PM, Yuvi Panda <yuvipanda at gmail.com> wrote:
> > Anyone object?
> >
> > On Tue, Jan 31, 2017 at 12:58 PM, Jaime Crespo <jcrespo at wikimedia.org> wrote:
> >> Ok, too.
> >>
> >> On Tue, Jan 31, 2017 at 9:52 PM, Yuvi Panda <yuvipanda at gmail.com> wrote:
> >>>
> >>> Actually I've just been told that Feb 14 is Valentine's Day and I
> >>> might be tasked with other duties on that day. Sorry! Feb 15?
> >>>
> >>> On Mon, Jan 30, 2017 at 10:57 AM, Jaime Crespo <jcrespo at wikimedia.org> wrote:
> >>> > Ok to me.
> >>> >
> >>> > On Mon, Jan 30, 2017 at 7:54 PM, Yuvi Panda <yuvipanda at gmail.com> wrote:
> >>> >>
> >>> >> How about Feb 14? That gives us two weeks.
> >>> >>
> >>> >> On Mon, Jan 30, 2017 at 10:33 AM, Jaime Crespo <jcrespo at wikimedia.org> wrote:
> >>> >> > As an admin, everything you should know about the upcoming labsdb1005
> >>> >> > reimage:
> >>> >> >
> >>> >> > * We (DBAs) are ready to do this at any time; we just need to tell users
> >>> >> > in advance of potential outages/degradations of service.
> >>> >> > * For 99% of the users, we will just switch them over transparently to
> >>> >> > the slave (should not cause issues). As usual, if their application does
> >>> >> > not retry connecting, there will be problems.
> >>> >> > * For 3 users (databases), there will be a full outage because they have
> >>> >> > such heavy usage that we cannot replicate them in real time. They were
> >>> >> > made aware of this limitation months ago, so it should not come as a
> >>> >> > surprise: https://phabricator.wikimedia.org/T127164 The users' databases
> >>> >> > are documented at:
> >>> >> >
> >>> >> > https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/templates/mariadb/tools.my.cnf.erb;f21ce599fe626e7c96010a5d0335370ebe510ca7$65
> >>> >> > * Data will be copied away, the server will be reimaged, then data will
> >>> >> > be copied back. That normally takes 3 hours, but things could go wrong...
> >>> >> > * People could complain about a 10.0 upgrade (?). But some people
> >>> >> > actually complained already about the lack of a 5.5 -> 10 upgrade.
> >>> >> > https://phabricator.wikimedia.org/T138517#2796682
> >>> >> > * On switch-back, badly programmed applications may again temporarily
> >>> >> > fail, but good ones should just switch transparently; unavailable dbs
> >>> >> > should be available again.
> >>> >> >
> >>> >> > That should be enough background to schedule and send an email to
> >>> >> > users :-)
> >>> >> >
> >>> >> > ---------- Forwarded message ----------
> >>> >> > From: Yuvi Panda <yuvipanda at gmail.com>
> >>> >> > Date: Mon, Jan 30, 2017 at 7:14 PM
> >>> >> > Subject: [Labs-admin] Report from ops meeting
> >>> >> > To: Labs admin list for infrastructure and discussion
> >>> >> > <labs-admin at lists.wikimedia.org>
> >>> >> >
> >>> >> >
> >>> >> > 1. Faidon talking about IP space discussions wrt the Asia DC discussion,
> >>> >> > and mentioned we might / should renumber labs IP space. Not sure about
> >>> >> > more details.
> >>> >> > 2. Ping on labsdb migration to Jessie
> >>> >> > 3. Mid-year review of annual goals coming up, need status about OGE
> >>> >> > migration
> >>> >> >
> >>> >> > That's it.
> >>> >> >
> >>> >> > --
> >>> >> > Yuvi Panda T
> >>> >> > http://yuvi.in/blog
> >>> >> >
> >>> >> > _______________________________________________
> >>> >> > Labs-admin mailing list
> >>> >> > Labs-admin at lists.wikimedia.org
> >>> >> > https://lists.wikimedia.org/mailman/listinfo/labs-admin
> >>> >> >
> >>> >> >
> >>> >> >
> >>> >> > --
> >>> >> > Jaime Crespo
> >>> >> > <http://wikimedia.org>
> >>> >> >