It seems to me that we should do the vanadium hardware upgrade at the same
time, if we're going to have downtime anyway. Can we bring the new,
beefier box online? Last time I talked to Ori, I understood that ops had
already set such a box aside for this.
On Mon, Feb 16, 2015 at 11:59 AM, Nuria Ruiz <nuria(a)wikimedia.org> wrote:
For switchover of writes, we'll need to coordinate an EL consumer restart
to use a new CNAME of m4-master.eqiad.wmnet
This is a configuration change on the EL config plus a small downtime and a
restart (easy). I am not sure how users/passwords are set up in the config,
so cc-ing Otto to keep him in the loop.
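
As a rough illustration of that config change, and assuming the EL consumer
takes its database target as a SQLAlchemy-style URI (an assumption; the
actual config layout is not shown in this thread), the switch amounts to
replacing only the host part of the URI. The helper name `swap_db_host`, the
credentials, and the old host below are hypothetical placeholders:

```python
from urllib.parse import urlsplit, urlunsplit


def swap_db_host(uri: str, new_host: str) -> str:
    """Return `uri` with only its hostname replaced by `new_host`.

    Credentials, port, database name and query string are left untouched.
    Note: this simple split does not handle bracketed IPv6 hosts.
    """
    parts = urlsplit(uri)
    # netloc looks like "user:password@host:port"; keep everything but the host.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    _old_host, colon, port = hostport.partition(":")
    netloc = f"{userinfo}{at}{new_host}{colon}{port}"
    return urlunsplit(parts._replace(netloc=netloc))


# Hypothetical current URI; the real user, password and old host are not in this thread.
old_uri = "mysql://eventlog:secret@old-master.example:3306/log?charset=utf8"
new_uri = swap_db_host(old_uri, "m4-master.eqiad.wmnet")
print(new_uri)
```

Pointing the config at the CNAME rather than at db1046 directly means a later
master change would only need a DNS update, not another config edit.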
allow vanadium the relevant network access, and then presumably do a
little backfilling.
Network access for vanadium is something that I imagine ops needs to do, as
I doubt we have the permissions to make a network change.
When would be a reasonable time within the next fortnight or so?
I think next week would work, once backfilling for the past outages is
over, if that works for you.
Thanks,
Nuria
On Sun, Feb 15, 2015 at 8:07 PM, Sean Pringle <springle(a)wikimedia.org> wrote:
I think we should split up EventLogging and the other m2 clients (OTRS
and some minor players). Several reasons:
- Backfilling causes replication lag. Using faster out-of-band
replication for EL is easy because it is all simple bulk-INSERT statements,
but the same does not apply for the other clients. They need different
approaches.
- Master disk space. Even with the data purging discussed at the MW
Summit, I would feel better if EL had more headroom than it does currently,
and zero possibility of unexpected spikes in disk activity and usage
affecting other services.
- EL is the service most sensitive to connection dropouts. Recently Ori
and Nuria have been tweaking SQLAlchemy, but future connection problems
like those seen last week would be easier to debug without having to risk
affecting other services.
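
To make the first point concrete, here is a minimal sketch of what a
backfill made of simple bulk INSERTs could look like. It is only an
illustration, not the actual EL backfill tooling: the `events` table, its
`uuid` and `payload` columns, and the `batches`/`backfill` helpers are all
hypothetical, and an in-memory SQLite database stands in for MySQL so the
example runs on its own (MySQL would use `INSERT IGNORE` where SQLite uses
`INSERT OR IGNORE`).

```python
import sqlite3
from itertools import islice


def batches(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def backfill(conn, rows, batch_size=1000):
    """Insert rows in batches, one transaction per batch.

    Rows whose uuid already exists are skipped, so replaying the same
    batch twice does not create duplicates.
    """
    for chunk in batches(rows, batch_size):
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                "INSERT OR IGNORE INTO events (uuid, payload) VALUES (?, ?)",
                chunk,
            )


# Stand-in database and hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (uuid TEXT PRIMARY KEY, payload TEXT)")

backfill(conn, [("a", "1"), ("b", "2"), ("c", "3")], batch_size=2)
backfill(conn, [("b", "2"), ("d", "4")], batch_size=2)  # "b" is skipped

total = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(total)
```

Because each batch is an independent, replayable INSERT, a load like this can
be applied to replicas separately from the normal replication stream, which
is not true of the mixed read/write traffic from the other m2 clients.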
I am therefore arranging to promote the current m2 slave db1046 to master
of an m4 cluster tuned for EL, including backfilling. Analytics-store,
s1-analytics-slave, and the new CODFW server will simply switch to
replicate from the new master.
For switchover of writes, we'll need to coordinate an EL consumer restart
to use a new CNAME of m4-master.eqiad.wmnet and allow vanadium the relevant
network access, and then presumably do a little backfilling. When would be
a reasonable time within the next fortnight or so?
Sean
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics