[Engineering] Data center switch-over moving ahead next week: please stay available :)

Mark Bergsma mark at wikimedia.org
Thu Apr 21 13:53:23 UTC 2016


Hi everyone,

After successfully serving our sites from our backup data center codfw
(Dallas) for the past two days, we're now starting our switch back to
eqiad (Ashburn) as planned[1].

We've already moved cache traffic back to eqiad, and within the next few
minutes we'll disable editing by going read-only for approximately 30
minutes - hopefully a bit faster than 2 days ago.

[1] http://blog.wikimedia.org/2016/04/11/wikimedia-failover-test/

On Tue, Apr 19, 2016 at 6:00 PM, Mark Bergsma <mark at wikimedia.org> wrote:

> Hi all,
>
> Today the data center switch-over commenced as planned, and has just
> completed successfully. We are now serving our sites from codfw (Dallas,
> Texas) for the next 2 days if all stays well.
>
> We switched the wikis to read-only (editing disabled) at 14:02 UTC, and
> went back to read-write at 14:48 UTC - a little longer than planned.
> Although editing was possible from then on, Special:RecentChanges (and
> related change feeds) did not work until 15:10 UTC, when we found and
> fixed an unexpected configuration problem with our Redis servers. The
> site stayed up and available for readers throughout the entire migration.
>
> Overall the procedure was a success, with few problems along the way.
> However, we've also carefully kept track of the issues and delays we
> encountered, so that we can evaluate them to improve and speed up the
> procedure and reduce the impact on our users. Some of these improvements
> will already be implemented for our switch back on Thursday.
>
> We're still expecting to find (possibly subtle) issues today, and would
> like everyone who notices anything to use the following channels to report
> them:
>
> 1. File a Phabricator issue with project #codfw-rollout
> 2. Report issues on IRC: Freenode channel #wikimedia-tech (if urgent)
> 3. Send an e-mail to the Operations list: ops at lists.wikimedia.org
>
> We're not done yet, but thanks to all who have helped so far. :-)
>
> Mark
>

-- 
Mark Bergsma <mark at wikimedia.org>
Lead Operations Architect
Director of Technical Operations
Wikimedia Foundation