- Cloud-announce - lists.wikimedia.org

wikidatawiki moving from "s5" slice to new "s8" slice on 2018-01-09

by Bryan Davis

On 2018-01-09 the wikidatawiki database will move from its current home on the "s5" slice to a brand new "s8" slice. The wikidatawiki.{analytics,web}.db.svc.eqiad.wmflabs and wikidatawiki.labsdb DNS service names will be updated to point to the new slice host by system administrators.

This change should not affect most users of the Wiki Replica servers. Only applications that are connecting to s5.{analytics,web}.db.svc.eqiad.wmflabs or s5.labsdb and expecting the wikidatawiki_p database to be present will be affected. These applications should update their configuration to connect to the new "s8" slice instead.

This is the end point (or nearly so) of a large amount of work that has been done by Wikimedia's fabulous DBA team of Jamie and Manuel to improve the health of the Wikidata wiki. See <https://phabricator.wikimedia.org/T177208> for more details.

Bryan
--
Bryan Davis
Wikimedia Foundation <bd808(a)wikimedia.org>
[[m:User:BDavis_(WMF)]]
Manager, Cloud Services
Boise, ID USA
irc: bd808
v:415.839.6885 x6855
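As a rough illustration of why most users are unaffected, the sketch below builds a service name from the wiki database name rather than from a slice name, so a slice move such as s5 to s8 needs no configuration change. The helper names `replica_host` and `replica_database` are hypothetical, and the rule that a replica database is the wikidb name plus `_p` is generalized here from the `wikidatawiki_p` example in the announcement.

```python
# Hypothetical helpers; only the host name pattern and the
# "wikidatawiki_p" database name come from the announcement.
VALID_KINDS = ("web", "analytics")


def replica_host(name: str, kind: str = "web") -> str:
    """Build a Wiki Replica service name from a wikidb or slice name."""
    if kind not in VALID_KINDS:
        raise ValueError(f"kind must be one of {VALID_KINDS}, got {kind!r}")
    return f"{name}.{kind}.db.svc.eqiad.wmflabs"


def replica_database(wikidb: str) -> str:
    """Assumed naming rule: replica databases are '<wikidb>_p'."""
    return f"{wikidb}_p"


# Addressing by wikidb name keeps working wherever the wiki is hosted.
print(replica_host("wikidatawiki", "analytics"))
print(replica_database("wikidatawiki"))

# Addressing by slice name is what breaks when a wiki changes slices.
print(replica_host("s8"))
```

An application configured with `replica_host("wikidatawiki")` follows the DNS update automatically, while one configured with `replica_host("s5")` has to be edited by hand to use "s8".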

6 years, 4 months


*.labsdb service names now point to "new" Wiki Replica cluster

by Bryan Davis

The labsdb1003.eqiad.wmnet (aka c3.labsdb) server is no longer serving *.labsdb requests. The c3.labsdb service name will continue to point to the labsdb1003.eqiad.wmnet server for the near future, but replication will soon stop there and all tables will be made read-only.

User databases on c1.labsdb and c3.labsdb listed at https://tools.wmflabs.org/tool-db-usage/ will be going away on 2018-01-03. You will need to migrate these to tools.db.svc.eqiad.wmflabs if you need to save the data.

TL;DR
* Change your tools and scripts to use:
  - "*.web.db.svc.eqiad.wmflabs" (real-time response needed)
  - "*.analytics.db.svc.eqiad.wmflabs" (batch jobs; long queries)
* Replace "*" with either a shard name (e.g. s1) or a wikidb name (e.g. enwiki).
* The new servers do not support user created databases/tables because replication can't be guaranteed. See T156869 and below for more information.
* Migrate your user created tables to tools.db.svc.eqiad.wmflabs (also known as tools.labsdb) and JOIN via application space logic rather than in-process in the database.

What is changing?
* Wednesday 2017-12-13
** "*.labsdb" service names switched to point at "*.web.db.svc.eqiad.wmflabs" equivalents.
** User created tables will not be allowed on the new servers.
** "c3.labsdb" still points at labsdb1003.eqiad.wmnet
* Thursday 2017-12-14
** DBAs will stop replication from production hosts to labsdb1003.eqiad.wmnet
** DBAs will make databases on labsdb1003.eqiad.wmnet read-only for all users
* Wednesday 2018-01-03
** labsdb1001.eqiad.wmnet (aka c1.labsdb) will be shut down permanently
** labsdb1003.eqiad.wmnet (aka c3.labsdb) will be shut down permanently

Why are we doing this?
See <https://wikitech.wikimedia.org/wiki/Wiki_Replica_c1_and_c3_shutdown> and <https://phabricator.wikimedia.org/T142807> for a more complete description of the reasons for these changes.

Bryan (on behalf of the Wikimedia Cloud Services and DBA teams)
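For tools that used to JOIN their own tables against replica tables inside the database, one possible shape of "JOIN via application space logic" is sketched below. It is only an illustration: the function name, the column names and the sample rows are all hypothetical, and the two row lists stand in for results fetched over two separate connections (one to a Wiki Replica host, one to tools.db.svc.eqiad.wmflabs).

```python
def join_in_application(replica_rows, tool_rows, key):
    """Inner-join two lists of dict rows on `key`, entirely in Python."""
    # Index the (usually smaller) tool-owned rows by the join key.
    index = {}
    for row in tool_rows:
        index.setdefault(row[key], []).append(row)

    # Walk the replica rows and merge every matching tool row.
    joined = []
    for row in replica_rows:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Hypothetical rows, as they might come back from two separate queries.
replica_rows = [
    {"page_id": 1, "page_title": "Main_Page"},
    {"page_id": 2, "page_title": "Sandbox"},
]
tool_rows = [
    {"page_id": 2, "note": "needs review"},
]

result = join_in_application(replica_rows, tool_rows, "page_id")
print(result)
```

In practice it usually helps to fetch the tool-owned rows first and restrict the replica query to the matching keys, so that only the rows that can actually join are transferred.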

6 years, 5 months


Partial toolforge outage ongoing

by Andrew Bogott

Hello all,

Some tools running on the Toolforge Kubernetes cluster are currently suffering from network failures. It's not yet fully diagnosed, although we have some ideas as to how to at least reduce the impact.

The tracking bug is https://phabricator.wikimedia.org/T182722.

We'll send another update when we have more information and/or when things are resolved; in the meantime no action is required on your part as we'll most likely restart affected tools and services ourselves as part of fixing the problem.

Sorry for the downtime!

-Andrew + the WMCS team

6 years, 5 months


Ubuntu trusty now deprecated for new WMCS instances

by Andrew Bogott

As discussed previously on this list [1] and on Phabricator [2], I've just removed the Ubuntu Trusty image as a default option when creating new VMs. This is part of a long-term foundation-wide process to standardize on Debian as the distribution of choice.

Existing Trusty VMs are unaffected by this change, as are present Toolforge workflows.

WMCS operators still have the ability to create Trusty VMs in a pinch, so if you need one please create a Phabricator task with an explanation of what you need and why and we'll create it as soon as we're able.

-Andrew

[1] https://lists.wikimedia.org/pipermail/cloud/2017-October/000056.html
[2] https://phabricator.wikimedia.org/T161899

6 years, 6 months


Re: [Cloud-announce] [Cloud] c1.labsdb (labsdb1001) hardware failure

by Bryan Davis

On Thu, Nov 2, 2017 at 6:13 PM, Maximilian Doerr <maximilian.doerr(a)gmail.com> wrote:
> Can you provide a list of tools/users impacted by the drive failure? Or is there a redundant drive covering for this?

As long as c1 stays up, <https://tools.wmflabs.org/tool-db-usage/> will show the users with user-owned databases there. These users should have all also received a MassMessage spam from me on their Wikitech talk page about a week ago.

There is no drive or data redundancy for user-created tables on c1.labsdb or c3.labsdb. The tools.db.svc.eqiad.wmflabs databases however are replicated to a secondary server. See <https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#ToolsDB_Backups…>

Bryan

6 years, 6 months


c1.labsdb (labsdb1001) hardware failure

by Bryan Davis

TL;DR:
* c1.labsdb (labsdb1001.eqiad.wmnet) is down due to hardware issues
* *.labsdb are pointing to c3.labsdb (labsdb1003.eqiad.wmnet)

The physical server behind c1.labsdb (labsdb1001.eqiad.wmnet) experienced a hard drive failure around 2017-11-01T03:30 UTC. This failure is preventing the MySQL service on that host from starting. The *.labsdb service names that were pointed at that server have been updated to point to c3.labsdb (labsdb1003.eqiad.wmnet) instead.

See <https://phabricator.wikimedia.org/T179464> for more information and additional updates.

Expect slower than normal performance as all traffic is handled by a single server. Now would be a great time to update the configuration for your tools to use the new database cluster [0][1].

[0]: https://phabricator.wikimedia.org/phame/post/view/70/new_wiki_replica_serve…
[1]: https://wikitech.wikimedia.org/wiki/Wiki_Replica_c1_and_c3_shutdown

Bryan

6 years, 6 months


Wiki Replica c1.labsdb to be rebooted Monday 2017-10-30 14:30 UTC

by Bryan Davis

labsdb1001.eqiad.wmnet (aka c1.labsdb) will be rebooted at 2017-10-30 14:30 UTC for kernel updates (<https://phabricator.wikimedia.org/T168584>).

Normal usage of the *.labsdb databases should experience only limited interruption as DNS is changed to point to labsdb1003.eqiad.wmnet (aka c3.labsdb). The c1.labsdb service name will *not* be updated however, so tools hardcoded to that service name will be interrupted until the reboot is complete.

There is a possibility of catastrophic hardware failure in this reboot. There will be no way to recover the server or the data it currently hosts if that happens. Tools that are hosting self-created data on c1.labsdb *will* lose that data if there is hardware failure. If you are unsure if your tool is hosting data on c1.labsdb, you can check at <https://tools.wmflabs.org/tool-db-usage/>.

This reboot is an intermediate step before the complete shutdown of the server on Wednesday 2017-12-13. See <https://wikitech.wikimedia.org/wiki/Wiki_Replica_c1_and_c3_shutdown> for more information.

Bryan (on behalf of the Wikimedia Cloud Services and DBA teams)

6 years, 6 months


Wiki Replica c1.labsdb and c3.labsdb to be shutdown 2017-12-13

by Bryan Davis

The labsdb1001.eqiad.wmnet (aka c1.labsdb) and labsdb1003.eqiad.wmnet (aka c3.labsdb) servers are being shut down and permanently removed from service on Wednesday 2017-12-13.

TL;DR
* Change your tools and scripts to use:
  - "*.web.db.svc.eqiad.wmflabs" (real-time response needed)
  - "*.analytics.db.svc.eqiad.wmflabs" (batch jobs; long queries)
* Replace "*" with either a shard name (e.g. s1) or a wikidb name (e.g. enwiki).
* The new servers do not support user created databases/tables because replication can't be guaranteed. See T156869 and below for more information.
* Migrate your user created tables to tools.db.svc.eqiad.wmflabs (also known as tools.labsdb) and JOIN via application space logic rather than in-process in the database.

What is changing?
* Week of 2017-10-30 to 2017-11-03 (exact date to be determined)
** Reboot labsdb1001.eqiad.wmnet (aka c1.labsdb) for kernel updates
** There is a possibility of catastrophic hardware failure in this reboot. There will be no way to recover the server or the data it currently hosts if that happens.
* Week of 2017-11-06 to 2017-11-10 (exact date to be determined)
** Reboot labsdb1003.eqiad.wmnet (aka c3.labsdb) for kernel updates
** There is a possibility of catastrophic hardware failure in this reboot. There will be no way to recover the server or the data it currently hosts if that happens.
* Wednesday 2017-12-13
** "*.labsdb" service names switched to point at "*.web.db.svc.eqiad.wmflabs" equivalents.
** User created tables will not be allowed on the new servers that "c1.labsdb" and "c3.labsdb" point to.
** labsdb1001.eqiad.wmnet removed from service.
** labsdb1003.eqiad.wmnet removed from service.

Why are we doing this?
See <https://wikitech.wikimedia.org/wiki/Wiki_Replica_c1_and_c3_shutdown> and <https://phabricator.wikimedia.org/T142807> for a more complete description of the reasons for these changes.

Bryan (on behalf of the Wikimedia Cloud Services and DBA teams)
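For tool maintainers with many configured host names, the renaming described in the TL;DR could be scripted along the following lines. This is a minimal sketch, not an official migration tool: `migrate_host` is a hypothetical helper, and leaving the host-specific names c1.labsdb and c3.labsdb untouched for manual review is an assumption of the sketch rather than something the announcement prescribes.

```python
LEGACY_SUFFIX = ".labsdb"

# Assumption: host-specific aliases are not rewritten automatically,
# because they may hold user-created tables that must be moved first.
HOST_SPECIFIC = {"c1", "c3"}


def migrate_host(legacy: str, kind: str = "web") -> str:
    """Map a legacy '*.labsdb' name to its new-style service name."""
    if kind not in ("web", "analytics"):
        raise ValueError(f"unknown kind: {kind!r}")
    if not legacy.endswith(LEGACY_SUFFIX):
        return legacy  # not a legacy name, nothing to do

    name = legacy[: -len(LEGACY_SUFFIX)]
    if name == "tools":
        # tools.labsdb is also known as tools.db.svc.eqiad.wmflabs
        return "tools.db.svc.eqiad.wmflabs"
    if name in HOST_SPECIFIC:
        return legacy  # left for manual review
    return f"{name}.{kind}.db.svc.eqiad.wmflabs"


for old in ("enwiki.labsdb", "s1.labsdb", "tools.labsdb", "c1.labsdb"):
    print(old, "->", migrate_host(old))
```

Passing `kind="analytics"` selects the service intended for batch jobs and long queries instead of the real-time "web" service.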

6 years, 7 months


Do you need Ubuntu Trusty for new VMs?

by Andrew Bogott

Toolforge users can ignore this email; it only concerns VPS project owners.

Long ago, the Wikimedia Operations team made the decision to phase out use of Ubuntu servers in favor of Debian. It's a long, slow process that is still ongoing, but in production Trusty is running on an ever-shrinking minority of our servers.

As Trusty becomes more of an odd duck in production, it grows harder to support in Cloud Services as well. Right now we have no planned timeline for phasing out Trusty instances (there are 238 of them!) but in anticipation of that phase-out we'd like to discourage the addition of new Trusty hosts to the cloud.

Step one[1] is to prevent people from creating new Trusty images unless they really, really need them. We would like to remove Trusty from the default available list of base images and make it available for new VMs only via special request on Phabricator. The questions for you are:

1) Would that change make your life a lot harder?
2) If yes, can you name a date after which it /won't/ make your life harder?

If the loss of Trusty doesn't worry you, feel free to ignore this email. In the event of a silent (or relatively quiet) response, I'll pull Trusty from the default image list sometime in the next few weeks.

- Andrew (+ the rest of the Cloud team)

[1] https://phabricator.wikimedia.org/T161899

6 years, 7 months


Downtime for select VMs next week, 2017-10-04

by Andrew Bogott

In order to rebuild a server of questionable stability, I'm going to move the following instances on Wednesday:

+--------------------------+---------------------+--------+
| Name                     | Tenant ID           | Status |
+--------------------------+---------------------+--------+
| cindy                    | pluggableauth       | ACTIVE |
| deployment-kafka-jumbo-1 | deployment-prep     | ACTIVE |
| oidc-google              | pluggableauth       | ACTIVE |
| proton-staging           | reading-web-staging | ACTIVE |
| search-jessie            | search              | ACTIVE |
| smtp-test1               | project-smtp        | ACTIVE |
| suggestbot-prod          | suggestbot          | ACTIVE |
| twlight-prod             | twl                 | ACTIVE |
| twlight-staging          | twl                 | ACTIVE |
| wikibrain-embeddings-02  | wikibrain           | ACTIVE |
| wikikids                 | wmam                | ACTIVE |
| zim-proto                | mobile              | ACTIVE |
+--------------------------+---------------------+--------+

Migration will cause the affected instances to be offline for some time (potentially more than an hour depending on the size of the instance) and rebooted. If you need me to work on your server at a particular time of day, or need a stay of execution, please let me know. Otherwise I'll start going down the list at the beginning of my workday on Wednesday, around 14:00 UTC.

Sorry for the inconvenience!

-Andrew

6 years, 7 months

