Cloud-announce May 2018

cloud-announce@lists.wikimedia.org

3 participants
5 discussions

Rolling VM reboots next Wednesday, 2018-06-06, beginning at 14:00 UTC

by Andrew Bogott

As part of routine security maintenance, we'll be rebooting all VMs and virtualization hosts next Wednesday starting at 14:00 UTC (7 AM San Francisco time). Toolforge users should be largely unaffected by this activity. Other projects (including deployment-prep) will experience sporadic downtime of a few minutes per interruption. The entire process will take several hours. If you need a to-the-minute advance schedule for any particular reboot, please let me know and I'll put your system at the start.

-Andrew + the cloud team

5 years, 11 months

ToolsDB maintenance

by Brooke Storm

ToolsDB will be undergoing maintenance and updates on Tuesday, June 5th, from 15:00 to 16:00 UTC. Actual outage times should be fairly brief, but during this window the database will be taken offline and the system rebooted. Because the outage is expected to be brief and some tables are not replicated (see https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#ToolsDB_Backups…), we are not planning to fail over to the replica at this time.

Brooke Storm
Operations Engineer
Wikimedia Cloud Services
bstorm(a)wikimedia.org
IRC: bstorm_

5 years, 11 months

Heads up tools/bots using Mono/.NET in Toolforge/GridEngine

by Arturo Borrero Gonzalez

We upgraded the Mono/.NET framework in Toolforge/GridEngine from version 3.x to 5.x [0]. We discovered that some tweaking is required because of unexpected memory allocation behavior in the framework [1]. The first symptom you will see is your bot causing high CPU load (spinning). The fix is easy: tell Mono that more memory is available when running the tool/bot. However, you will need to cancel your job submissions and resubmit them. Please refer to the Phabricator bug [1] for more details. Sorry for the inconvenience.

[0] https://phabricator.wikimedia.org/T194665
[1] https://phabricator.wikimedia.org/T195834

5 years, 11 months

Reduced support this week and next

by Andrew Bogott

Hello! The Cloud Services team is traveling quite a bit in the next few weeks: the Hackathon, the OpenStack Summit, and some personal travel. There will always be at least one person available for emergencies, but please be patient if we're slow to respond to requests. Everyone should be back by the first of the month. - Andrew + the Cloud Services team

5 years, 11 months

Upcoming WMCS network outages: Tuesday May 15th

by Andrew Bogott

As part of some long-deferred routine maintenance, we need to update (and, in one case, physically move) the network servers that handle all traffic between WMCS instances. During each change, all WMCS network traffic (including network access to all tools and VMs) will be interrupted for several minutes.

The first outage will be: Tuesday, May 15 at 13:00 UTC

The second outage will be three hours later: Tuesday, May 15 at 16:00 UTC

In each case outages should last no more than ten to fifteen minutes. More details about this move can be found at https://phabricator.wikimedia.org/T193579 .

-Andrew

5 years, 11 months
