Hello everyone,

Upgrade done. The cluster has been successfully upgraded to 1.23 and applications have just been redeployed. Toolhub is operational again.

On Fri, Mar 3, 2023 at 3:45 PM Alexandros Kosiaris <akosiaris@wikimedia.org> wrote:
Hello everyone,

TL;DR: Toolhub will have a few hours of downtime due to maintenance on Tuesday 2023-03-07. Furthermore, if you are not deploying services to the eqiad wikikube kubernetes
cluster, you can safely skip the rest.

Long version:

We will reinitialize the eqiad wikikube kubernetes cluster using kubernetes
version 1.23 on 2023-03-07 09:00-16:00 UTC [1] (the actual process is expected
to take a couple of hours within this window).
The date was chosen for convenience: due to the data center switchover
process, eqiad is fully depooled and receiving almost no traffic. This is
scheduled to change on 2023-03-08, which would make the process more
difficult. As all traffic has already been drained, we expect no visible
impact. However, for the duration of the process, the kubernetes cluster will
be unavailable to deployers, and thus efforts to deploy to it will fail or,
worse, not have the expected outcomes.
This is normal until SRE serviceops announces that the cluster is fully
operational again.

SRE serviceops will be deploying all services before marking the cluster as
usable, so there will be no need for deployers to re-deploy their services
(apart from those already informed).

Toolhub, per https://phabricator.wikimedia.org/T329319, wasn't switched over to codfw and is still being served from wikikube eqiad. Unavoidably, it will suffer a downtime of a few hours. That is known and expected. To minimize that downtime, it will be prioritized during the initialization phase.

[1] https://phabricator.wikimedia.org/T331126

--
Alexandros Kosiaris
Principal Site Reliability Engineer
Wikimedia Foundation

