This is a quick reminder that this work on Hadoop will start in about 20
minutes' time.
Please refrain from launching any new jobs on the cluster and be aware
that the cluster will have decreased availability for up to a couple of
hours.
Kind regards,
Ben
On 11/01/2024 12:23 pm, Ben Tullis wrote:
*Scheduled downtime for Hadoop - Monday Jan 15th - 10:00 until 12:00 UTC*
Hello,
We need to perform some maintenance on our primary Hadoop cluster,
which will require a period of *downtime*. This work is scheduled for
*Monday Jan 15th - 10:00 until 12:00 UTC* - which is a US holiday for
WMF and also Wikipedia Day
<https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Day>.
This 2-hour maintenance window has been chosen in the hope of
minimising disruption for you while the cluster, and the various
tools that depend upon it, such as Superset and JupyterLab, are
largely unavailable.
The work being undertaken is a replacement of the Hadoop nameserver
hosts <https://phabricator.wikimedia.org/T332573> which,
unfortunately, requires a full cluster restart. We will be disabling
ingestion to HDFS, pausing Airflow DAGs on all instances, and stopping
production data processing pipelines, prior to the work, then
re-enabling them all afterwards. We are not expecting any gaps in
data once the pipelines have caught up again.
If you have any queries or concerns about this work, or the time or
date is particularly inconvenient for you, please don't hesitate to
let us know, so that we can look to reschedule.
Kind regards,
Ben
--
*Ben Tullis* (he/him)
Senior Site Reliability Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>