[Labs-l] [Maintenance] Labs NFS storage

Marc A. Pelletier marc at uberbox.org
Tue Mar 24 19:29:25 UTC 2015


TL;DR: NFS will be slow for a few days then briefly unavailable on March
26, 2015 at 22:00 UTC (less than five minutes).

Tracked at: https://phabricator.wikimedia.org/T93792

== The good news ==

Backups are coming (back) to the Labs storage, with snapshots that preserve
earlier states of your files.  In addition, we will replicate data to
another datacenter, so that a backup will be available in case of disaster.

== The bad news ==

In order to finish moving project storage to the new filesystem that has
snapshots enabled, a copy needs to be performed to synchronize the (new,
not live) filesystem with the currently active one.

This means that for the next two days (estimated), starting at 22:00 UTC,
the performance of file I/O on the NFS server (for /data/project and
/home) will be noticeably lower.  I will keep a close eye on the process
and try to balance the available resources so that the copy does not
take more than about half the disk bandwidth, but there will be a
noticeable increase in latency for all file operations on that filesystem.

== The switchover ==

Labs instances are tentatively scheduled to be moved to the new
filesystem on March 26, 2015 at 22:00 UTC.  At that point, there will be
a brief (<2 minutes) interruption during which file operations will be
moved from one filesystem to the other.  This will be confirmed at least
24 hours in advance.

File operations in progress during that brief outage will unavoidably be
interrupted, and currently open files will be forcibly closed.  They can
be reopened immediately afterwards, but running processes may error out
because of this.

To avoid possible issues with running jobs (including webservices) in
Tool Labs, all running jobs will be rescheduled and restarted at that
time.  Jobs that run at intervals through crontabs should not be affected
unless they were scheduled to run exactly at the time of the outage.

The older copy of the data will be kept around for several weeks, so if
anything goes wrong in the copying process, the original files will be
preserved and can be restored.

== What you can do ==

If you have directories the contents of which are not worthwhile to back
up (caches, easily regenerated data, backups) you may add a file at
their root to control whether they are copied (and what is copied).  The
file needs to be named '.nobackup' and follow the rsync filter rules.
(You can get a detailed explanation of the rules in the rsync manpage
under the 'FILTER RULES' section).

tl;dr: If all you need to do is exclude a directory entirely, then you
only need to put "- *" in the file at the top of that directory (a dash,
a space, and an asterisk).
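For illustration, the file can also be created from a script.  This is a
minimal Python sketch, assuming the backup job reads '.nobackup' as an
rsync filter file as described above; the helper name write_nobackup and
the example path are hypothetical, not part of the Labs tooling.

```python
from pathlib import Path

# Single rule from the announcement: "- *" excludes the directory's contents.
EXCLUDE_ALL = ("- *",)


def write_nobackup(directory, rules=EXCLUDE_ALL):
    """Create (or overwrite) `directory`/.nobackup with the given rsync
    filter rules, one per line, and return its path.

    Hypothetical helper: only the file name and the "- *" rule come from
    the announcement; the rule syntax is that of rsync's FILTER RULES.
    """
    path = Path(directory) / ".nobackup"
    path.write_text("\n".join(rules) + "\n")
    return path


if __name__ == "__main__":
    import sys

    # Usage: python write_nobackup.py DIR [DIR ...]
    for d in sys.argv[1:]:
        print("wrote", write_nobackup(d))
```

For example, write_nobackup("/data/project/mytool/cache") (where "mytool"
is a hypothetical tool name) writes the single "- *" rule.  Passing
["+ README", "- *"] instead would, since rsync acts on the first matching
rule, keep files named README and exclude everything else.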

Doing so will improve the speed at which backups of your data are taken,
and noticeably reduce their performance impact.  This only affects the
backups intended for disaster recovery; the snapshot process is not
affected, so local time-based backups of your files remain available.

Please take a moment to do so, especially if the directories contain
many (over 10000) files.
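To find which directories are worth excluding, a script can count files
per directory tree.  This is a minimal Python sketch under the assumption
that a plain os.walk traversal is acceptable (note that walking a large
tree on NFS itself generates I/O); the function name large_directories is
hypothetical, and the 10,000 threshold is the figure given above.

```python
import os

THRESHOLD = 10000  # the "many (over 10000) files" figure from the announcement


def large_directories(root, threshold=THRESHOLD):
    """Return (directory, file_count) pairs for directories under `root`
    whose whole subtree holds more than `threshold` non-directory entries,
    largest first.  Symlinked directories are not followed.
    """
    totals = {}
    # Walk bottom-up so every subdirectory is totalled before its parent.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        count = len(filenames)
        for name in dirnames:
            # Symlinked dirs were not walked, so they contribute 0.
            count += totals.get(os.path.join(dirpath, name), 0)
        totals[dirpath] = count
    hits = [(d, n) for d, n in totals.items() if n > threshold]
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python large_dirs.py /data/project/mytool
    for directory, count in large_directories(sys.argv[1]):
        print(f"{count:>10}  {directory}")
```

Counts include everything below each directory, so a parent always lists
at least as many files as its largest child; the deepest directory in the
output that holds only disposable data is usually the one to exclude.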

-- Marc
