The NFS servers used for the scratch and maps mounts (/data/project and /home in the maps project, and /data/scratch in other projects) will be going offline for a short time tomorrow, 2021-07-01, at around 1600 UTC, to move the mounts to DRBD-synced volumes. The current setup causes odd issues during failover, including data loss and stale files left behind. The move itself is one of those failovers, so some files that were previously deleted may be present again and need deleting a second time, along with similar anomalies.
I plan to reboot the maps project servers to make sure their mounts and processes are restored as fully as possible. The scratch move should be less impactful: if you use scratch, just be aware that it will go offline for a bit and will come back with some possible quirks. After that, the data should be far more stable and properly synced between the two systems. The process could start later than 1600 UTC if there are sync issues at the start, as I try to get as much of the data as possible transferred.
More details here: https://phabricator.wikimedia.org/T224747
Brooke Storm
Staff SRE
Wikimedia Cloud Services
bstorm@wikimedia.org
_______________________________________________
Next Tuesday we will be upgrading Kubernetes on Toolforge. As part of the upgrade we will need to restart all pods. This will cause a brief interruption to web services and other tools that use Kubernetes. Assuming your services are able to survive a restart, no action should be needed on your part. I'll send a further email when the upgrade is finished.
Special thanks to volunteer Taavi (aka Majavah), who has been essential in preparing for this upgrade and will be taking time out of his day to make sure it goes smoothly on Tuesday.
-Andrew + the WMCS team
_______________________________________________
Hello,
If you don't do anything with the metadata fields of the file tables (the image table, for example) in the replicas, you can ignore this email.
"image" table in Wikimedia Commons is extremely big (more than 380GB
compressed) and has been causing multiple major issues (including an
incident recently). Deep inspections revealed that more than 80% of this
table is metadata of PDF files, around 10% is metadata of DjVu files and
the 10% left is the rest of the information. This clearly needs fixing.
Tim Starling has done the work on this, and we are slowly rolling out two major changes:
First, the format of metadata in the database (for example, the img_metadata field in the image table) will change for all files. It used to be PHP serialization, but it is being changed to JSON. You can see a before-and-after example in https://phabricator.wikimedia.org/T275268#7178983. Keep in mind that for some time this will be a hybrid mode: some files will have their metadata in JSON format and some still in PHP serialization. If you parse this value, you need to support both formats for a while.
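For tools that parse this value, a minimal sketch of handling the hybrid period might look like the following (this assumes Python and the third-party phpserialize package; the helper name and fallback logic are illustrative, not something shipped by MediaWiki):

    import json

    import phpserialize  # third-party: pip install phpserialize

    def parse_img_metadata(blob):
        """Parse an img_metadata value that may be JSON (new format)
        or PHP-serialized (legacy). Returns a dict, or None if the
        value is empty or unparseable."""
        if not blob:
            return None
        if isinstance(blob, str):
            blob = blob.encode("utf-8")
        # New-format rows are JSON, so try that first.
        try:
            return json.loads(blob)
        except ValueError:
            pass
        # Fall back to the legacy PHP serialization format.
        try:
            return phpserialize.loads(blob, decode_strings=True)
        except ValueError:
            return None

Trying JSON first is just one way to order the checks; the important part is that both decoders are attempted until the hybrid period ends.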
Second, some parts of the metadata for PDF (and later DjVu) files won't be accessible in Wikimedia Cloud anymore, because these data will be moved to External Storage, and ES is not accessible from the outside. This is mostly the OCR text of PDF files. You can still access them using the API (action=query&prop=imageinfo).
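As a sketch of that API route (assuming Python with the requests library; action=query&prop=imageinfo is from this email, while the iiprop=metadata parameter and the helper are my own illustration):

    import requests

    API = "https://commons.wikimedia.org/w/api.php"

    def fetch_file_metadata(title):
        """Fetch a file's extracted metadata via the API instead of
        reading img_metadata directly from the replicas."""
        params = {
            "action": "query",
            "prop": "imageinfo",
            "iiprop": "metadata",  # ask for the extracted metadata
            "titles": title,
            "format": "json",
            "formatversion": "2",
        }
        r = requests.get(API, params=params, timeout=30)
        r.raise_for_status()
        pages = r.json()["query"]["pages"]
        info = pages[0].get("imageinfo") or []
        return info[0].get("metadata") if info else None

    # Usage: fetch_file_metadata("File:Example.pdf")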
Nothing will change for outside users: the API will return the same results and the user interface will show the same thing. But this will make all of Wikimedia Commons more reliable and faster to access (through indirect changes such as improved InnoDB buffer pool efficiency), reduce the time it takes to make database backups, enable us to make bigger changes to the image table and improve its schema, and much more.
I hope no one relies heavily on the img_metadata field in the cloud replicas, but if you do, please let me know and reach out for help.
You can keep track of the work in https://phabricator.wikimedia.org/T275268
Thank you for understanding and sorry for any inconvenience.
--
Amir (he/him)
I've got a VM (spi-tools-host-1) that I'm not actively using right now. I could shut it down to release the CPU resources, but I want to be able to start it back up again at some point in the future with the same configuration. On AWS I would just back up all the storage and shut down the VM. Is there some way to do that in this environment?
I suspect the answer may include, "Yo, dummy, you should have used Puppet instead of just installing things manually with apt-get." If that's the case, feel free to say so :-)
Or, is the VM so small that it's not worth worrying too much about the resources it's wasting?