Hi Everyone,
Over the last few months, the Wikimedia Developer Advocacy team has been
working to improve technical documentation for the MediaWiki Action API
<https://www.mediawiki.org/wiki/API:Main_page>.
So far, we have:
- Started efforts to revise, simplify, and reorganize the MediaWiki
Action API pages on MediaWiki using a new documentation template for
sub-pages: https://www.mediawiki.org/wiki/API:Documentation_template
- Updated the API navigation template:
https://www.mediawiki.org/wiki/Template:API
As we continue to make improvements to the technical documentation, we
could use your help to better guide our efforts!
Would you please take a few moments to complete the following survey and
share your opinions and experiences with us?
https://goo.gl/forms/Y5PGILb6b3awC3OJ2
*Notes about the MediaWiki Action API Survey:*
*Survey Period:* December 6, 2018 - January 6, 2019
*Privacy Policy:* This survey will be conducted via a third-party service,
which may subject it to additional terms. For more information on privacy
and data-handling, see the survey privacy statement
https://foundation.wikimedia.org/wiki/MediaWiki_Action_API_Survey_Privacy_S…
.
Thanks for your participation!
Kindly,
Sarah R. Rodlund
Technical Writer, Developer Advocacy
<https://meta.wikimedia.org/wiki/Developer_Advocacy>
srodlund(a)wikimedia.org
Tomorrow I'll be moving the grid engine master node to a new virt host.
That will cause a 15-minute outage during which new jobs (crons, or
things submitted by hand) will fail.
Existing jobs or webservices will be unaffected by the downtime.
I'll start the move at 16:00 UTC on Friday, 2018-12-21. That's 8AM in
California.
-Andrew
Hi!
Tomorrow 2018-12-20 @ 17:00 UTC (~24h from now) we will be conducting
some network maintenance in Cloud VPS (OpenStack).
We will be doing some work on the transport network that connects the
Neutron server to the rest of the internet. Running Cloud VPS instances
may briefly lose connectivity to any external service (outside Cloud
VPS).
If everything goes as planned, and our tests suggest it will, all
operations will be finished in just a couple of minutes.
Let us know about any issues you find. Thanks.
Hello,
Today we have disabled BigBrother in Toolforge. BigBrother was a tool
that watched continuous jobs and restarted them when they died in corner
cases that Grid Engine wasn't smart enough to recover from (e.g. out of
memory). In effect, it duplicated Grid Engine's restart functionality on
a layer above Grid Engine.
Although very few tools used BigBrother (0.65% to be more precise), it
taxed our NFS file server constantly so keeping it around didn't make
much sense. Additionally, its functionality could be easily implemented
with a shell script running from cron.
So we've converted all tools that had a .bigbrotherrc file to use a
bigbrother.sh script that is triggered every 5 minutes to restart jobs.
If your tool used BigBrother, please check your crontab (`crontab -l`);
you will see a few entries like this:
```
# Ensure continuous jobs are running
*/5 * * * * jlocal /data/project/tool_name/bigbrother.sh job_name job_script
```
Documentation has also been updated to reflect this change:
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Grid#Bigbrother_(Depreca…
In our tests everything worked fine but please let us know if your
tool is being impacted by this change.
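For anyone curious what a cron-driven watchdog of this kind can look
like, here is a minimal sketch. It is not the bigbrother.sh that was
deployed to your tool. The function name `ensure_job_running` and the
`QSTAT`/`JSTART` override variables are made up for illustration, and
the sketch assumes that `qstat -j NAME` exits non-zero when the grid
knows no job by that name and that `jstart -N NAME SCRIPT` submits a
continuous job.

```shell
#!/bin/bash
# Hypothetical watchdog sketch; NOT the bigbrother.sh installed on Toolforge.
# Usage (e.g. from cron): watchdog.sh JOB_NAME JOB_SCRIPT

# The grid commands are overridable so the logic can be tried off-grid.
QSTAT="${QSTAT:-qstat}"
JSTART="${JSTART:-jstart}"

ensure_job_running() {
    local job_name="$1" job_script="$2"
    # Assumption: `qstat -j NAME` fails when no such job is queued or running.
    if "$QSTAT" -j "$job_name" >/dev/null 2>&1; then
        echo "running: $job_name"
    else
        echo "restarting: $job_name"
        # Resubmit the job under the same name.
        "$JSTART" -N "$job_name" "$job_script"
    fi
}

# Only act when called with exactly two arguments.
if [ "$#" -eq 2 ]; then
    ensure_job_running "$1" "$2"
fi
```

Run from cron every five minutes, a script like this would resubmit a
job only when the grid no longer knows about it.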
Regards,
--
Giovanni Tirloni
Operations Engineer
Wikimedia Cloud Services
On Monday, December 3rd, 2018 at 17:00 UTC, we will be rebooting one of
the two dumps NFS servers (labstore1006.wikimedia.org). This should
briefly cause rising load issues, but the reboot should be quick enough
that failing over services is unlikely to help.
We will fail over the web service before that time and fail it back
before rebooting the partner server (labstore1007.wikimedia.org) on
Friday, December 7th at 17:00 UTC. This should not interrupt service to
dumps.wikimedia.org (the site hosted on these systems), since it will
have been failed over to the non-rebooting partner.
Brooke Storm
Operations Engineer
Wikimedia Cloud Services
bstorm(a)wikimedia.org
IRC: bstorm_
I recently noticed that some of our standard kvm/nova monitoring never
got copied over from the labvirt puppet code to the cloudvirt puppet
code. Tomorrow I will merge
https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/478113/ to fix that.
Once that patch is merged, icinga will be a bit touchier on the
cloudvirts. In particular, it will alert for any cloudvirt that has 0
VMs running on it. (This turns out to be a useful thing to watch for
because we've had cases where every single kvm process died at once.)
So, all 'idle' cloudvirts should nonetheless have a canary instance.
For example, on the new analytics cloudvirts I created canaries like this:
$ OS_PROJECT_ID=testlabs openstack server create \
    --image 7c6371d1-8411-48c7-bf73-2ef6d6ff2a15 \
    --flavor m1.small \
    --nic net-id=7425e328-560c-4f00-8e99-706f3fb90bb4 \
    --availability-zone host:cloudvirtan1004 \
    canary-an1004-01
Once a virt host is in full service we can leave the canaries there or
delete them -- there hasn't been any real consistent policy there.
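If several idle hosts need canaries, the command above can be
templated. The following is a dry-run sketch that only prints the
commands for review. The image and network IDs are copied from the
example above. The helper name `canary_command` and the rule for
deriving the canary name from the host name (strip the `cloudvirt`
prefix, append `-01`) are assumptions based on that single example, not
an established convention.

```shell
#!/bin/bash
# Dry-run sketch: print (do not execute) a canary-creation command per host.
# IDs are the ones quoted in the example above; adjust for other deployments.
IMAGE_ID="7c6371d1-8411-48c7-bf73-2ef6d6ff2a15"
NET_ID="7425e328-560c-4f00-8e99-706f3fb90bb4"

canary_command() {
    local host="$1"
    # Assumed naming rule: cloudvirtan1004 -> canary-an1004-01
    local suffix="${host#cloudvirt}"
    printf 'OS_PROJECT_ID=testlabs openstack server create --image %s --flavor m1.small --nic net-id=%s --availability-zone host:%s canary-%s-01\n' \
        "$IMAGE_ID" "$NET_ID" "$host" "$suffix"
}

# Print one command per host name given on the command line.
for host in "$@"; do
    canary_command "$host"
done
```

Printing instead of executing leaves a chance to check the host names
and IDs before anything is created.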
In related news, I'm attempting to silence cloudvirt1019 and 1020
altogether with
https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/478115/ because
we reboot them twice a day and a reboot always kills any running VMs.
-Andrew
With any luck we'll have some more hardware installed by next week, so
it's time to move more projects! This is probably the last round of
bulk moves; after this it's all special cases for which I'll contact
people directly.
Tuesday, 2018-12-11: maps, wm-bot
Wednesday, 2018-12-12: mwoffliner, wildcat
Thursday, 2018-12-13: snuggle, services, commonsarchive, wikitextexp
Friday, 2018-12-14: queryrapi, wikidumpparse, wikistats, butterfly
Monday, 2018-12-17: huggle, incubator, iiab, openrefine, wcdo,
wikidataconcepts
Tuesday, 2018-12-18: wikimetrics, newsletter, telnet, signwriting,
ogvjs-ingetration
Wednesday, 2018-12-19: multimedia, orig, security-tools, phragile,
wikistream, otrs, yandex-proxy
Thursday, 2018-12-20: dashiki, etytree, partnermetrics, graphql
Some context for what this is all about can be found here:
https://phabricator.wikimedia.org/phame/post/view/120/neutron_is_here/
Please let me know if you are involved in one of those projects and need
to postpone the move or schedule a to-the-minute migration window.
- Andrew + the WMCS team