On Wed, Jul 20, 2016 at 10:57 AM, Marko Obrovac <mobrovac(a)wikimedia.org>
wrote:
> Hello,
>
> I am happy to announce that Parsoid is now following our standard
> practices in production as it has been completely moved to use
> service-runner and service::node. We managed to do it in two days, and have
> not encountered any issues worth reporting. The transition went smoothly
> thanks to Giuseppe's mastery of the complete start-to-end process and
> Subbu's relentless testing. Thank you, guys, for your help!
>
Nice work!
Hello folks,
service-template-node v0.4.0 has just been released~[1]. The new version
represents an important security and feature upgrade from v0.3.2 and you
are urged to update as soon as possible~[2].
On the feature side, this release brings out-of-the-box support for sending
metrics for all requests made against a service, which means that after
upgrading you will be able to set up your own grafana dashboard with
relevant metrics~[3] very easily.
Security-wise, there were some possible RegEx exploits in one of the node
module dependencies. This has been mitigated by updating the relevant
modules to a version that does not have the deficiency. Additionally, from
now on a node-module-security scan is being run every time the service is
tested to ensure our infrastructure is kept safe.
Please update as soon as possible if you have a service based on the
service template running in WMF production. And, as always, should you have
any questions or concerns, feel free to reach out to me.
Cheers,
Marko
[1] https://github.com/wikimedia/service-template-node/tree/v0.4.0
[2] you can follow the guide on
https://www.mediawiki.org/wiki/ServiceTemplateNode/Updating
[3] dashboards a la
https://grafana.wikimedia.org/dashboard/db/restbase?panelId=16&fullscreen
--
Marko Obrovac, PhD
Senior Services Engineer
Wikimedia Foundation
Hello,
Next Wednesday, 2016-07-13, we plan to upgrade the hosts running nodejs
services to v4.4.6 (from v4.3.0)~[1]. If you are a deployer of a node
service please ensure your service functions properly under said version.
In a node versioning sense, this is a minor version bump, so we don't
expect any breakage, but careful testing is still needed, especially for
services depending on binary node modules.
Once you've tested the service, please create a patch for its source repo
that bumps node's version to 4.4.6~[2] and add me (Mobrovac) as a reviewer
so that we can coordinate the switch properly.
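As a quick sanity check before testing, something like the following could confirm which node version a host is actually running. This is only a sketch: the target version comes from the announcement above, while the helper name is_target_version is made up for illustration.

```shell
#!/bin/sh
# Sketch only: compare a node version string against the target release.
# TARGET is taken from the announcement; is_target_version is a
# hypothetical helper name, not part of any existing tooling.
TARGET="v4.4.6"

is_target_version() {
    # $1: a version string as printed by `node --version`, e.g. "v4.4.6"
    [ "$1" = "$TARGET" ]
}

# Only query node if it is installed on this host.
if command -v node >/dev/null 2>&1; then
    current="$(node --version)"
    if is_target_version "$current"; then
        echo "node is at $TARGET"
    else
        echo "node is at $current, expected $TARGET"
    fi
fi
```

This only confirms the runtime version; it does not replace testing the service itself, especially when binary node modules are involved.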
Cheers,
Marko
[1] https://phabricator.wikimedia.org/T138561
[2] Follow the example for MCS - https://gerrit.wikimedia.org/r/#/c/297419/
FYI, Kibana (aka logstash), the production log interface, is switching from
version 3 to version 4, and the two are unfortunately incompatible. That
means that you will need to recreate your saved dashboards. The forwarded
mail provides more detail.
Cheers,
Marko
---------- Forwarded message ----------
From: Bryan Davis <bd808(a)wikimedia.org>
Date: 29 June 2016 at 01:28
Subject: [Ops] Saving kibana dashboards before the Kibana4 upgrade
To: ops-l <ops(a)lists.wikimedia.org>
Erik B has been putting in a lot of work to prepare for upgrading the
Elasticsearch and Kibana services in the Logstash cluster. Things are
looking pretty good now for an upgrade sometime next week (2016-07-05
- 2016-07-08) [0].
As announced previously, saved Kibana dashboards are not portable from
our existing Kibana3 to the new Kibana4. The current plan is to copy
over to production all dashboards that have been created at
https://kibana4.wmflabs.org (beta cluster test server). I'll be doing
some work over the next couple of days to make sure the dashboards I
can't live without have been recreated there. I would suggest that
others do the same or ping me to see if I have time to port their
dashboard for them.
Creating a dashboard in Kibana4 is similar to Kibana3, but there is at
least one big difference. The "panels" from Kibana3 have been replaced
with "visualizations" in Kibana4. A visualization exists outside of a
dashboard, may or may not be associated with an existing saved query,
and must be made before it can be added to a dashboard. This takes a
bit of poking at to get comfortable with but all in all it's not a
horrible process. There's a video on elastic.co [1] that may or may
not be helpful in understanding how to port things over.
[0]: https://phabricator.wikimedia.org/T136001
[1]: https://www.elastic.co/blog/recreating-kibana-3-dashboards-in-kibana-4
Bryan
--
Bryan Davis Wikimedia Foundation <bd808(a)wikimedia.org>
[[m:User:BDavis_(WMF)]] Sr Software Engineer Boise, ID USA
irc: bd808 v:415.839.6885 x6855
Hello,
There was an outage yesterday that took out the whole SCB cluster~[1]. Due
to the number of services hosted there, this was a user-facing event that
lasted approximately 20 minutes. More notes and thoughts are welcomed!
Cheers,
Marko
[1] https://wikitech.wikimedia.org/wiki/Incident_documentation/20160608-SCB
Hello,
As of recently, there are two new commands that can be used by service
owners on their target nodes to inspect their service.
    $ check-<service-name>
Issuing this command will trigger the nagios check, which runs your
service's monitoring script.
$ tail-<service-name>
This command gives you the production logs in a human-readable format. I
added these commands to the appropriate documentation section on
Wikitech~[1].
Cheers,
Marko
[1]
https://wikitech.wikimedia.org/wiki/Services/Deployment#Dealing_with_Proble…
Hello,
service-template-node v0.3.2~[1] has been released. This is an important
update for all services that contact the MW Action API or RESTBase, as it
brings a unified way of dealing with such requests. To make requests
against the MW API use apiUtil.mwApiGet() (from lib/api-utils.js), while
for RESTBase there is apiUtil.restApiGet(). To find out more, read the
documentation~[2] or look at the example routes~[3] where these utility
functions are used.
In particular, CXServer, Graphoid and MobileApps should update ASAP, as
these are the services already making requests to these entities. When you
update your service, please let me know as deploying the change will need a
coordinated config and code deploy.
Cheers,
Marko
[1] https://github.com/wikimedia/service-template-node/tree/v0.3.2
[2]
https://github.com/wikimedia/service-template-node/blob/v0.3.2/doc/coding.m…
[3]
https://github.com/wikimedia/service-template-node/blob/v0.3.2/routes/v1.js
tl;dr starting today:
Use:
    scap sync 'message for posterity'
instead of:
    scap 'message for posterity'
Scap 3.2.0-1 is now alive and well in production which means scap
subcommands are live.
All subcommands are documented[0]. Additional documentation can be
seen by running `scap --help` (or `scap [subcommand] --help`). If you
have any questions feel free to ask them on-list or in IRC on #scap3
or #wikimedia-releng.
Thanks!
Tyler Cipriani and the Deployment Working Group
[0]. Mediawiki: https://doc.wikimedia.org/mw-tools-scap/scap2/commands.html
     Scap3: https://doc.wikimedia.org/mw-tools-scap/scap3/deploy_commands.html
This is a change that affects services that have moved to deployment
via Scap3 (not MediaWiki deployments).
The 3.2.0-1 release that is currently live makes an important change
to the stages in which custom checks may be run. There is now a new
stage called `restart_service` that occurs after the `promote` stage.
The `promote` stage no longer does a service restart. This change is
outlined in the Scap3 docs[0].
This change likely means that you need to move any custom checks (in
scap/checks.yaml) that were intended to run post-service restart to
use the stage `restart_service` rather than `promote`.
For example this check, which depends on a service restart to work correctly:
    checks:
      service_responds:
        type: command
        stage: promote
        command: curl -Ss localhost:1234
Should now be written as:
    checks:
      service_responds:
        type: command
        stage: restart_service
        command: curl -Ss localhost:1234
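To find checks that may need moving, a small helper along these lines could list every check still bound to the `promote` stage. It is a sketch under assumptions: the function name list_promote_checks is hypothetical, and the path scap/checks.yaml is the one mentioned above. It only lists candidates, since whether a given check really depends on a service restart is a judgment call for the service owner.

```shell
#!/bin/sh
# Sketch only: list checks.yaml lines that still use the promote stage.
# list_promote_checks is a hypothetical helper name, not a scap command.
list_promote_checks() {
    # $1: path to a checks file, e.g. scap/checks.yaml
    # Prints "<line number>:<line>" for each "stage: promote" entry and
    # returns non-zero when there are none (standard grep behaviour).
    grep -nE '^[[:space:]]*stage:[[:space:]]*promote[[:space:]]*$' "$1"
}

# Run against the repository's checks file if it exists.
if [ -f scap/checks.yaml ]; then
    list_promote_checks scap/checks.yaml || echo "no checks bound to promote"
fi
```

Each reported line is a candidate for `stage: restart_service`; checks that do not rely on the service having been restarted can stay on `promote`.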
Sorry for any inconvenience. For future releases, changelog highlights
will be sent to the list prior to release.
-- Tyler
[0]. https://doc.wikimedia.org/mw-tools-scap/scap3/quickstart/setup.html#service…
FYI for services deployers that have already switched to Scap3 and those
which will do so soon(TM). I will update the docs on wikitech accordingly
once this is live in production.
Cheers,
Marko
---------- Forwarded message ----------
From: Tyler Cipriani <tcipriani(a)wikimedia.org>
Date: 10 May 2016 at 19:08
Subject: [Engineering] [Breaking Change] Scap change for deployers
To: wikitech-l(a)lists.wikimedia.org, "Development and Operations engineers
(WMF only)" <engineering(a)lists.wikimedia.org>, ops(a)lists.wikimedia.org
tl;dr: Our beloved scap is changing to use subcommands rather than a
bunch of scripts, but the existing scripts will work for a short time.
Starting with the 3.2.0 release[0], which will hit production in the
next day or so, scap will use subcommands rather than using many
different scripts that all call the same underlying code. The scripts
(e.g., deploy, sync-file, sync-dir, sync-wikiversions) will continue
to work as usual, but they will issue a deprecation warning until the
next release when they will disappear.
The most notable exception is the `scap` command which must be invoked
as `scap sync [message]`.
The docs are updated[1] and you can see new help output there or on
phabricator[2].
Long story short, you will now run:
    scap sync-file <path> [message]
Instead of:
    sync-file <path> [message]
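The renaming can be summarised as a small lookup. The sketch below assumes that each of the scripts listed above maps to a subcommand of the same name, as the sync-file example suggests, and that the bare `scap` command becomes `scap sync`. The function name new_scap_command is hypothetical and only illustrates the mapping; it is not part of scap.

```shell
#!/bin/sh
# Sketch only: print the subcommand form of an old scap script name.
# new_scap_command is a hypothetical helper, not shipped with scap.
new_scap_command() {
    case "$1" in
        # The bare `scap` command is the stated exception.
        scap)
            echo "scap sync" ;;
        # Assumed: these scripts keep their name as a subcommand.
        deploy|sync-file|sync-dir|sync-wikiversions)
            echo "scap $1" ;;
        *)
            echo "unknown script: $1" >&2
            return 1 ;;
    esac
}

# Example: show the new form for each script named in this mail.
for old in scap deploy sync-file sync-dir sync-wikiversions; do
    echo "$old -> $(new_scap_command "$old")"
done
```

For any script not named in this mail, `scap --help` remains the authoritative list of subcommands.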
This change has been cherry-picked on beta cluster and is currently live
there.
<3,
Tyler Cipriani and the Deployment Working Group
[0]. https://gerrit.wikimedia.org/r/#/c/287918
[1]. https://doc.wikimedia.org/mw-tools-scap/
[2]. https://phabricator.wikimedia.org/P3027