Hello,
As a volunteer for Wikimedia AI, I am investigating the 6-13-17 incident
involving ORES. Is there a way PDFRender could consume fewer server
resources when running on the same server as ORES, so that ORES does not
fail? If not, would it be possible to move either ORES or PDFRender to a
different server? If you have any questions, comments, or concerns, please
reply to me here or in #wikimedia-ai.
Thanks!
Zppix
Volunteer developer for WMF
enwp.org/User:Zppix
Hello,
As of the release of service-runner v2.3.0~[1] earlier today, we are no
longer supporting Node.js v0.1x platforms. The minimum Node version needed
to power your services is now set at v4.2.2, but we encourage the library's
users to develop and run their services on Node v6.x, the current Node LTS
release.
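Relatedly, service maintainers can make the minimum Node version explicit in
their own repositories via the `engines` field of `package.json`; a minimal
sketch (the package name and version below are placeholders):

```json
{
  "name": "my-service",
  "version": "1.0.0",
  "engines": {
    "node": ">=4.2.2"
  }
}
```

npm prints a warning when the running Node version does not satisfy this
range (and fails outright if `engine-strict` is enabled), which surfaces the
mismatch early.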
If this change is affecting your services in a negative way, please let us
know here on-list or by filing a task in Phabricator against the
service-runner tag~[2].
Best,
Marko Obrovac, PhD
Senior Services Engineer
Wikimedia Foundation
[1] https://github.com/wikimedia/service-runner/releases/tag/v2.3.0
[2] https://phabricator.wikimedia.org/project/board/1062/
Hi Marko,
I looked at the document and have several changes to propose; alas, I don't
have time to go through it with the attention it needs right now because
(as anticipated before the quarter) I have zero time to spend on this.
Ops have some experience running a cluster orchestration system in
production (for Tool Labs), and I have been thinking about this for quite
some time now. I have some ideas on how things should be done to get a
decent, manageable "elastic" environment with advantages for developers,
and I would love to integrate your document with those ideas and a more
general vision for production; this is probably not going to happen for at
least a month, though.
Can we hold off on declaring this document "definitive"?
Also, can we stop calling it a "container-based" infrastructure? :) I
seriously think containers are little more than an implementation detail of
the general vision.
Cheers,
Giuseppe
On Wed, Feb 15, 2017 at 11:28 PM, Marko Obrovac <mobrovac(a)wikimedia.org>
wrote:
> Hello,
>
> In light of the upcoming annual planning for the joint technology goal of
> having a shared container-based infrastructure, the Services team has
> started collecting requirements for the platform in terms of development,
> testing and operation of services (together with some other considerations
> like automation and configuration management)~[1]. Please take a look at
> the document and add/remove/improve/suggest as you see fit. Note that the
> document is to be considered only a draft at this point.
>
> Cheers,
> Marko
>
> [1] https://docs.google.com/a/wikimedia.org/document/d/1QsCVooqxkeE6tKYTxgoRvRdK2M3tDk4UyvmnHJrdag4/edit?usp=sharing
>
> --
> Marko Obrovac, PhD
> Senior Services Engineer
> Wikimedia Foundation
>
> _______________________________________________
> Ops mailing list
> Ops(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/ops
>
>
--
Giuseppe Lavagetto, Ph.d.
Senior Technical Operations Engineer, Wikimedia Foundation
Hello,
In the second half of January 2017, the version of Node running in WMF
production was updated from v4 to v6~[1]. Parsoid, RESTBase, AQS and the
services running on the SCB cluster (Graphoid, Mathoid, Mobile Content
Service, to mention just a few) are now running on Node v6. The only
outstanding service still running Node v4 is Maps, but we are making good
progress on moving it to Node v6 and expect that to happen soon~[2].
In the case of services running in WMF production, we have seen a slight
increase in performance as well as a substantial decrease in memory
consumption across the board, which improves the stability of our services.
You can read more about the experience of the update and its results in the
post published on Wikimedia's blog~[3].
Big thanks to everyone who helped make it happen!
Happy Friday,
Marko && the Services Team
[1] https://phabricator.wikimedia.org/T149331
[2] https://phabricator.wikimedia.org/T150354
[3] https://blog.wikimedia.org/2017/02/17/node-6-wikimedia/
--
Marko Obrovac, PhD
Senior Services Engineer
Wikimedia Foundation
Hi,
recently I saw some Phabricator tickets in which people complain about
difficulties setting up the Math extension and configuring it to work
with RESTBase [1,2,3 ...].
I wonder if that also affects other service-dependent extensions such
as Cite or VE?
I am aware of the medium-term goal of finding a good and convenient
solution to this problem.
However, I think we should find an intermediate solution and provide
some documentation for averagely skilled admins of private wikis.
At least for Extension:Math the setup is relatively straightforward
(even though setting up a RESTBase instance seems like slight overkill
for a wiki like pokewiki http://pokewiki.de/ that uses only some
formulae).
The most difficult part of the setup is the creation of the Swagger
config [4]. One of my students suggested that we try
http://editor.swagger.io to simplify creating the Swagger config.
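For orientation, the skeleton such a config is built on is fairly small. A
minimal, purely illustrative Swagger 2.0 document (the title and path below
are hypothetical, not the actual Math endpoint) looks something like:

```yaml
swagger: '2.0'
info:
  title: Example math service   # hypothetical
  version: 1.0.0
paths:
  /check/{type}:                # hypothetical path
    post:
      summary: Check a formula
      parameters:
        - name: type
          in: path
          required: true
          type: string
      responses:
        '200':
          description: OK
```

editor.swagger.io validates this structure as you type, which is probably
where it would help most with the setup described above.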
Do you have experience editing the services' config files with that
editor, or do you use a different one?
Best
physikerwelt
[1] https://phabricator.wikimedia.org/T155201
[2] https://phabricator.wikimedia.org/T151311
[3] https://phabricator.wikimedia.org/T119817
[4] https://raw.githubusercontent.com/physikerwelt/restbase-config-fse/master/f…
Is it considered acceptable now to produce a service or API that
hardcodes wiki-specific parsing of certain wikitext or HTML patterns in
certain wiki pages (such as the "On this day" section of the main page
of one wiki)?
I'm confused about the status of things, and after my comment
https://phabricator.wikimedia.org/T143408#2919000 I see little effort
toward finding solutions potentially able to scale to all our projects and
languages (which I assume to be the mission, see "globally" in
https://wikimediafoundation.org/wiki/Mission_statement ; please point it
out if this assumption is incorrect).
It might be that wiki-specific parsing hardcoded in MediaWiki/Wikimedia
code is actually able to scale, if written correctly; a comment on the
association patch seemed to imply so. This would be a very surprising
finding, and one which goes against 15 years of experience, so if we
have some examples or evidence of this it would be very worthwhile to
point them out.
Nemo
Hello,
I have taken this quiet (deployment) time to add automatic depooling and
repooling of services from the load balancers during deployments~[1]. From
now on, when you do a deploy, your service will be depooled from the load
balancers and repooled after the service has been successfully restarted
on a target node. That means that during a deployment the current target
node (the one being deployed to) will not be in the pool of nodes
responding to requests, which in turn means that the number of failed
requests due to deployments will be nearly zero.
One thing will change in how you do deployments from now on. Because the
load balancers tolerate only 50% of nodes being depooled, deployments will
happen in groups of two nodes: instead of the usual two groups (canary,
default), there will now be five target groups (canary, default[1-4]).
Once the deployment of the canaries finishes, you will see the prompt:
canary deploy successful. Continue? [y]es/[n]o/[c]ontinue all groups:
Here, you should answer with a "c", which tells Scap3 to go through all of
the deployment groups. You can also answer "y", but then Scap3 will repeat
the question after each group.
I have added patches to the deploy repos of all of the services and tested
them. If you are interested in the patch for your respective service, see
the task~[1].
Happy holidays!
Cheers,
Marko
[1] https://phabricator.wikimedia.org/T144602
--
Marko Obrovac, PhD
Senior Services Engineer
Wikimedia Foundation