Hi,
I'm GET-ing the page/html/{title} endpoint at
https://en.wikipedia.org/api/rest_v1/ for information extraction. I'm
trying to nail down a polite request rate, and to determine whether the
current rate limit is likely to change soon.
- the doc at https://en.wikipedia.org/api/rest_v1/ pegs the rate limit at
200 req/s;
- on #wikimedia-services, +gwicke noted that the Varnish cache's rate limit
is much lower -- around 100 req/s; but
- in practice, I get 429s whenever I exceed 70 req/s for more than a few
minutes.
Pchelolo suggested that additional debug logging on 429s might help get to
the bottom of this lower-than-expected rate limit.
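For reference, here is a stripped-down version of the client loop I'm
testing with (just a sketch; it assumes the node-fetch package, and the
pacing and backoff values are my own guesses rather than documented limits):

  const fetch = require('node-fetch');

  const BASE = 'https://en.wikipedia.org/api/rest_v1/page/html/';
  const INTERVAL_MS = 1000 / 60; // pace below the ~70 req/s where 429s begin

  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

  async function getHtml(title, attempt) {
    attempt = attempt || 1;
    const res = await fetch(BASE + encodeURIComponent(title));
    if (res.status === 429) {
      // Honor Retry-After when present; otherwise back off exponentially.
      const waitS = Number(res.headers.get('retry-after')) || Math.pow(2, attempt);
      await sleep(waitS * 1000);
      return getHtml(title, attempt + 1);
    }
    return res.text();
  }

  async function crawl(titles) {
    for (const title of titles) {
      await sleep(INTERVAL_MS); // fixed spacing between requests
      await getHtml(title);     // extraction happens here
    }
  }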
What kind of debugging info can I provide from my end? Any chance I'll be
able to hit the 200 req/s mark in the next few months?
Thanks,
Shahin
Google Code-in is an annual contest for 13-17 year old students. It
will take place from Nov 28 to Jan 17 and is not only about coding tasks.
While we wait to hear whether Wikimedia will be accepted:
* Do you have small, self-contained bugs you'd like to see fixed?
* Does your documentation need specific improvements?
* Does your user interface have small design issues?
* Does your Outreachy/Summer of Code project welcome small tweaks?
* Would you enjoy somebody porting your template to Lua?
* Does your gadget code use some deprecated API calls?
* Do you have tasks in mind that welcome some research?
Also note that "Beginner tasks" (e.g. "Set up Vagrant") and "generic"
tasks (e.g. "Choose & fix 2 PHP7 issues from the list in
https://phabricator.wikimedia.org/T120336") are very welcome, because we
will need hundreds of tasks. :)
We also have more than 400 unassigned open 'easy' tasks listed:
https://phabricator.wikimedia.org/maniphest/query/HCyOonSbFn.z/#R
Would you be willing to mentor some of those in your area?
Please take a moment to find or update [Phabricator etc.] tasks in your
project(s) which would take an experienced contributor 2-3 hours. Check
https://www.mediawiki.org/wiki/Google_Code-in/Mentors
and please ask if you have any questions!
For some achievements from last round, see
https://blog.wikimedia.org/2017/02/03/google-code-in/
Thanks!
andre
--
Andre Klapper | Wikimedia Bugwrangler
http://blogs.gnome.org/aklapper/
Hi there,
I hope this is the right list for a RESTBase question. Let me know if it's
the wrong list, or if I should head over to Phabricator instead.
I'm fetching specific versions of a large number of Wikipedia pages (for
the Crossref Event Data service, if you're interested -
https://www.eventdata.crossref.org/guide ). I'm getting page ids and
versions from EventStreams. I'm using the RESTBase API because it gives
the cleanest HTML and it was recommended to me for the volume of queries,
e.g.
https://ceb.wikipedia.org/api/rest_v1/page/html/Quebrada_Fantasma/13659774
I want to get the *canonical URL* for that version page, e.g.
https://ceb.wikipedia.org/wiki/Quebrada_Fantasma
The 'normal' HTML view of a page supplies the canonical URL as a <link
rel="canonical"> tag, but the RESTBase response doesn't. It does supply an
isVersionOf link though:
<link rel="dc:isVersionOf" href="//ceb.wikipedia.org/wiki/Quebrada_Fantasma"/>
Questions:
1 - Is the isVersionOf URL in RESTBase identical to the "official"
canonical URL that I would get from the HTML metadata (using https:)?
2 - Is the "title" component of the RESTBase URL the same as the one used
in the canonical URL? The Swagger docs say "Page title. Use underscores
instead of spaces. Example: Main_Page", but I'm not clear whether that is
the same thing.
3 - Is there a general recommended way of getting the canonical URL for a
page from RESTBase?
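For context, this is roughly how I derive a URL from the isVersionOf link
today (a sketch; it assumes the node-fetch and cheerio packages, and
prefixing "https:" onto the protocol-relative href is my own assumption --
hence question 1):

  const fetch = require('node-fetch');
  const cheerio = require('cheerio');

  function isVersionOfUrl(restbaseHtmlUrl) {
    return fetch(restbaseHtmlUrl)
      .then((res) => res.text())
      .then((html) => {
        const $ = cheerio.load(html);
        // e.g. //ceb.wikipedia.org/wiki/Quebrada_Fantasma (protocol-relative)
        const href = $('link[rel="dc:isVersionOf"]').attr('href');
        return href ? 'https:' + href : null;
      });
  }

  isVersionOfUrl('https://ceb.wikipedia.org/api/rest_v1/page/html/Quebrada_Fantasma/13659774')
    .then((url) => console.log(url)); // https://ceb.wikipedia.org/wiki/Quebrada_Fantasma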
Thanks in advance!
Joe Wass
https://en.wikipedia.org/wiki/User:Afandian
Crossref
Hello folks,
Just a heads-up that the Services Team is holding its team off-site this
week, so expect degraded response times. While we might be around on IRC
from time to time, in case of emergency the best way to reach us is by
sending a mail to services@wikimedia.org.
Cheers,
The Services Team
Marko Obrovac, PhD
Senior Services Engineer
Wikimedia Foundation
Hello,
Just a heads-up that I will be on holiday the week of 2017-08-28 through
2017-09-01. I will not be on IRC, but will be reachable by mail in case of
serious problems. For all other inquiries, please contact
services@wikimedia.org.
Cheers,
Marko
Marko Obrovac, PhD
Senior Services Engineer
Wikimedia Foundation
David Mark Clements wrote an article describing how the new TurboFan
compiler will address many of the traditional V8 performance pitfalls
<https://github.com/petkaantonov/bluebird/wiki/Optimization-killers>:
https://www.nearform.com/blog/node-js-is-getting-a-new-v8-with-turbofan/
tl;dr: Performance of many "obvious" operations like try/catch and delete
will become more reasonable. Some of the performance work-arounds we
currently employ in perf-critical code (such as try/catch wrappers) won't
be needed any more, and might actually perform worse on TurboFan.
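To illustrate, a typical wrapper of that kind looks something like this
(an illustrative sketch, not code from any particular repository):

  // Under Crankshaft, a function containing try/catch was never optimized,
  // so the try/catch was isolated in a tiny helper to keep the hot path
  // optimizable. Under TurboFan this indirection is unnecessary, and the
  // extra call can cost more than writing the try/catch inline.
  function tryParse(json) {
    try {
      return JSON.parse(json);
    } catch (e) {
      return null;
    }
  }

  function countValidRecords(lines) {
    let count = 0;
    for (const line of lines) {
      if (tryParse(line) !== null) { // hot loop stays free of try/catch
        count += 1;
      }
    }
    return count;
  }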
TurboFan is expected to land in Node 8.3 or 8.4. The Node 8 series is
going to be the next LTS release, and we will upgrade to it once it enters
LTS.
--
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
Hello,
We will be upgrading Node.js in the WMF production environment to
v6.11 [1] next week. There should be no major changes or impacts for the
services we are running in production, but please be sure to test your
service against it and let us know if you spot any problems [2].
Cheers,
Marko Obrovac, PhD
Senior Services Engineer
Wikimedia Foundation
[1] https://nodejs.org/en/blog/release/v6.11.0/
[2] https://phabricator.wikimedia.org/T170548
Hi everyone!
Wikimedia is releasing a new service today: EventStreams
<https://wikitech.wikimedia.org/wiki/EventStreams>. This service allows us
to publish arbitrary streams of JSON event data to the public. Initially,
the only stream available will be good ol’ RecentChanges
<https://www.mediawiki.org/wiki/Manual:RCFeed>. This event stream overlaps
functionality already provided by irc.wikimedia.org and RCStream
<https://wikitech.wikimedia.org/wiki/RCStream>. However, this new service
has advantages over these (now deprecated) services.
1. We can expose more than just RecentChanges.
2. Events are delivered over streaming HTTP (chunked transfer) instead of
IRC or socket.io. This requires less client-side code and fewer special
routing cases on the server side.
3. Streams can be resumed from the past. By using EventSource, a
disconnected client will automatically resume the stream from where it
left off, as long as it resumes within one week. In the future, we would
like to allow users to specify historical timestamps from which they would
like to begin consuming, if this proves safe and tractable.
I did say deprecated! Okay okay, we may never be able to fully deprecate
irc.wikimedia.org. It’s used by too many (probably sentient by now) bots
out there. We do plan to obsolete RCStream, and to turn it off in a
reasonable amount of time. The deadline iiiiiis July 7th, 2017. All
services that rely on RCStream should migrate to the HTTP based
EventStreams service by this date. We are committed to assisting you in
this transition, so let us know how we can help.
Unfortunately, unlike RCStream, EventStreams does not yet support
server-side event filtering (e.g. by wiki). Whether and how this should be
done is still under discussion <https://phabricator.wikimedia.org/T152731>.
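In the meantime, filtering on the client is straightforward. Here is a
minimal consumer sketch (it assumes the eventsource npm package, a Node.js
implementation of the browser EventSource API; the enwiki filter is just
an example):

  const EventSource = require('eventsource');

  const url = 'https://stream.wikimedia.org/v2/stream/recentchange';
  const es = new EventSource(url);

  es.onmessage = (event) => {
    const change = JSON.parse(event.data);
    // No server-side filtering yet, so filter by wiki on the client.
    if (change.wiki === 'enwiki') {
      console.log(change.user + ' edited ' + change.title);
    }
  };

  // On disconnect, EventSource reconnects automatically and resumes from
  // where it left off via the Last-Event-ID header (within one week).
  es.onerror = (err) => console.error('stream error', err);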
The RecentChanges data you are used to remains the same, and is available
at https://stream.wikimedia.org/v2/stream/recentchange. However, we may
also have something new for you. We have been internally producing new
MediaWiki-specific events
<https://github.com/wikimedia/mediawiki-event-schemas/tree/master/jsonschema…>
for a while now, and could expose these via EventStreams as well.
Take a look at these events, and tell us what you think. Would you find
them useful? How would you like to subscribe to them? Individually as
separate streams, or would you like to be able to compose multiple event
types into a single stream via an API? These things are all possible.
I asked for a lot of feedback in the above paragraphs. Let’s try to
centralize this discussion over on the mediawiki.org EventStreams talk page
<https://www.mediawiki.org/wiki/Talk:EventStreams>. In summary, the
questions are:
- What RCStream clients do you maintain, and how can we help you migrate
to EventStreams? <https://www.mediawiki.org/wiki/Topic:Tkjkee2j684hkwc9>
- Is server-side filtering, by wiki or arbitrary event field, useful to
you? <https://www.mediawiki.org/wiki/Topic:Tkjkabtyakpm967t>
- Would you like to consume streams other than RecentChanges?
<https://www.mediawiki.org/wiki/Topic:Tkjk4ezxb4u01a61> (Currently
available events are described here
<https://github.com/wikimedia/mediawiki-event-schemas/tree/master/jsonschema…>.)
Thanks!
- Andrew Otto
It's worth noting that MCS is a collection of services used by the mobile
team. It includes endpoints such as `feed` (
https://en.wikipedia.org/api/rest_v1/#!/Feed). Why not put `summaries` in
there too?
On Thu, Jun 22, 2017 at 6:52 AM Sam Smith <samsmith@wikimedia.org> wrote:
> It's only just occurred to me that I've been making a serious mistake in
> conflating RESTBase and RESTful services, like MCS, in my recent
> communications up to and including my initial email.
>
> On Thu, Jun 22, 2017 at 2:43 PM, Marko Obrovac <mobrovac@wikimedia.org>
> wrote:
>
>> While it could be done in RESTBase as well, I think that this is not a
>> good long-term solution as it introduces a dependency on the Services team
>> for something that you ultimately own the output of.
>>
>
> This is an excellent point.
>
> With the above in mind, I think that the ideal solution is creating a new
> service for generating page summaries that can be consumed by multiple
> platforms. Just as with TextExtracts, page summaries are distinct from MCS.
> It's up to the Reading Web team to decide whether they want to implement it
> in Node or as a new MediaWiki API module that lives in the Popups extension.
>
> -Sam
>
> --
> IRC (Freenode): phuedx
> Timezone: BST (UTC+1)
>