(Corrected the date in the subject line from the previous notification.)
Next Tuesday, September 3rd, between 13:00 and 14:00 UTC, we'll be
performing backend database maintenance on the OpenStack VPS control plane.
During this maintenance window the Horizon web dashboard will be
unavailable, and all requests to create, modify, or delete VPS resources
such as virtual machines and DNS entries will be blocked.
Existing VPS virtual machines will remain running and Toolforge users will
not be affected by this maintenance.
---
Wikimedia Cloud Services
_______________________________________________
Wikimedia Cloud Services announce mailing list
Cloud-announce@lists.wikimedia.org (formerly labs-announce@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud-announce
Today I rebuilt the Docker images that are used by the `webservice
--backend=kubernetes` command. This is a routine task that we perform
periodically in Toolforge to ensure that security patches are applied
in the containers. This round of updates was a bit different, however:
it is the first time the Debian Jessie based images have been rebuilt
since the upstream Debian project removed the 'jessie-backports' apt
repo.
Everything should be fine, but if you see weirdness when restarting a
webservice or other Kubernetes pod that looks like it could be related
to software in the Docker image, please let me or one of the Toolforge
admins know, either by filing a Phabricator bug report or, for a faster
response, by joining the #wikimedia-cloud IRC channel on Freenode and
sending a "!help ...." message to the channel explaining your issue.
Bryan - on behalf of the Toolforge admins
--
Bryan Davis Technical Engagement Wikimedia Foundation
Principal Software Engineer Boise, ID USA
[[m:User:BDavis_(WMF)]] irc: bd808
_______________________________________________
Wikimedia Cloud Services announce mailing list
Cloud-announce@lists.wikimedia.org (formerly labs-announce@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud-announce
Cross-posting this call to action from the Analytics list. The Data
Lake data sets may be of interest to some tool builders here. This is
not a "real time" data set that would be good for patrolling
workflows, but it might be an interesting source of data for deeper
analysis of how articles have changed over time. Take a look at the
various links to Wikitech for more details on what data is in the
collection and how it is prepared.
If you have more questions, I would encourage you to subscribe to the
<analytics@lists.wikimedia.org> list and discuss them there, so that the
good answers from Leila and others reach the larger Analytics and
Research communities that this data set is initially aimed at serving.
Bryan
---------- Forwarded message ---------
From: Leila Zia <leila@wikimedia.org>
Date: Tue, Aug 27, 2019 at 9:47 AM
Subject: [Analytics] [Input requested] Data Lake Edit release input request
To: A mailing list for the Analytics Team at WMF and everybody who has
an interest in Wikipedia and analytics.
<analytics@lists.wikimedia.org>
[apologies for cross-posting]
In a nutshell:
We are asking for your input to help us learn how to release the
historical edit data of Wikimedia projects in a more efficient way.
Please provide your feedback via
https://docs.google.com/forms/d/e/1FAIpQLScc15eSeFrVvAh_ydpX_1p0v6-WSx2qe3E…
by 2019-09-03.
******
Dear researchers and analytics users,
The Analytics team at Wikimedia Foundation [1] has been working on
building a data lake [2] for Wikimedia edits [3] to enable the
research and analysis of Wikimedia's edit data in a more efficient
way. This data is a history of activity on Wikimedia projects as
complete and research-friendly as possible. Edits have context, such
as whether they were reverted, on the same line as the edit itself, so
you can focus more on what you want to find out instead of writing code
to wrangle the data. Each line of the data released will include
the following and more (see full specification [3a], [3b], [3c]):
* editor edit count, groups, blocks, bot status, name, current and
historical (time of edit)
* seconds since this editor's last edit
* page context, current and historical (namespace, seconds since last
revision, etc.)
* seconds to identity revert or deletion, if applicable
* revision tags (mobile edit, ve edit, etc.)
The first instance of this data will be released in the coming months,
and to make this release as useful as possible for you all, the users of
the data, the team needs to hear your thoughts on how to slice and dice
the data at publishing time. You can provide your input at
https://docs.google.com/forms/d/e/1FAIpQLScc15eSeFrVvAh_ydpX_1p0v6-WSx2qe3E…
.
Please provide your input to this survey no later than 2019-09-03.
Best,
Leila
[1] https://wikitech.wikimedia.org/wiki/Analytics
[2] https://en.wikipedia.org/wiki/Data_lake
[3] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits
[3a] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_his…
[3b] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_use…
[3c] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_pag…
--
Leila Zia
Principal Research Scientist, Head of Research
Wikimedia Foundation
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
--
Bryan Davis Technical Engagement Wikimedia Foundation
Principal Software Engineer Boise, ID USA
[[m:User:BDavis_(WMF)]] irc: bd808
I am updating a tool I missed in the initial rounds of the actor/comment
table changes. This query used to run in ~30 seconds or so. Now it's at a
staggering 10 minute run time. Is anyone able to lend a hand on getting
this optimized?
select log_timestamp, actor_name, log_action, log_title, comment_text,
       log_params
from logging_userindex
left join actor_logging on actor_id = log_actor
left join comment_logging on comment_id = log_comment_id
where log_type = 'block'
  and log_namespace = 2
  and log_title like '%s%%'
order by log_timestamp;
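For what it's worth, one pattern that has helped similar queries since the
actor/comment migration is to filter the logging rows in a derived table
before joining the actor and comment views, so the optimizer does not start
from the much larger views. Below is a hedged sketch of that rewrite, run
from Python with pymysql; the replica host, database, and title prefix are
illustrative placeholders, not values from the original query.

# A hedged sketch, not a tested fix: filter logging first in a derived
# table, then join the actor/comment views. The connection details and
# the title prefix are illustrative.
import os
import pymysql

QUERY = """
SELECT l.log_timestamp, actor_name, l.log_action, l.log_title,
       comment_text, l.log_params
FROM (
    SELECT log_timestamp, log_action, log_title, log_actor,
           log_comment_id, log_params
    FROM logging_userindex
    WHERE log_type = 'block'
      AND log_namespace = 2
      AND log_title LIKE %s
) AS l
LEFT JOIN actor_logging ON actor_id = l.log_actor
LEFT JOIN comment_logging ON comment_id = l.log_comment_id
ORDER BY l.log_timestamp
"""

conn = pymysql.connect(
    host="enwiki.analytics.db.svc.eqiad.wmflabs",  # illustrative host
    db="enwiki_p",                                 # illustrative database
    read_default_file=os.path.expanduser("~/replica.my.cnf"),
)
with conn.cursor() as cur:
    cur.execute(QUERY, ("Example%",))  # placeholder for the '%s%%' prefix
    rows = cur.fetchall()
conn.close()

Whether the derived table actually changes the plan depends on the
optimizer's derived-table merging, so it is worth timing both forms.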
According to RFC 7231 § 3.1.1.5,[1] a POST request that does not include a
Content-Type header may be interpreted by the server in one of two ways:
1. It may assume application/octet-stream. In this case, PHP and the
Action API will not see the request as having any parameters, and so
will probably serve the auto-generated help page.[2]
2. It may "sniff" the content type. It's likely enough to correctly
guess application/x-www-form-urlencoded in this case, and therefore PHP and
the Action API will see the request as having the intended parameters.
It turns out that HHVM and PHP 7 (at least as used at Wikimedia) differ in
their behaviors: PHP 7 seems to choose option 1, while HHVM chooses option
2.
Thus, clients that have been generating POST requests to Wikimedia wikis'
Action APIs without a Content-Type header will have been receiving expected
results from HHVM but will now start receiving unexpected results as
Wikimedia's migration to PHP 7 proceeds.[3] Affected clients should be
updated to include the Content-Type header in their requests.
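For clients using Python's standard library, the fix can be as small as
setting the header explicitly when building the request. A minimal sketch;
the endpoint and parameters here are illustrative:

# A minimal sketch of the fix using only the Python standard library.
# The endpoint and parameters are illustrative, not a specific client.
from urllib import parse, request

body = parse.urlencode({
    "action": "query",
    "meta": "siteinfo",
    "format": "json",
}).encode("utf-8")

req = request.Request(
    "https://www.mediawiki.org/w/api.php",
    data=body,
    # Without this header, PHP 7 treats the body as
    # application/octet-stream and the Action API sees no parameters.
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

with request.urlopen(req) as response:
    print(response.read()[:200])

Higher-level libraries such as python-requests already send this header
when form parameters are passed as a dict, so the clients at risk are
mostly those posting pre-encoded bodies by hand.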
See https://phabricator.wikimedia.org/T230526 for some details on this
issue.
[1]: https://tools.ietf.org/html/rfc7231#section-3.1.1.5
[2]: As seen for example at https://www.mediawiki.org/w/api.php.
[3]: See https://phabricator.wikimedia.org/T176370 for progress on the
migration.
--
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation
_______________________________________________
Mediawiki-api-announce mailing list
Mediawiki-api-announce@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce
I have a question for the MySQL/MariaDB experts.
*Short and sweet*
How is it that Query 1 <https://quarry.wmflabs.org/query/38237> runs in
seconds and so does Query 2 <https://quarry.wmflabs.org/query/38243>, but
Query 3 <https://quarry.wmflabs.org/query/38244> -- which is essentially the
same thing except it tries to bring columns from both sides of the join --
takes forever to run?
*Details*
I have a bot that uses this script
<https://github.com/PersianWikipedia/fawikibot/blob/master/HujiBot/findproxy…>
to identify and block IPs associated with open proxies. To be parsimonious
with the blocks, it blocks only the specific IP and not its associated range.
However, many proxy IPs belong to a web hosting range and it would be
better to block the entire range. The goal of the query is to find all
active blocks made by my bot, sort them in order of IP address, and emulate
the LEAD() function -- which we still don't have because we have not
upgraded to MariaDB 10.2 on Labs servers -- to make it easy to find cases
where two consecutive IPs start with the same two octets (like 100.24.X.Y
and 100.24.C.D) so that I can manually investigate those in more detail.
The nested SELECT that is repeated twice simply generates a list of all
active blocks by my bot. Query 1 shows all of them (88 rows), and Query 2
shows only the rows in which the LEAD subquery row has a rownumber that is
equal to 1 + that of a row in the original 88-row data. Of note, this also
has 88 rows, though one of the rows is all NULLs because for the last row
of the data we should not find a match in the LEAD subquery.
Anyhow, Query 3 simply aims to put the actual data and the LEAD subquery
data side by side and that is where things fall apart somehow. I cannot run
an EXPLAIN on this through Quarry, because Quarry does not like SQL
variables :/ and I have no idea how else to diagnose this problem.
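For comparison, once the replicas are on MariaDB 10.2+, LEAD() itself would
replace the rownumber self-join entirely. Here is a rough sketch of that
future form from Python with pymysql; the host, database, table, columns,
and bot name are illustrative guesses, not taken from the Quarry queries:

# A rough sketch of the LEAD() form on MariaDB >= 10.2. All names here
# (host, database, table, columns, bot account) are illustrative.
import os
import pymysql

QUERY = """
SELECT ipb_address,
       LEAD(ipb_address) OVER (ORDER BY ipb_address) AS next_address
FROM ipblocks
WHERE ipb_by_text = %s
ORDER BY ipb_address
"""

conn = pymysql.connect(
    host="fawiki.analytics.db.svc.eqiad.wmflabs",  # illustrative host
    db="fawiki_p",                                 # illustrative database
    read_default_file=os.path.expanduser("~/replica.my.cnf"),
)
with conn.cursor() as cur:
    cur.execute(QUERY, ("ExampleBot",))
    for address, next_address in cur.fetchall():
        # MediaWiki stores these columns as VARBINARY, so decode if needed.
        address = address.decode() if isinstance(address, bytes) else address
        next_address = (next_address.decode()
                        if isinstance(next_address, bytes) else next_address)
        # Compare the first two octets of consecutive IPv4 addresses; note
        # that string ordering is only an approximation of numeric IP order.
        if next_address and address.split(".")[:2] == next_address.split(".")[:2]:
            print(address, "->", next_address)
conn.close()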
Thanks!
Huji
Is it possible to run multiple web services on a single Toolforge tool? For
instance, is it possible to run a PHP service for any /api routes and a
Node.js service for all other routes?
Or would we need to create two tools?
Thanks!
David Barratt