Starting Thursday, May 12th, at 13:00 PDT (20:00 GMT), we will hold the
first weekly Code Review office hours on freenode IRC in the
#wikimedia-codereview channel.
Event details: https://phabricator.wikimedia.org/E179
Background: https://phabricator.wikimedia.org/T128371
Thanks to everyone who's been helping to organize this. We welcome people
to submit their patches for review, as well as reviewers who can spare a
few minutes to provide feedback and hopefully merge some patches!
If you can't make it during the scheduled time, please feel free to
suggest other times that would work better for you. I intend to set up one
or two other weekly time slots, at least one of which should be at a time
that's more convenient for people in Europe and Asia.
Looking forward to seeing you in #wikimedia-codereview!
______________________
Mukunda Modell,
Release Engineer
Wikimedia Foundation
Trung Dinh wrote:
>Hi all,
>I have an issue when trying to parse data fetched from the Wikipedia API.
>This is the piece of code that I am using:
>api_url = 'http://en.wikipedia.org/w/api.php'
>api_params = ('action=query&list=recentchanges&rclimit=5000&rctype=edit'
>              '&rcnamespace=0&rcdir=newer&format=json&rcstart=20160504022715')
>
>f = urllib2.Request(api_url, api_params)
>print ('requesting ' + api_url + '?' + api_params)
>source = urllib2.urlopen(f, None, 300).read()
>source = json.loads(source)
>
>json.loads(source) raised the following exception: "Expecting ,
>delimiter: line 1 column 817105 (char 817104)"
>
>I tried to use source.encode('utf-8') and some other encodings, but none
>of them helped.
>Do we have any workaround for that issue? Thanks :)
Hi.
Weird, I can't reproduce this error. I had to import the "json" and
"urllib2" modules, but after doing so, executing the code you provided
here worked fine for me: <https://phabricator.wikimedia.org/P3009>.
You probably want to use 'https://en.wikipedia.org/w/api.php' as your
end-point (HTTPS, not HTTP).
As far as I know, JSON is always encoded as UTF-8, so you shouldn't need
to encode or decode the data explicitly.
The error you're getting generally means that the JSON was malformed for
some reason. It seems unlikely that MediaWiki's api.php is outputting
invalid JSON, but I suppose it's possible.
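If you hit it again, one thing worth doing before anything else is looking
at the raw bytes around the offset the exception reports; a rough sketch
(reusing your request, with the offset taken from your traceback):

import json
import urllib2

api_url = 'https://en.wikipedia.org/w/api.php'
api_params = ('action=query&list=recentchanges&rclimit=5000&rctype=edit'
              '&rcnamespace=0&rcdir=newer&format=json&rcstart=20160504022715')

source = urllib2.urlopen(urllib2.Request(api_url, api_params), None, 300).read()
try:
    data = json.loads(source)
except ValueError:
    # Print the bytes surrounding the offset from the error message
    # ("char 817104") to see what the parser actually choked on.
    print(repr(source[817004:817204]))
    raise

If the printed snippet looks like a truncated response rather than bad
JSON, the problem is more likely the connection than api.php.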
Since you're coding in Python, you may be interested in a framework such
as <https://github.com/alexz-enwp/wikitools>.
MZMcBride
I've been working on a little project that intercepts normal wiki links to
keep them from reloading the entire page, and instead requests that page
with AJAX, separating out the content and replacing it on the page. So far,
I've gotten this to work with only the main content; however, I also need
to be able to safely grab the 'mw-navigation' and 'footer' elements, which
seem to also be standardized in most skins.
The only problem with that is that client-side JS cannot safely parse the
returned pages for all three elements, because it is possible for users to
place replica elements with the same IDs in the page content, and
DOMDocument is not optimized or well supported.
I would like to be able to grab all of the page content before it's echoed
out, parse it with DOMDocument in PHP, grab my elements, and echo out JSON
of them. The client-side JS plugin would then refresh those elements.
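For reference, the main content on its own can already be fetched as JSON
through the action=parse API; here's a rough sketch of that request in
Python (the page title is just a placeholder):

import json
import urllib2

api_url = 'https://en.wikipedia.org/w/api.php'
api_params = 'action=parse&page=Sandbox&prop=text&format=json'

response = urllib2.urlopen(urllib2.Request(api_url, api_params), None, 30).read()
# The rendered HTML of the page body lives under parse.text['*'].
content_html = json.loads(response)['parse']['text']['*']

That only covers the content block, though, not 'mw-navigation' or the
footer.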
I've done a good bit of trial and error and research in the MediaWiki docs
for this, and it seems it's not currently supported, because the active
skin is what echoes out the entire page, not MediaWiki itself.
Am I wrong in my findings, and are there any plans to make MediaWiki
handle pages and echo them out itself? It would break the current standard,
but I feel that having skins build up an output string and pass it to
MediaWiki, rather than delivering the content themselves, is a better
approach.
Thank you for your time.
Right now, MediaWiki has two pure-PHP engines for producing diffs (there's
also a native PHP extension, wikidiff2, but we're not discussing it right
now):
* DairikiDiff, which is what everybody uses, and
* Wikidiff3, an alternative implementation by Guy Van den Broeck that has
been around for 8 years but required a configuration change to enable.
While less battle-tested, Wikidiff3 offers vastly improved performance on
heavy diffs compared to DairikiDiff. The price, however, is that it takes
certain shortcuts if the diff is too complex. I ran 100K diffs from
English Wikipedia through both implementations, and 6% of them came out
different. Lots of the changes look insignificant, but I need your help
to determine whether that's really so.
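The comparison itself boils down to running each revision pair through
both engines and flagging pairs whose diffs don't match. Here's a rough
sketch of that harness logic in Python; both stand-ins call the same
difflib routine, since the real engines are PHP and can't be run here, so
this only shows the shape of the comparison:

import difflib

def dairiki_diff(old, new):
    # Stand-in for DairikiDiff.
    return list(difflib.unified_diff(old.splitlines(), new.splitlines()))

def wikidiff3(old, new):
    # Stand-in for Wikidiff3 (identical here, just to keep this runnable).
    return list(difflib.unified_diff(old.splitlines(), new.splitlines()))

# Placeholder corpus; the real run used 100K revision pairs from enwiki,
# of which about 6% produced different output.
revisions = [('old revision text', 'new revision text')]

differing = [p for p in revisions if dairiki_diff(*p) != wikidiff3(*p)]
print('%d of %d diffs differ' % (len(differing), len(revisions)))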
I've built this tool
<https://diff-forge.wmflabs.org/wiki/Special:DiffCompare>[1] to facilitate
the comparison. It displays two diffs from the different algorithms side by
side (yeah, it can get too wide, I know :P); which is which is random.
Parts that differ between the implementations are highlighted in yellow,
and below the diffs is the diff of the differences, for reference. You can
vote with the buttons above the diffs; no registration is required. If you
see a catastrophically bad diff, please send me the link.
Unless the results are significantly worse, I'd like to go ahead and make
Wikidiff3 the only implementation.
[1] https://diff-forge.wmflabs.org/wiki/Special:DiffCompare
--
Best regards,
Max Semenik ([[User:MaxSem]])
Hello!
MediaWiki-Codesniffer 0.7.1 is now available for use in your MediaWiki
extensions and other projects. 0.7.0 was a botched release with bugs;
please don't use it. Here are the notable changes since the last release
(0.6.0):
* Also check for space after elseif in SpaceAfterControlStructureSniff
(Lethexie)
* Factor out tokenIsNamespaced method (addshore)
* Make IfElseStructureSniff detect and fix multiple white spaces after
else (Lethexie)
* Make SpaceyParenthesisSniff fix multiple white spaces between
parentheses (Lethexie)
* Make spacey parenthesis sniff work with short array syntax (Kunal Mehta)
* Speed up PrefixedGlobalFunctionsSniff (addshore)
* Update squizlabs/php_codesniffer to 2.6.0 (Paladox)
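To pull the release into an extension, the usual composer setup looks
something like this (the require-dev entry and test script follow common
MediaWiki convention; they're not specific to this release):

{
	"require-dev": {
		"mediawiki/mediawiki-codesniffer": "0.7.1"
	},
	"scripts": {
		"test": "phpcs -p -s"
	}
}

You'll also want a phpcs.xml in the repository root that pulls in the
bundled MediaWiki ruleset.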
Thanks to some awesome work by Addshore, MW-CS is now twice as fast for
large repos like mediawiki/core!
This release also features contributions from GSoC student Lethexie who
will be working on improving our static analysis tools this summer!
-- Legoktm
https://www.mediawiki.org/wiki/Scrum_of_scrums/2016-05-04
= 2016-05-04 =
== Technology ==
=== Release Engineering ===
For all:
* T128190 - Migration of browsertests* Jenkins jobs to selenium* jobs
** The migration of browsertests* to selenium* is almost complete, however,
Željko needs people to claim their failing browser tests. See the task for
more information.
*** The task has a table, but it's not clear what you want people to do
(are you just asking about the rows with missing "contact from
browsertests.yaml" fields?)
For operations:
* T126594 - Disable HHVM fcgi server on CI slaves
* Need help from ops to review and merge these two patches (we don't need
HHVM running as a daemon on CI boxes):
** https://gerrit.wikimedia.org/r/#/c/269946/
** https://gerrit.wikimedia.org/r/#/c/269947/
* https://phabricator.wikimedia.org/T133911 - Bump quota of Nodepool
instances (contintcloud tenant)
** More instances needed. Clarified with Chase last week: pending Andrew.
No urgency.
* Two related tasks, each with patches that are needed to streamline the
scap3 migration:
** T133211 - Automate the generation of deployment keys (keyholder-managed
ssh keys)
*** https://gerrit.wikimedia.org/r/#/c/284418/
** T132747 - scap::target shouldn't allow users to redefine the user's key
*** https://gerrit.wikimedia.org/r/#/c/285519/
=== Security ===
* Reviews:
** json-schema done, AuthManager done (no more comments, a few minor
things before all patches are +1'ed)
** Starting on T129584 this week
* Starting work on AuthService next week (heads up to Services, we'll
probably be scheduling a few meetings with you) (Marko: ack && yay!)
* Ops: ping again on T128819
=== Services ===
(Marko cannot attend today, sorry)
* RESTBase
** working on storing all auth checks locally (right now we call the MW
API every time)
* EventBus / Change propagation
** started using it in production for the summary endpoint today
** more dependency updates to follow soon
* Cassandra move to 2.2.6 soon
** first up: maps cluster
* use Scap3 -
https://lists.wikimedia.org/pipermail/wikitech-l/2016-April/085299.html
=== Technical Operations ===
* '''Blocking''':
** none
* '''Blocked''':
** none
* '''Updates''':
** May 15 (Chrome SPDY removal deadline). Getting HTTP/2 fully deployed
by then.
** started using letsencrypt for various small services
== Product ==
=== Reading ===
* Most of Reading engineering is at an offsite today, I believe.
==== Reading Infrastructure ====
* AuthManager core patches are just waiting for security +1s. Work is
ongoing on extensions; CentralAuth, LdapAuthentication, ConfirmEdit could
use reviews if anyone is interested.
=== Editing ===
==== Collaboration ====
* '''Blocking''':
** External store work. External Store deployed everywhere on Beta with no
complications. Work on this continues. We now need to set up a second
External Store on Beta for Flow, to test the migration.
* '''Blocked''':
** Work on Flow dumps continuing on the ops side.
https://phabricator.wikimedia.org/T119511 and
https://phabricator.wikimedia.org/T89398 .
* '''Updates''':
** Continuing notification work on:
*** Cross-wiki notifications coming by default on May 12th!
*** Echo email formatter
*** Work continues on the new Echo MVC architecture
==== Parsing ====
* We got our first visual diff test run comparing Tidy with HTML5depurate.
Results @ http://mw-expt-tests.wmflabs.org/ ... Making notes @
https://www.mediawiki.org/wiki/Parsing/Replacing_Tidy ... We will use this
as the basis for figuring out what things might break if we replace Tidy
today and what needs fixing and where.
* Scott has been working with Ops to get OCG kinks ironed out.
==== Language ====
* '''Blocking''':
** Apertium->Jessie. Yet to finalize plan and proceed.
* '''Blocked''':
* '''Updates''':
** cxserver service will be migrated to scap3 deployment soon.
** Translate (twn, meta,..) now using Apertium MT from cxserver.
== Discovery ==
* '''Blocking''': none
* '''Blocked''': none
* Preparing for ElasticSearch 2.0 migration
* Results for A/B test on portal language link location published:
https://commons.wikimedia.org/wiki/File:Wikipedia_Portal_Test_of_Language_D…
* TextCat A/B test launching soon
* Portal A/B test adding descriptions to project links to start this week
* WDQS redeployed, some performance issues
* Graphs have ability to use WDQS directly now
* Team offsite in 2 weeks (17-21)
== Analytics ==
* Scaling of pageview API, more work than anticipated
* Working on new domain analytics.wikimedia.org, wikistats 2.0 migration,
meeting with research to map early stages of project
* Still trouble with Jenkins
== Fundraising Tech ==
* coding new PayPal integration
* pulling in lots of CiviCRM upstream changes
* making CentralNotice fail gracefully in odd cache edge cases
* Casey hanging out at the Reading offsite
* more work towards replacing ActiveMQ
It has long been the case that the load.php entry point, to support more
aggressive caching, should not depend on details of the current session.
For example, it should not access the session user, and should use its own
'lang' parameter for i18n messages rather than the session user's language.
After a fair bit of work[1] by Gergő Tisza, Timo Tijhof, Bryan Davis, and
myself, we have now flipped the switch to make attempted session access
fail with an exception when serving load.php. This will go out to Wikimedia
wikis with 1.28.0-wmf.1 next week.[2]
We've had this in logging mode for a while and haven't seen any log
messages in production recently, so we anticipate that this change won't
cause any issues for Wikimedia wikis. Third-party wikis may have to update
their extensions.
If there are problems, they'll manifest as missing CSS or JavaScript in the
page, and when examining the load.php output there will be a comment at the
top reporting that a BadMethodCallException was caught. The logged
exception's message will be "Sessions are disabled for this entry point".
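If you want a quick way to check a wiki, the comment is visible in the raw
load.php output; a rough sketch in Python (the URL is a placeholder for
your own wiki):

import urllib2

# Any load.php request will do; the 'startup' module exists everywhere.
url = 'https://example.org/w/load.php?modules=startup&only=scripts'
response = urllib2.urlopen(url, None, 30).read()
if 'BadMethodCallException' in response:
    print('Something attempted session access while serving load.php.')
else:
    print('No caught-exception comment in the output.')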
[1]: See https://phabricator.wikimedia.org/T127233 and its subtasks.
[2]: See https://www.mediawiki.org/wiki/MediaWiki_1.28/Roadmap for the
schedule.
--
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation