Starting Thursday, May 12th, at 13:00 PDT (20:00 GMT), we will hold the
first weekly Code Review office hours on freenode IRC in the
#wikimedia-codereview channel.
Event details: https://phabricator.wikimedia.org/E179
Background: https://phabricator.wikimedia.org/T128371
Thanks to everyone who's been helping to organize this. We welcome people
to submit their patches for review, as well as reviewers who can spare a
few minutes to provide feedback and hopefully merge some patches!
If you can't make it during the scheduled time, please feel free to
suggest other times that would work better for you. I intend to set up one
or two other weekly time slots, at least one of which should be at a time
that's more convenient for people in Europe and Asia.
Looking forward to seeing you in #wikimedia-codereview!
______________________
Mukunda Modell,
Release Engineer
Wikimedia Foundation
Trung Dinh wrote:
>Hi all,
>I have an issue when trying to parse data fetched from the Wikipedia API.
>This is the piece of code that I am using:
>api_url = 'http://en.wikipedia.org/w/api.php'
>api_params = ('action=query&list=recentchanges&rclimit=5000&rctype=edit'
>              '&rcnamespace=0&rcdir=newer&format=json&rcstart=20160504022715')
>
>f = urllib2.Request(api_url, api_params)
>print ('requesting ' + api_url + '?' + api_params)
>source = urllib2.urlopen(f, None, 300).read()
>source = json.loads(source)
>
>json.loads(source) raised the following exception: "Expecting ,
>delimiter: line 1 column 817105 (char 817104)"
>
>I tried to use source.encode('utf-8') and some other encodings, but none
>of them helped.
>Do we have any workaround for that issue? Thanks :)
Hi.
Weird, I can't reproduce this error. I had to import the "json" and
"urllib2" modules, but after doing so, executing the code you provided
here worked fine for me: <https://phabricator.wikimedia.org/P3009>.
You probably want to use 'https://en.wikipedia.org/w/api.php' as your
end-point (HTTPS, not HTTP).
As far as I know, JSON is always encoded as UTF-8, so you shouldn't need
to encode or decode the data explicitly.
The error you're getting generally means that the JSON was malformed for
some reason. It seems unlikely that MediaWiki's api.php is outputting
invalid JSON, but I suppose it's possible.
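If you hit it again, one thing worth doing before anything else is looking
at the raw bytes around the offset the exception reports; a rough sketch
(reusing your request, with the offset taken from your traceback):

import json
import urllib2

api_url = 'https://en.wikipedia.org/w/api.php'
api_params = ('action=query&list=recentchanges&rclimit=5000&rctype=edit'
              '&rcnamespace=0&rcdir=newer&format=json&rcstart=20160504022715')

source = urllib2.urlopen(urllib2.Request(api_url, api_params), None, 300).read()
try:
    data = json.loads(source)
except ValueError:
    # Print the bytes surrounding the offset from the error message
    # ("char 817104") to see what the parser actually choked on.
    print(repr(source[817004:817204]))
    raise

If the printed snippet looks like a truncated response rather than bad
JSON, the problem is more likely the connection than api.php.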
Since you're coding in Python, you may be interested in a framework such
as <https://github.com/alexz-enwp/wikitools>.
MZMcBride
I've been working on a little project that intercepts normal wiki links to
keep them from reloading the entire page, and instead requests that page
with AJAX, separating out the content and replacing it on the page. So far,
I've gotten this to work with only the main content; however, I also need
to be able to safely grab the 'mw-navigation' and 'footer' elements, which
seem to also be standardized in most skins.
The only problem with that is that client-side JS cannot safely parse the
returned pages for all three elements, because it is possible for users to
place replica elements with the same IDs in the page content, and
DOMDocument is not optimized or well supported.
I would like to be able to grab all of the page content before it's echoed
out, parse it with DOMDocument in PHP, grab my elements, and echo out JSON
of them. The client-side JS plugin would then refresh those elements.
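For reference, the main content on its own can already be fetched as JSON
through the action=parse API; here's a rough sketch of that request in
Python (the page title is just a placeholder):

import json
import urllib2

api_url = 'https://en.wikipedia.org/w/api.php'
api_params = 'action=parse&page=Sandbox&prop=text&format=json'

response = urllib2.urlopen(urllib2.Request(api_url, api_params), None, 30).read()
# The rendered HTML of the page body lives under parse.text['*'].
content_html = json.loads(response)['parse']['text']['*']

That only covers the content block, though, not 'mw-navigation' or the
footer.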
I've done a good bit of trial and error and research in the MediaWiki docs
for this, and it seems it's not currently supported, because the active
skin is what echoes out the entire page, not MediaWiki itself.
Am I wrong in my findings, and are there any plans to make MediaWiki
handle pages and echo them out itself? It would break the current standard,
but I feel that having skins build up an output string and pass it to
MediaWiki, rather than delivering the content themselves, is a better
approach.
Thank you for your time.
Right now, MediaWiki has two pure-PHP engines for producing diffs (there's
also a native PHP extension, wikidiff2, but we're not discussing it right
now):
* DairikiDiff, which is what everybody uses, and
* Wikidiff3, an alternative implementation by Guy Van den Broeck that has
been around for 8 years but required a configuration change to enable.
While less battle-tested, Wikidiff3 offers vastly improved performance on
heavy diffs compared to DairikiDiff. The price, however, is that it takes
certain shortcuts if the diff is too complex. I ran 100K diffs from
English Wikipedia through both implementations, and 6% of them came out
different. Lots of the changes look insignificant, but I need your help
to determine whether that's really so.
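The comparison itself boils down to running each revision pair through
both engines and flagging pairs whose diffs don't match. Here's a rough
sketch of that harness logic in Python; both stand-ins call the same
difflib routine, since the real engines are PHP and can't be run here, so
this only shows the shape of the comparison:

import difflib

def dairiki_diff(old, new):
    # Stand-in for DairikiDiff.
    return list(difflib.unified_diff(old.splitlines(), new.splitlines()))

def wikidiff3(old, new):
    # Stand-in for Wikidiff3 (identical here, just to keep this runnable).
    return list(difflib.unified_diff(old.splitlines(), new.splitlines()))

# Placeholder corpus; the real run used 100K revision pairs from enwiki,
# of which about 6% produced different output.
revisions = [('old revision text', 'new revision text')]

differing = [p for p in revisions if dairiki_diff(*p) != wikidiff3(*p)]
print('%d of %d diffs differ' % (len(differing), len(revisions)))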
I've built this tool
<https://diff-forge.wmflabs.org/wiki/Special:DiffCompare>[1] to facilitate
the comparison. It displays two diffs from the different algorithms side by
side (yeah, it can get too wide, I know :P); which is which is random.
Parts that differ between the implementations are highlighted in yellow,
and below the diffs is the diff of the differences, for reference. You can
vote with the buttons above the diffs; no registration is required. If you
see a catastrophically bad diff, please send me the link.
Unless the results are significantly worse, I'd like to go ahead and make
Wikidiff3 the only implementation.
[1] https://diff-forge.wmflabs.org/wiki/Special:DiffCompare
--
Best regards,
Max Semenik ([[User:MaxSem]])
Hello!
MediaWiki-Codesniffer 0.7.1 is now available for use in your MediaWiki
extensions and other projects. 0.7.0 was a botched release with bugs;
please don't use it. Here are the notable changes since the last release
(0.6.0):
* Also check for space after elseif in SpaceAfterControlStructureSniff
(Lethexie)
* Factor out tokenIsNamespaced method (addshore)
* Make IfElseStructureSniff detect and fix multiple white spaces after
else (Lethexie)
* Make SpaceyParenthesisSniff fix multiple white spaces between
parentheses (Lethexie)
* Make spacey parenthesis sniff work with short array syntax (Kunal Mehta)
* Speed up PrefixedGlobalFunctionsSniff (addshore)
* Update squizlabs/php_codesniffer to 2.6.0 (Paladox)
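To pull the release into an extension, the usual composer setup looks
something like this (the require-dev entry and test script follow common
MediaWiki convention; they're not specific to this release):

{
	"require-dev": {
		"mediawiki/mediawiki-codesniffer": "0.7.1"
	},
	"scripts": {
		"test": "phpcs -p -s"
	}
}

You'll also want a phpcs.xml in the repository root that pulls in the
bundled MediaWiki ruleset.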
Thanks to some awesome work by Addshore, MW-CS is now twice as fast for
large repos like mediawiki/core!
This release also features contributions from GSoC student Lethexie who
will be working on improving our static analysis tools this summer!
-- Legoktm
https://www.mediawiki.org/wiki/Scrum_of_scrums/2016-05-04
= 2016-05-04 =
== Technology ==
=== Release Engineering ===
For all:
* T128190 - Migration of browsertests* Jenkins jobs to selenium* jobs
** The migration of browsertests* to selenium* is almost complete, however,
Željko needs people to claim their failing browser tests. See the task for
more information.
*** The task has a table, but it's not clear what you want people to do
(are you just asking about the rows with missing "contact from
browsertests.yaml" fields?)
For operations:
* T126594 - Disable HHVM fcgi server on CI slaves
* Need help from ops to review and merge these two patches (we don't need
HHVM running as a daemon on CI boxes):
** https://gerrit.wikimedia.org/r/#/c/269946/
** https://gerrit.wikimedia.org/r/#/c/269947/
* https://phabricator.wikimedia.org/T133911 - Bump quota of Nodepool
instances (contintcloud tenant)
** More instances needed. Clarified with Chase last week: pending Andrew.
No urgency.
* Two related tasks, each with patches that are needed to streamline the
scap3 migration:
** T133211 - Automate the generation of deployment keys (keyholder-managed
ssh keys)
*** https://gerrit.wikimedia.org/r/#/c/284418/
** T132747 - scap::target shouldn't allow users to redefine the user's key
*** https://gerrit.wikimedia.org/r/#/c/285519/
=== Security ===
* Reviews:
** json-schema done, AuthManager done (no more comments, a few minor
things before all patches are +1'ed)
** Starting on T129584 this week
* Starting work on AuthService next week (heads up to Services, we'll
probably be scheduling a few meetings with you) (Marko: ack && yay!)
* Ops: ping again on T128819
=== Services ===
(Marko cannot attend today, sorry)
* RESTBase
** working on storing all auth checks locally (right now we call the MW
API every time)
* EventBus / Change propagation
** started using it in production for the summary endpoint today
** more dependency updates to follow soon
* Cassandra move to 2.2.6 soon
** first up: maps cluster
* use Scap3 -
https://lists.wikimedia.org/pipermail/wikitech-l/2016-April/085299.html
=== Technical Operations ===
* '''Blocking''':
** none
* '''Blocked''':
** none
* '''Updates''':
** May 15 (Chrome SPDY removal deadline). Getting HTTP/2 fully deployed
by then.
** started using letsencrypt for various small services
== Product ==
=== Reading ===
* Most of Reading engineering is at an offsite today, I believe.
==== Reading Infrastructure ====
* AuthManager core patches are just waiting for security +1s. Work is
ongoing on extensions; CentralAuth, LdapAuthentication, ConfirmEdit could
use reviews if anyone is interested.
=== Editing ===
==== Collaboration ====
* '''Blocking''':
** External store work. External Store deployed everywhere on Beta with no
complications. Work on this continues. We now need to set up a second
External Store on Beta for Flow, to test the migration.
* '''Blocked''':
** Work on Flow dumps continuing on the ops side.
https://phabricator.wikimedia.org/T119511 and
https://phabricator.wikimedia.org/T89398 .
* '''Updates''':
** Continuing notification work on:
*** Cross-wiki notifications coming by default on May 12th!
*** Echo email formatter
*** Work continues on the new Echo MVC architecture
==== Parsing ====
* We got our first visual diff test run comparing Tidy with HTML5depurate.
Results @ http://mw-expt-tests.wmflabs.org/ ... Making notes @
https://www.mediawiki.org/wiki/Parsing/Replacing_Tidy ... We will use this
as the basis for figuring out what things might break if we replace Tidy
today and what needs fixing and where.
* Scott has been working with Ops to get OCG kinks ironed out.
==== Language ====
* '''Blocking''':
** Apertium->Jessie. Yet to finalize plan and proceed.
* '''Blocked''':
* '''Updates''':
** cxserver service will be migrated to scap3 deployment soon.
** Translate (twn, meta,..) now using Apertium MT from cxserver.
== Discovery ==
* '''Blocking''': none
* '''Blocked''': none
* Preparing for ElasticSearch 2.0 migration
* Results for A/B test on portal language link location published:
https://commons.wikimedia.org/wiki/File:Wikipedia_Portal_Test_of_Language_D…
* TextCat A/B test launching soon
* Portal A/B test adding descriptions to project links to start this week
* WDQS redeployed, some performance issues
* Graphs have ability to use WDQS directly now
* Team offsite in 2 weeks (17-21)
== Analytics ==
* Scaling of pageview API, more work than anticipated
* Working on new domain analytics.wikimedia.org, wikistats 2.0 migration,
meeting with research to map early stages of project
* Still trouble with Jenkins
== Fundraising Tech ==
* coding new PayPal integration
* pulling in lots of CiviCRM upstream changes
* making CentralNotice fail gracefully in odd cache edge cases
* Casey hanging out at the Reading offsite
* more work towards replacing ActiveMQ
It has long been the case that the load.php entry point, to support more
aggressive caching, should not depend on details of the current session.
For example, it should not access the session user, and should use its own
'lang' parameter for i18n messages rather than the session user's language.
After a fair bit of work[1] by Gergő Tisza, Timo Tijhof, Bryan Davis, and
myself, we have now flipped the switch to make attempted session access
fail with an exception when serving load.php. This will go out to Wikimedia
wikis with 1.28.0-wmf.1 next week.[2]
We've had this in logging mode for a while and haven't seen any log
messages in production recently, so we anticipate that this change won't
cause any issues for Wikimedia wikis. Third-party wikis may have to update
their extensions.
If there are problems, they'll manifest as missing CSS or JavaScript in the
page, and when examining the load.php output there will be a comment at the
top reporting that a BadMethodCallException was caught. The logged
exception's message will be "Sessions are disabled for this entry point".
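If you want a quick way to check a wiki, the comment is visible in the raw
load.php output; a rough sketch in Python (the URL is a placeholder for
your own wiki):

import urllib2

# Any load.php request will do; the 'startup' module exists everywhere.
url = 'https://example.org/w/load.php?modules=startup&only=scripts'
response = urllib2.urlopen(url, None, 30).read()
if 'BadMethodCallException' in response:
    print('Something attempted session access while serving load.php.')
else:
    print('No caught-exception comment in the output.')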
[1]: See https://phabricator.wikimedia.org/T127233 and its subtasks.
[2]: See https://www.mediawiki.org/wiki/MediaWiki_1.28/Roadmap for the
schedule.
--
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation