All,
When we end up moving MW core to Phabricator I'd like us to jettison our
history. The repo is large and clunky and not conducive to development. It's
only going to grow in size unless we do something to cut back on the junk
we're carrying around.
This is my ideal Phabby world:
mediawiki (no /core, that was always redundant)
mediawiki/i18n (as submodule)
mediawiki/historical (full history, previous + all mediawiki going forward)
If we jettison all our history we can get the repo size down to 30-35MB,
which is very nice. Doing it on Gerrit isn't worthwhile because it'd
basically break everything. We're gonna be breaking things with the move to
Phab anyway... it's then or never if we're going to do this.
Being able to stitch with the old history would be nice, and I think it
might be doable with git-replace. If not, I still think it's worth
discussing for developer and deployer productivity.
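A rough sketch of the stitching, for the curious (untested; the URL and
SHA1s below are placeholders, and git replace's --graft option needs a
fairly new Git):
git remote add historical https://example.org/mediawiki/historical.git
git fetch historical
# Graft the archived tip in as the parent of the trimmed repo's root commit:
git replace --graft <trimmed-root-sha1> <historical-tip-sha1>
# From here on, log/blame/bisect walk transparently into the old history.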
Thoughts?
-Chad
My logging changes [0][1][2][3] are getting closer to being mergeable
(the first has already been merged). Tony Thomas' Swift Mailer change
[4] is also progressing. Both sets of changes introduce the concept of
specifying external library dependencies, both required and suggested,
to mediawiki/core.git via composer.json. Composer can be used by
people directly consuming the git repository to install and manage
these dependencies. I gave an example set of usage instructions in the
commit message for my patch that introduced the dependency on PSR-3
[0]. In the production cluster, on Jenkins job runners and in the
tarball releases we will want a different solution.
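For people in that direct-consumer case the flow is just stock Composer; a
two-line sketch, assuming Composer is installed globally as `composer`:
cd core            # a clone of mediawiki/core.git
composer install   # reads composer.json, installs the libraries into $IP/vendor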
My idea of how to deal with this is to create a new gerrit repository
(mediawiki/core/vendor.git?) that contains a composer.json file
similar to the one I had in patch set 7 of my first logging patch [5].
This composer.json file would be used to tell Composer the exact
versions of libraries to download. Someone would manually run Composer
in a checkout of this repository and then commit the downloaded
content, composer.lock file and generated autoloader.php to the
repository for review. We would then be able to branch and use this
repository as a git submodule in the wmf/1.2XwmfY branches that are
deployed to production and ensure that it is checked out along with
mw-core on the Jenkins nodes. By placing this submodule at $IP/vendor
in mw-core we would be mimicking the configuration that direct users
of Composer will experience. WebStart.php already includes
$IP/vendor/autoload.php when present, so integration with the rest of
mw-core should follow from that.
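To make that concrete, a rough sketch of working in such a repository (the
repository name is the hypothetical one above, and the psr/log pin is
purely illustrative; it is the PSR-3 interface package):
cat > composer.json <<'EOF'
{
    "require": {
        "psr/log": "1.0.0"
    }
}
EOF
composer install                    # fetches the pinned versions, writes composer.lock
composer dump-autoload --optimize   # generates the autoload.php we would commit
git add -A
git commit -m "Vendor psr/log 1.0.0"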
It would also be possible to add this repo to the tarballs for
distribution. There will probably need to be some adjustments for that
process, however, and the final result may be that release branches
update the mediawiki/core composer.json and provide a composer.lock
along with a pre-populated vendor directory. I would be glad to
participate in discussions of that use case, but we will have about 6
months before we need to solve it (and a new release management RFC to
resolve between now and then).
There are several use cases to consider for the general solution:
== Adding/updating a library ==
* Update composer.json in mediawiki/core/vendor.git
* Run `composer update` locally to download the library (and its dependencies)
* Run `composer dump-autoload --optimize` to generate an optimized autoloader.php
* Commit the changes
* Push the changes for review in gerrit (see the command sketch after this list)
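In shell terms the whole flow might look like this (package and version
purely illustrative; assumes the git-review Gerrit helper):
# after editing composer.json to bump the pinned version:
composer update psr/log
composer dump-autoload --optimize
git add -A
git commit -m "Update psr/log to 1.0.1"
git review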
== Hotfix for an external library ==
At some point we will run into a bug or missing feature in a
Composer-managed library that we need to work around with a patch. Obviously
we will attempt to upstream any such fixes (otherwise what's the point of
this whole exercise?). To keep from blocking things for our production
cluster, we would want to fork the upstream, apply our patch for local use,
and submit the patch upstream. While the patch is pending review upstream,
we would want to use our locally patched version in production and on
Jenkins.
Composer provides a solution for this with its "repository" package
source. The Composer documentation actually gives this exact example
in their discussion of the "vcs" repository type [6]. We would create
a gerrit repository tracking the external library, add our patch(es),
adjust the composer.json file in mediawiki/core/vendor.git to
reference our fork, and finally run Composer in
mediawiki/core/vendor.git to pull in our patched version.
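A sketch of what that wiring could look like in the
mediawiki/core/vendor.git composer.json (the gerrit URL, package name and
branch here are all hypothetical):
cat > composer.json <<'EOF'
{
    "repositories": [
        { "type": "vcs", "url": "https://gerrit.wikimedia.org/r/mediawiki/libs/monolog" }
    ],
    "require": {
        "monolog/monolog": "dev-wmf-hotfix"
    }
}
EOF
composer update monolog/monolog   # now resolves against our patched fork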
== Adding a locally developed library ==
The Platform Core team has been talking about extracting libraries
from mw-core and/or extensions to be published externally. This may be
done for any and all of the current $IP/includes/libs classes and
possibly other content from core such as FormatJson.
My idea for this would be to create a new gerrit repository for each
exported project. The project repo would contain a composer.json
manifest describing the project properly so that it can be published at
packagist.org like most Composer-installable libraries. In the
mediawiki/core/vendor.git composer.json file we would pull in these
libraries just like any third-party developed library. This isn't
functionally much different from the way that we use git submodules
today. There is one extra level of indirection when a library is
changed: mediawiki/core/vendor.git will have to be updated with the new
library version before the hash for the git submodule of
mediawiki/core/vendor.git is updated in a deploy or release branch.
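For illustration only, the manifest for such a library might look something
like this (name, namespace and license invented for the example):
cat > composer.json <<'EOF'
{
    "name": "mediawiki/ip-utils",
    "description": "IP address utilities extracted from MediaWiki core",
    "license": "GPL-2.0+",
    "autoload": {
        "psr-4": { "MediaWiki\\IPUtils\\": "src/" }
    }
}
EOF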
== wmf/1.2XwmfY branches ==
The make-wmf-branch script (found in mediawiki/tools/release.git) is
used to create the weekly release branches that are deployed by the
"train" on each Thursday. This script would be updated to branch the
new mediawiki/core/vendor.git repository and add the version
appropriate branch as a submodule of mediawiki/core.git on the wmf/*
branch. This is functionally exactly what we do for extensions today.
== Updating a deployment branch ==
SWAT deploys often ship bug fixes for extensions and core that can't
wait for the next train release. It is a near certainty that
mediawiki/core/vendor.git will have the same need. The process for
updating mediawiki/core/vendor.git will be almost the same as updating
an extension.
* Follow the adding/updating library or hotfix instructions to get the
changes merged into the mediawiki/core/vendor.git master branch.
* Cherry-pick the change into the proper deployment branch
* Merge the cherry-pick
* Update the git submodule for mediawiki/core/vendor.git in the
appropriate deployed branch
* Pull the update to tin
* sync-dir to deploy to the cluster (a command sketch follows below)
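Spelled out as commands, that could look roughly like this (paths, branch
names and the SHA1 are placeholders):
# in a checkout of core on the deployed branch:
cd vendor
git fetch origin
git checkout <sha1-of-merged-cherry-pick>
cd ..
git add vendor
git commit -m "Bump vendor submodule"
# then on tin:
git pull && git submodule update --init vendor
sync-dir php-1.24wmf9/vendor 'Deploy vendored library fix'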
== Security fixes ==
This is a special case of upstreaming a patch. A security patch would
be applied directly on the deployed branch of
mediawiki/core/vendor.git as we would do for any extension. The
vulnerability and patch must then be submitted upstream in a
responsible manner and tracked for resolution.
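Mechanically that could be as simple as the following sketch (patch file
and branch are placeholders; the push mechanics would follow whatever the
security process dictates):
git checkout wmf/1.24wmf9     # the deployed branch of vendor.git
git am < security-fix.patch   # apply the embargoed patch directly
git push origin HEAD:wmf/1.24wmf9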
== Jenkins ==
The Jenkins jobs that check out and run tests involving mediawiki/core
would need to be amended to also check out mediawiki/core/vendor.git in
the appropriate location before running tests.
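Conceptually the job setup gains one step, something like (URLs as above,
still hypothetical):
git clone https://gerrit.wikimedia.org/r/mediawiki/core core
git clone https://gerrit.wikimedia.org/r/mediawiki/core/vendor core/vendor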
What use cases did I miss? What other concerns do we have for this process?
[0]: https://gerrit.wikimedia.org/r/#/c/119939/
[1]: https://gerrit.wikimedia.org/r/#/c/119940/
[2]: https://gerrit.wikimedia.org/r/#/c/119941/
[3]: https://gerrit.wikimedia.org/r/#/c/119942/
[4]: https://gerrit.wikimedia.org/r/#/c/135290/
[5]: https://gerrit.wikimedia.org/r/#/c/119939/7/libs/composer.json,unified
[6]: https://getcomposer.org/doc/05-repositories.md#vcs
Bryan
--
Bryan Davis Wikimedia Foundation <bd808(a)wikimedia.org>
[[m:User:BDavis_(WMF)]] Sr Software Engineer Boise, ID USA
irc: bd808 v:415.839.6885 x6855
Should the Config and GlobalConfig classes and the associated
RequestContext methods be reverted from 1.23 as an incomplete feature?
As far as I can tell, the feature is not yet used anywhere, so reverting
it should be easy.
getConfig() was added to IContextSource in 101a2a160b05[1]. Then
the method was changed to return a new class of object (Config) instead
of a SiteConfiguration object in fbfe789b987b[2]; however, the Config
class faces significant changes in I5a5857fc[3].
[1]: https://gerrit.wikimedia.org/r/#/c/92004/
[2]: https://gerrit.wikimedia.org/r/#/c/109266/
[3]: https://gerrit.wikimedia.org/r/#/c/109850/
--
Kevin Israel - MediaWiki developer, Wikipedia editor
http://en.wikipedia.org/wiki/User:PleaseStand
Ori, thanks for following up.
I think I saw somewhere that there is a list of postmortems for tech ops disruptions
that includes reports like this one. Do you know where the list is? I tried a web search
and couldn't find a copy of this report outside of this email list.
I personally find this report interesting and concise, and I am interested in
understanding more about the tech ops infrastructure. Reports like this one
are useful in building that understanding. If there's an overview of tech ops
somewhere I'd be interested in reading that too. The information on English
Wikipedia about WMF's server configuration appears to be outdated.
Thanks,
Pine
> Date: Thu, 29 May 2014 22:38:10 -0700
> From: Ori Livneh <ori(a)wikimedia.org>
> To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
> Subject: Re: [Wikitech-l] 404 errors
> Message-ID:
> <CAHXK4ByYa8ae0EVGAUFWSCrjZtAQh+sjTW6ccJ14mB8o-teSoQ(a)mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> On Thu, May 29, 2014 at 1:34 PM, ENWP Pine <deyntestiss(a)hotmail.com> wrote:
>
> > Hi, I'm getting some 404 errors consistently when trying to load some
> > English Wikipedia articles. Other pages load ok. Did something break?
> >
>
> TL;DR: A package update went badly.
>
> Nitty-gritty postmortem:
>
> At 20:25 (all times UTC), change Ie5a860eb9[0] ("Remove
> wikimedia-task-appserver from app servers") was merged. There were two
> things wrong with it:
>
> 1) The appserver package was configured to delete the mwdeploy and apache
> users upon removal. The apache user was not deleted because it was logged
> in, but the mwdeploy user was. The mwdeploy account was declared in Puppet,
> but there was a gap between the removal of the package and the next Puppet
> run during which the account would not be present.
>
> 2) The package included the symlinks /etc/apache2/wmf and
> /usr/local/apache/common, which were not Puppetized. These symlinks were
> unlinked when the package was removed.
>
> Apache was configured to load configuration files from /etc/apache2/wmf,
> and these include the files that declare the DocumentRoot and Directory
> directives for our sites. As a result, users were served with 404s. At
> 20:40 Faidon Liambotis re-installed wikimedia-task-appserver on all
> Apaches. Since 404s are cached in Varnish, it took another five minutes for
> the rate of 4xx responses to return to normal (20:45).[1]
>
> [0]: https://gerrit.wikimedia.org/r/#/c/136151/
> [1]:
> https://graphite.wikimedia.org/render/?title=HTTP%204xx%20responses%2C%2020…
>
https://www.mediawiki.org/wiki/Manual:Config_script
says to delete the config directory. The instructions displayed within the
config script do not, and the page is protected after install by the random
key. So what is the correct instruction? Delete it? Don't delete it?
Delete it optionally for purpose X?
Hello all,
I would like to announce the release of MediaWiki Language Extension
Bundle 2014.05. This bundle is compatible with the MediaWiki 1.22.7 and
MediaWiki 1.21.10 releases.
* Download: https://translatewiki.net/mleb/MediaWikiLanguageExtensionBundle-2014.05.tar…
* sha256sum: f53030ce7e6e0619f9a075877bc85423c0a28f46ffb296dbed5733502683b9b3
Quick links:
* Installation instructions are at: https://www.mediawiki.org/wiki/MLEB
* Announcements of new releases will be posted to a mailing list:
https://lists.wikimedia.org/mailman/listinfo/mediawiki-i18n
* Report bugs to: https://bugzilla.wikimedia.org
* Talk with us at: #mediawiki-i18n @ Freenode
Release notes for each extension are below.
-- Kartik Mistry
== Babel, CleanChanges and LocalisationUpdate ==
* Only localisation updates
== CLDR ==
* The fallback logic no longer merges time units from languages in the
fallback chain. Time units from a fallback language are now used only
when the language itself does not define them.
* Localisation updates.
== Translate ==
=== Noteworthy changes ===
* Add summary to the Special:MessageGroupStats page.
* The icon to hide the sidebar on Special:Translate is now clickable
in small screen sizes.
* When a user is promoted from the translator sandbox, it is treated as
an account creation. Promoted users can now receive NewUserMessage.
* When doing action=purge on Special:LanguageStats, the stats are now
completely purged (previously they were not).
== UniversalLanguageSelector ==
=== Noteworthy changes ===
* Bug 62342: Always display assistant languages in compact language
list when defined by user in Translate extension.
=== Input Methods ===
* Bug 63895: Updated Sanskrit Transliteration layout as per community request.
=== Fonts ===
* Bug 56939: Added 'Hussaini Nastaleeq' font for Urdu.
--
Kartik Mistry/કાર્તિક મિસ્ત્રી | IRC: kart_
{kartikm, 0x1f1f}.wordpress.com
FYI
---------- Forwarded message ----------
From: Adam Baso <abaso(a)wikimedia.org>
Date: Fri, May 30, 2014 at 2:04 PM
Subject: Re: [Wikimedia-l] Mobile Operator IP Drift Tracking and Remediation
To: Wikimedia Mailing List <wikimedia-l(a)lists.wikimedia.org>
Okay, the code is in place in the alphas of both the Android and iOS apps,
and the server-side 2% sampling (extra header in HTTPS request sent once
per cellular app session) is working.
https://git.wikimedia.org/commitdiff/apps%2Fandroid%2Fwikipedia.git/8b4a0c3…
https://git.wikimedia.org/commitdiff/apps%2Fios%2Fwikipedia.git/59cde497921…
https://git.wikimedia.org/commitdiff/mediawiki%2Fextensions%2FZeroRatedMobi…
Changes to event logging in the iOS alpha app (internal only at the moment,
although the repo can be cloned and run in the Xcode simulator) are coming
pretty soon. Once those are in, we'll make one last tweak there so that the
app does not add the extra MCC/MNC header on that single request per
cellular connection when logging is turned off in the iOS alpha app. That
part is done in the Android app already.
-Adam
On Fri, May 2, 2014 at 1:16 PM, Adam Baso <abaso(a)wikimedia.org> wrote:
> Federico asked if sampling might make sense here. I think it will work, so
> I've updated the patchset.
>
> From a patchset comment I provided:
>
> "It's possible we may have situations where operators have not lots of
> users on them accessing Wiki(m|p)edia properties, so we do run some risk of
> actually missing IPs, even if exit IPs are concentrators of typically large
> sets of users. That said, let's try a 2% sample ratio; and if we find out
> it's insufficient, then we'll sample more, if it's oversampling, then we
> can adjust the other way, too. New patchset arriving shortly."
>
> (I've since submitted the updated code for review.)
>
> -Adam
>
>
>
> On Thu, May 1, 2014 at 7:52 PM, Adam Baso <abaso(a)wikimedia.org> wrote:
>
>> After examining this, it looks like EventLogging is more suited to the
>> logging task than debug logging and the trappings of needing to alter debug
>> logging in the core MediaWiki software.
>>
>> EventLogging logs at the resolution of a second (instead of a day), but
>> has inbuilt support for record removal after 90 days.
>>
>> Please do let us know in case of further questions. Here's the logging
>> schema for those with an interest:
>>
>> https://meta.wikimedia.org/wiki/Schema:MobileOperatorCode
>>
>> Here's the relevant server code:
>>
>> https://gerrit.wikimedia.org/r/#/c/130991/
>>
>> -Adam
>>
>>
>>
>>
>> On Wed, Apr 16, 2014 at 2:20 PM, Adam Baso <abaso(a)wikimedia.org> wrote:
>>
>>> Great idea!
>>>
>>> Anyone on the list know if there's a way to make the debug log
>>> facilities do the YYYYMMDD timestamp instead of the longer one?
>>>
>>> If not, I suppose we could work to update the core MediaWiki code. [1]
>>>
>>> -Adam
>>>
>>> 1. For those with PHP skills or equivalent, I'm referring to
>>> https://git.wikimedia.org/blob/mediawiki%2Fcore.git/a26687e81532def3faba646….
>>> Scroll to the bottom of the function definition to see the datetimestamp
>>> approach.
>>>
>>>
>>> On Wed, Apr 16, 2014 at 12:47 PM, Andrew Gray
>>> <andrew.gray(a)dunelm.org.uk> wrote:
>>>
>>>> Hi Adam,
>>>>
>>>> One thought: you don't really need the date/time data at any detailed
>>>> resolution, do you? If what you're wanting it for is to track major
>>>> changes ("last month it all switched to this IP") and to purge old
>>>> data ("delete anything older than 10 March"), you could simply log day
>>>> rather than datetime.
>>>>
>>>> enwiki / 127.0.0.1 / 123.45 / 2014-04-16:1245.45
>>>>
>>>> enwiki / 127.0.0.1 / 123.45 / 2014-04-16
>>>>
>>>> - the latter gives you the data you need while making it a lot harder
>>>> to do any kind of close user-identification.
>>>>
>>>> Andrew.
>>>> On 16 Apr 2014 19:17, "Adam Baso" <abaso(a)wikimedia.org> wrote:
>>>>
>>>> > Inline.
>>>> >
>>>> > Thanks for starting this thread.
>>>> > >
>>>> > > Sorry if I've overlooked this, but who/what will have access to this
>>>> > > data? Only members of the mobile team? Local project CheckUsers?
>>>> > > Wikimedia Foundation-approved researchers? Wikimedia shell users?
>>>> > > AbuseFilter filters?
>>>> > >
>>>> >
>>>> > It's a good question. The thought is to put it in the customary
>>>> > wfDebugLog location (with, for example, filename "mccmnc.log") on
>>>> > fluorine.
>>>> >
>>>> > It just occurred to me that the wiki name (e.g., "enwiki"), but not
>>>> > the full URL, gets logged additionally as part of the wfDebugLog call;
>>>> > to make the implicit explicit, wfDebugLog adds a datetime stamp as
>>>> > well, and that's useful for purging old records. I'll forward this
>>>> > email to mobile-l and wikitech-l to underscore this.
>>>> >
>>>> >
>>>> > > And this may be a silly question, but is there a reasonable means
>>>> > > of approximating how identifying these two data points alone are?
>>>> > > That is, using a mobile country code and exit IP address, is it
>>>> > > possible to identify a particular editor or reader? Or perhaps
>>>> > > rephrased, is this data considered anonymized?
>>>> > >
>>>> >
>>>> > Not a silly question. My approximation is that these tuples
>>>> > (datetime, now that it hit me - XYwiki, exit IP, and MCC-MNC) alone,
>>>> > although not perfectly anonymized, are only weakly identifying (that
>>>> > is, indirect inferences on the data in isolation are unlikely, but
>>>> > technically possible, through examination of short-tail outliers in a
>>>> > cluster analysis where such readers/editors exist in the short-tail
>>>> > outlier sets), in contrast to regular web access logs (where direct
>>>> > inferences are easy).
>>>> >
>>>> > Thanks. I'll forward this along now.
>>>> >
>>>> > -Adam
I’m currently looking into the GeoData extension to make location-based
Wikipedia queries. There are still some open questions - it would be nice
if somebody could provide guidance.
- The release status of the extension is still experimental. Is it safe to
use in production (a mobile app)? Is there a hard limit on how often I can
query the API? Just thinking ahead to when the app gets popular…
- Is there a way to increase the search radius? E.g. when showing a
continent (Europe) on a map, I would like to display articles for all
countries (something like `gsmindim` would be useful in this case too). I
couldn’t find a way to do this other than making multiple queries for
different coordinates, which does not scale very well. An example of the
kind of query I mean follows below.
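A sketch of such a query (coordinates and limits are just examples, and as
far as I can tell gsradius is capped at 10000 m server-side):
# articles within 10 km of central Paris:
curl 'https://en.wikipedia.org/w/api.php?action=query&list=geosearch&gscoord=48.8567|2.3508&gsradius=10000&gslimit=50&format=json'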
Thanks