Hi all!
Just an FYI here that this has been done, yay! Varnish, Nginx, and Squid frontends are now all logging with tab as the field delimiter.
For those who would notice, for the time being, we have started outputting logs to new filenames with .tab. in the name, so as to differentiate the format. We will most likely change the file names back to their original names in a month or so.
Thanks all!
-Andrew Otto
On Jan 28, 2013, at 11:33 AM, Matthew Flaschen <mflaschen(a)wikimedia.org> wrote:
> On 01/27/2013 08:07 AM, Erik Zachte wrote:
>> The code to change existing tabs into some less obnoxious character is dead
>> trivial, hardly any overhead. At worst one field will then be affected, not
>> the whole record, which makes it easier to spot and debug the anomaly when
>> it happens.
>>
>> Scanning an input record for tabs and raising a counter is also very
>> efficient. Sending one alert hourly based on this counter should make us
>> aware soon enough when this issue needs follow-up, yet without causing
>> bottlenecks.
>
> Doing both of those would be pretty robust. However, if that isn't
> workable, a simple option is just to strip tab characters before
> Varnish/Squid/etc. writes the line.
>
> That means downstream code doesn't have to do anything special, and it
> shouldn't affect many actual requests.
>
> Matt Flaschen
>
> _______________________________________________
> Analytics mailing list
> Analytics(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
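For reference, a minimal sketch of the strip-and-count idea from the thread above, assuming a small PHP post-processing step; the record fields, values, and variable names are purely illustrative, not our actual frontend or filter code:

    // Replace embedded tabs in each field with a space and count them, so a
    // record never gains extra columns and the counter can drive an hourly
    // alert as Erik suggests.
    $tabCounter = 0;
    $fields = array(
        '10.0.0.1', '2013-01-28T17:00:00', '/wiki/Main_Page',
        "Mozilla/5.0\t(example user agent containing a tab)", // hypothetical
    );
    foreach ( $fields as $i => $field ) {
        $found = substr_count( $field, "\t" );
        if ( $found > 0 ) {
            $tabCounter += $found;
            $fields[$i] = str_replace( "\t", ' ', $field );
        }
    }
    echo implode( "\t", $fields ), "\n";  // clean tab-delimited log line
    // A non-zero $tabCounter would feed the hourly alert.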
This Sunday at FOSDEM I will give a lightning talk:
How to hack on Wikipedia
It is in fact an intro to MediaWiki & Wikimedia tech contributions,
designed to be reused and customized by others for other occasions.
I just uploaded a new version at
https://commons.wikimedia.org/wiki/File:How_to_hack_on_Wikipedia.pdf
Still working on details & credits. Your feedback is welcome!
--
Quim Gil
Technical Contributor Coordinator @ Wikimedia Foundation
http://www.mediawiki.org/wiki/User:Qgil
Fellow MediaWiki hackers!
After the pretty successful December release and the follow-up clean-up
work, we are now considering the next steps for Parsoid.
To this end, we have put together a rough roadmap for the Parsoid project at
https://www.mediawiki.org/wiki/Parsoid/Roadmap
The main areas we plan to work on in the coming months are:
Performance improvements: Loading a large wiki page through Parsoid into
VisualEditor can currently take over 30 seconds. We want to make this
instantaneous by generating and storing the HTML after each edit. This
requires a throughput that can keep up with the edit rates on major
Wikipedias (~10 Hz on enwiki).
Features and refinement: Localization support will enable the use of
Parsoid on non-English Wikipedias. VisualEditor needs editing support
for more content elements including template parameters and extension
tags. As usual, we will also continue to refine Parsoid's compatibility
in round-trip testing and parserTests.
Apart from these main tasks closely connected to supporting the
VisualEditor, we also need to look at the longer-term Parsoid and
MediaWiki strategy. One area we plan to look at is better support for
visual editing and smarter caching in MediaWiki's templating facilities.
We would also like to make it easy to use the VisualEditor on small
MediaWiki installations by removing the need to run a separate Parsoid
service.
A general theme is pushing some of Parsoid's innovations back into
MediaWiki core. The clean and information-rich HTML-based content model
in particular opens up several attractive options which are discussed in
detail in the roadmap.
Please review the roadmap and let us know what you think!
Gabriel and the Parsoid team
--
Gabriel Wicke
Senior Software Engineer
Wikimedia Foundation
For a few years now, we have had several query [special] pages, also
called "maintenance reports" in the list of special pages, which are
never updated for performance reasons: 6 on all wikis and 6 more on
en.wiki only. <https://bugzilla.wikimedia.org/show_bug.cgi?id=39667#c6>
One proposal is to run them again, quite liberally, on all "small wikis"
(to start with); another is to update them everywhere, but one at a time
and with proper breathing room for the servers.[1]
The problem is: which pages are safe to run an update on, even on
en.wiki, and how frequently, and which would kill it? Or, at what point
is a wiki too big to run such updates carelessly?[2]
Can someone estimate this by looking at the queries, or maybe by running
them on some DB where testing is not a problem?
We only know that pages were originally disabled if they took "more than
about 15 minutes to update". If such a page now took, say, four times
that, i.e. 60 minutes, would it be a problem to update one such page per
day/week/month? Etc.
Most updates already seem to rely on slave DBs, but maybe this should be
confirmed; on the other hand, writing huge result sets to the DB
shouldn't be a problem because those are capped as well.[3]
Nemo
[1] In (reviewed) puppet terms: <https://gerrit.wikimedia.org/r/#/c/33713/>
[2] Below that limit, a wiki would count as "small" for
<https://gerrit.wikimedia.org/r/#/c/33694> and be updated frequently, for
the benefit of the editors' engagement.
[3] 'wgQueryCacheLimit' => array(
        'default' => 5000,
        'enwiki' => 1000, // safe to raise?
        'dewiki' => 2000, // safe to raise?
    ),
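Purely as an illustration (and assuming I remember $wgDisableQueryPageUpdate correctly -- the core setting listing reports that maintenance/updateSpecialPages.php should skip), a staggered setup in the same wmf-config style as [3] might look like this; the page names are hypothetical, not the current configuration:

    'wgDisableQueryPageUpdate' => array(
        'default' => array(),
        // hypothetical: keep only reports known to take far more than
        // 15 minutes out of the update runs on the biggest wiki
        'enwiki' => array( 'Ancientpages', 'Deadendpages', 'Mostlinked' ),
    ),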
Hello,
I changed the deployment instructions on http://wikitech.wikimedia.org/view/How_to_deploy_code, adding the "--recursive" flag to every invocation of "git submodule update". The GuidedTour extension (not currently deployed, but will be on Thursday) has a JavaScript dependency that is kept in a separate gerrit repository to facilitate submitting contributions and fix-ups upstream. I'd like to change another extension (EventLogging) to do the same. It seems like it'd be less error-prone to switch to updating submodules recursively by default.
If this is a Bad Idea for some reason that I have overlooked, please speak up.
--
Ori Livneh
Morning All,
Need some help / a duck [1].
Fundraising pushed some patches yesterday which introduced two new
namespaces: CNBanner and CNBanner_talk. Ideally these would only be present
on 'infrastructure' wikis that actually host banner content (so meta, test,
betameta); however, I've been unable to figure out how to do this.
The root cause of my difficulty is that it appears I am unable to
create new namespaces in post-initialization hooks --
e.g. $wgExtensionFunctions. Am I missing something like a function call to
make post-init creation of namespaces work, or is it just not supported
anymore [2]?
Alternatively, is there a method to 'hide' namespaces similar to how we can
'hide' special pages?
Any other random ideas?
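For concreteness, here is roughly what I imagine conditional registration could look like using the CanonicalNamespaces hook instead of $wgExtensionFunctions; the namespace IDs and the per-wiki switch below are made up, and I haven't verified this works for our case:

    define( 'NS_CN_BANNER', 866 );       // hypothetical namespace IDs
    define( 'NS_CN_BANNER_TALK', 867 );

    $wgHooks['CanonicalNamespaces'][] = function ( array &$namespaces ) {
        global $wmgCentralNoticeInfrastructure;   // hypothetical switch
        if ( $wmgCentralNoticeInfrastructure ) {  // only meta, test, betameta
            $namespaces[NS_CN_BANNER] = 'CNBanner';
            $namespaces[NS_CN_BANNER_TALK] = 'CNBanner_talk';
        }
        return true;
    };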
If you're curious: we needed the new namespaces to provide a 'scratch'
space where translations can happen before we move them into the highly
protected MediaWiki namespace.
[1] http://en.wikipedia.org/wiki/Rubber_duck_debugging
[2] I had some code in 1.20 that created a namespace via a
wgExtensionFunctions hook, which I recall working, but it no longer does.
--
~Matt Walker
Hi everybody,
I am happy to announce and invite you to the next Wikimedia Bugday:
Tuesday, January 29th, 17:00-23:00 UTC [1]
in #wikimedia-dev on Freenode IRC [2]
We are going to take a look at bug reports (excluding enhancement
requests) which have not seen any changes for more than one year, trying
to reproduce some and provide feedback.
Currently there are about 250 such tickets (see [3] for the list).
In general, bugdays are about hanging out together on IRC,
discussing some reports in the software issue database at
https://bugzilla.wikimedia.org, and introducing interested people to
"triaging".
No technical knowledge needed, no obligations! It's a nice and easy way
to get involved in the community or to give something back.
Stop by, say hello, and give it a try! :-)
For more information on Triaging in general, check out
https://www.mediawiki.org/wiki/Bug_management/Triage
See you around?
andre
[1] Timezone converter: http://www.timeanddate.com/worldclock/converter.html
[2] See http://meta.wikimedia.org/wiki/IRC for more info on IRC chat
[3] https://bugzilla.wikimedia.org/buglist.cgi?f1=longdescs.count&list_id=17420…
--
Andre Klapper | Wikimedia Bugwrangler
http://blogs.gnome.org/aklapper/
Whenever a file is linked to with a size specification, e.g.
[[File:test.png|thumb|123px]], a new thumbnail is generated at that
particular size and saved to disk.
This is generally a good thing, because it minimises the amount of data
the clients need to download without losing quality at that display size.
However, this is also an avenue for denial of service: someone could
create many links to different images with non-standard sizes,
intentionally or unintentionally, and thereby overload the server's
computational resources (temporarily) and its storage.
Therefore, I propose an option which would either limit the number of
stored/generated thumbnails or limit their sizes to a particular set
(e.g. powers of two); however, this should not come at the cost of
functionality.
Whenever an image link requests a size which can't be generated, for
whatever reason, either the next-largest or the next-smallest image is
sent, with the relevant CSS styles to resize it in the browser. The
decision between next-largest and next-smallest would be governed by a
user-preference option defaulting to 25%, i.e. send the smaller image
if the larger image is at least 75% larger than the target size (this
should probably compare thumbnail areas rather than widths, if that is
not a major performance hit).
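To make the rule concrete, a rough sketch (not existing MediaWiki code; the power-of-two buckets and the 1.75 overshoot factor derived from the 25% default are just examples):

    function pickThumbWidth( $requested, array $buckets, $overshootLimit = 1.75 ) {
        sort( $buckets );
        $smaller = null;
        foreach ( $buckets as $width ) {
            if ( $width >= $requested ) {
                // Prefer the larger bucket (scaled down in CSS) unless it is
                // at least 75% larger than the target and a smaller one exists.
                // (An area-based comparison would square both sides here.)
                if ( $smaller !== null && $width >= $overshootLimit * $requested ) {
                    return $smaller;   // scaled up in the browser via CSS
                }
                return $width;
            }
            $smaller = $width;
        }
        return $smaller;   // request exceeds the largest stored size
    }

    // e.g. [[File:test.png|thumb|123px]] with power-of-two buckets:
    echo pickThumbWidth( 123, array( 64, 128, 256, 512 ) );   // 128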
This proposal is especially important for public deployments with large
numbers of (especially non-technical) users and/or tight limits on disk
space.
Hello,
Following a discussion with Daniel Kinzler this morning, we have raised
the default PHPUnit timeout from two seconds to ten seconds.
I originally thought that using PHPUnit's timeout system was a good idea,
but it was causing more harm than good. So you should see
fewer "PHP_Invoker raised a timeout" errors :-D
Change: https://gerrit.wikimedia.org/r/#/c/46503/
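For context, the limit is enforced through the PHP_Invoker package named in that error message, which uses pcntl alarms to abort a test body that runs too long. Roughly like this (values and the callable are illustrative, and it assumes PHP_Invoker's classes are loadable, e.g. from PEAR or PHPUnit's bundled copy):

    $invoker = new PHP_Invoker();
    try {
        // Third argument is the time limit in seconds -- now 10 rather than 2.
        $invoker->invoke( function () { sleep( 30 ); }, array(), 10 );
    } catch ( PHP_Invoker_TimeoutException $e ) {
        echo $e->getMessage(), "\n";   // what surfaced as the timeout error
    }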
--
Antoine "hashar" Musso