[Foundation-l] An update on localisation in MediaWiki (2009)

Ryan Lomonaco wiki.ral315 at gmail.com
Mon Jan 4 01:20:40 UTC 2010


Forwarded to the list on behalf of Siebrand Mazeland.

----

On 31 December 2007 and 1 January 2008 I sent e-mails to which this is a
follow-up[1,2].

First things first, because not everyone reads e-mails completely:
* MediaWiki localisation (that is, the translation of English source messages
into other languages) depends on you! If you speak a language other than
English, care about your language in MediaWiki and Wikimedia and like
translating, go to http://translatewiki.net, register a user and start
contributing translations for MediaWiki and MediaWiki extensions. When your
localisation is complete, keep coming back regularly to re-complete it and
do quality control. Thank you in advance for all your contributions and
effort.
* The i18n and L10n area of MediaWiki requires continuous efforts. If this
area of FOSS has your interest: we need your help. Please offer your
development skills to further MediaWiki's i18n, L10n and translation
capabilities[3,4].

All statistics are based on MediaWiki 1.16 alpha, SVN version r60527 (31
December 2009). Comparisons are to MediaWiki 1.14 alpha, SVN version r45277
(1 January 2009).

See http://translatewiki.net/wiki/MediaWiki_2009 for a wiki version of this
message.

==Introduction==
* Localisation or L10n - the process of adapting the software to be as
familiar as possible to a specific locale (topic of this message)
* Internationalisation or i18n - the process of ensuring that an application
is capable of adapting to local requirements (out of scope of this message)

MediaWiki has a user interface definition for 362 languages (up from 348).
Of those languages at least 39 language codes are duplicates and/or serve a
purpose for usability[5]. Reporting on them, however, is not relevant. So
MediaWiki in its current state supports 323 languages (up from 322).
MediaWiki has 346 core language files (up from 326), of which 38 are
redirects from the duplicates/usability group or just empty[6]. So MediaWiki
has an active in-product localisation for 308 languages (up from 299).
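
The counts above can be re-derived with a short Python sketch. The code lists
are copied from footnotes [5] and [6]; the helper and variable names are only
illustrative and not part of MediaWiki.

```python
def parse_codes(text):
    """Split a comma-separated footnote into a list of language codes."""
    return [code.strip() for code in text.split(",") if code.strip()]

# Footnote [5]: duplicate codes and codes that only serve usability.
FOOTNOTE_5 = """als, be-x-old, ckb, crh, de-at, de-ch, de-formal, dk, en-gb, fiu-vro,
gan, got, hif, kk, kk-cn, iu, kk-kz, kk-tr, ko-kp, ku, ku-arab, nb, ruq,
simple, sr, tg, tp, tt, ug, zh, zh-classical, zh-cn, zh-sg, zh-hk,
zh-min-nan, zh-mo, zh-my, zh-tw, zh-yue"""

# Footnote [6]: core language files that are redirects or empty.
FOOTNOTE_6 = """als, be-x-old, bh, ckb, ckb-latn, crh, de-at, dk, en-rtl, fiu-vro, gan,
hif, hif-deva, ii, iu, kk, kk-cn, kk-kz, kk-tr, ko-kp, ks, ku, nb, pi, ruq,
simple, st, tg, tp, tt, ug, zh-classical, zh-cn, zh-min-nan, zh-mo, zh-my,
zh-sg, zh-yue"""

DEFINED_LANGUAGES = 362   # languages with a user interface definition
CORE_LANGUAGE_FILES = 346 # core language files in the repository

duplicate_codes = parse_codes(FOOTNOTE_5)
inactive_files = parse_codes(FOOTNOTE_6)

supported_languages = DEFINED_LANGUAGES - len(duplicate_codes)
active_localisations = CORE_LANGUAGE_FILES - len(inactive_files)

print(len(duplicate_codes), supported_languages)
print(len(inactive_files), active_localisations)
```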

The MediaWiki core product has several areas that can be localised:
* regular messages that can and should be localised (2,369 - up 9% from
2,168)
* optional messages that can be localised, which is mostly used for
languages not using a Latin script (187 - up 8% from 173)
* ignored messages that should not be localised (152 - up 2% from 149)
* namespace names and namespace aliases (17 - no change)
* magic words (142 - up 8% from 132)
* special page names (88 - up 2% from 86)
* other (directionality, date formats, separators, book store lists, link
trail, and others)

Localisation of MediaWiki revolves around all of the above. Reporting is
done on the regular messages only.

MediaWiki is more than just the core product. On
http://www.mediawiki.org/wiki/Category:All_extensions 1500 extensions (up
25% from 1200) have some kind of documentation. This analysis only takes the
code currently present in svn.wikimedia.org/svnroot/mediawiki/trunk into
account. The source code repository contains roughly 445 extensions (up
20% from 370). Most extensions in the MediaWiki Subversion repository now
use the reference implementation for i18n. Currently 8,200 messages for
MediaWiki extensions can be localised in a consistent way (up 37% from
6,000).

==MediaWiki localisation in practice==
MediaWiki localisation has moved further towards a centralised collaborative
process at translatewiki.net in the past year. In 2008 some wikis were still
translating in their own MediaWiki: namespace. The introduction of the
LocalisationUpdate extension[7], especially on the Wikimedia Foundation
wikis, has taken away the last advantage of local translation over
centralised translation: instant gratification. Translations that are
committed to Subversion can be added to wikis without requiring software
updates, as often as desired.

Few to no translations are submitted through the Bugzilla ticketing
system or directly by SVN committers. Exceptions are the localisations of
Hebrew, Cantonese, Simplified Chinese, Traditional Chinese, Classical
Chinese and Persian, which are still actively maintained in SVN, alongside
regular contributions from the centralised system.

==The past, the present and the future==
MediaWiki localisation has always been a volunteer effort, and I expect that
it will remain so. 2009 brought a successful Google Summer of Code project,
executed by Niklas Laxstrom[8,9], and the Wikimedia Foundation is supporting
the localisation that takes place at translatewiki.net[10]. Not only
MediaWiki, but all Open Source projects that are supported there[11] benefit
from these developments. We want to keep using the Translate extension
technology and expand on it, as well as nourish our translator base of
nearly 2,000 translators by providing them with better tooling and more
projects in 2010. Vereniging Wikimedia Nederland[12], the Dutch Wikimedia
Chapter, has granted 2,000 Euro to Stichting Open Progress[13] for the
translatewiki.net Translation Rallies, which motivated translators to
make more than 60,000 new translations for MediaWiki and its extensions in
August and December 2009.

New opportunities lie in better support of Translation Memory technology and
in more supported projects, to grow the community and allow the translators
to spend their time as productively as possible, while still allowing all the
socialising and collaboration features of MediaWiki. At the Google Summer of
Code Mentor Summit there was interest from the KDE Documentation
Project[14], the PHP Documentation Project, Pidgin, wxWidgets, and other
projects. For translatewiki staff this was a confirmation that our approach
works. The Translate extension, however, needs more development. If you want
to work on an exciting extension that makes a difference in multi-language
support for Open Source software and MediaWiki content pages that require
structured translation, check out the Translate extension and help us make
it better. Your help *is* needed and most welcome!

The Wikimedia Strategic Planning process that is currently taking place also
allows for a broader perspective on the localisation of MediaWiki in a
Wikimedia context[15]. Support for several dozen MediaWiki extensions in the
Wikia code repository is expected within the next few weeks. Wikimedia is,
or will soon be, including a localisation score for language projects in
their statistics, so that in a year we expect to be able to analyse if
localisation is a requirement for a rise in usage or if it is a
consequence[16].

==MediaWiki localisation statistics==
Daily statistics for MediaWiki and extension localisation have been
available for the past two years[17]. For the past two years (arbitrary)
milestones have been set for four collections of MediaWiki related messages.
For the usability of MediaWiki in a particular language, the group 'core
most used' is the most important. A language must reach the milestone for
this first group before MediaWiki is considered to have 'minimal support'
for it. Reaching further milestones indicates the maturity of a localisation:
* core most used (469 messages): 98%
* core (2,369 messages): 90%
* Wikimedia extensions (2,700 messages): 90%
* extensions (8,200 messages): 65%
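
As a rough illustration of how these milestones work, here is a minimal
Python sketch. The group sizes and percentages are the ones listed above; the
function names are hypothetical, and I assume a milestone is reached at "that
percentage or more" of the group's messages.

```python
# Message group -> (number of messages, milestone percentage).
MILESTONES = {
    "core most used": (469, 98),
    "core": (2369, 90),
    "Wikimedia extensions": (2700, 90),
    "extensions": (8200, 65),
}

def required_translations(group):
    """Smallest number of translated messages that meets the milestone."""
    total, percent = MILESTONES[group]
    # Integer ceiling of total * percent / 100, avoiding float rounding.
    return (total * percent + 99) // 100

def translations_missing(group, translated):
    """How many more translations a language needs to pass the milestone."""
    return max(0, required_translations(group) - translated)

def has_minimal_support(core_most_used_translated):
    """'Minimal support' means passing the 'core most used' milestone."""
    return translations_missing("core most used", core_most_used_translated) == 0

for group in MILESTONES:
    print(group, required_translations(group))
```

For example, a language with 450 of the 469 'core most used' messages
translated would still need 10 more to qualify for minimal support.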

Currently the following numbers of languages have passed the above
milestones[18]:
* core most used: 147 (45.5% of supported languages - up 35% from 109 - goal
of 130 passed)
* core: 82 (25.4% of supported languages - up 21% from 68 - goal of 90
missed by 203 translations)
* Wikimedia extensions: 44 (13.6% of supported languages - up 22% from 36 -
goal of 50 missed by 1,500 translations)
* extensions: 39 (12.1% of supported languages - up 86% from 21 - goal of 30
passed)
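
The shares and growth figures in a list like this follow from simple
arithmetic. Below is a minimal Python sketch, assuming the 323 supported
languages from the introduction as the denominator; the function names are
only illustrative.

```python
SUPPORTED_LANGUAGES = 323  # supported languages, as stated in the introduction

def share(count, total=SUPPORTED_LANGUAGES):
    """Percentage of supported languages, rounded to one decimal."""
    return round(100 * count / total, 1)

def growth(now, before):
    """Year-over-year growth as a whole percentage."""
    return round(100 * (now - before) / before)

# Group -> (languages past the milestone now, languages a year earlier).
PASSED = {
    "core most used": (147, 109),
    "core": (82, 68),
    "Wikimedia extensions": (44, 36),
    "extensions": (39, 21),
}

for group, (now, before) in PASSED.items():
    print(f"{group}: {now} languages, {share(now)}% of supported, "
          f"up {growth(now, before)}% from {before}")
```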

I think the changes in the past year are very satisfying. MediaWiki
localisation has again improved enormously in the past year. Two of the four
goals I set in last year's e-mail have not been reached (only one of four
goals was reached for 2008). We nearly got there, though. Currently
MediaWiki core contains 377,394 messages (up 24% from 303,863 at the end of
2008).

==Conclusion==
So... Is MediaWiki doing well on localisation? Just like the past two years,
my personal opinion is that we do a proper job, but can still do a lot
better. After all, MediaWiki is the engine that runs a top 5 site in the
world committed to creating "a world in which every single human being can
freely share in the sum of all knowledge." Observing that there are also an
estimated hundred thousand MediaWiki installations out there, more than 250
Wikipedias that all use the Wikimedia Commons media repository, and that 147
languages out of 323 have a minimal localisation, there is a lot of room for
improvement; more realistically: the work will never be done, but the least
we can do is try to get there :).

Last year I mentioned languages from Africa performing way below average. I
am sad to conclude that this has not changed considerably. In an overview
with a weighted score for the localisation level of MediaWiki in a Wikimedia
context[19], the largest African languages have the lowest score (52 out of
100). Large languages spoken on multiple continents and large languages from
Europe are doing best (100 and 99 out of 100 respectively). Oriya, Zulu,
Burmese and Urdu are among the large languages with the worst
localisation scores. It is my personal aim to work towards an average L10n
score of 83 for the 50 largest languages in the world by the end of
September 2010.

We have all the tools to successfully localise MediaWiki into any of the
7,000 or so languages that have been classified in ISO 639-3. We only need
one person per language to make an effort and make it happen. Reaching the
first milestone (core most used) takes about six hours of work. Using
translatewiki.net or the Gettext file, little to no technical knowledge is
required. Knowledge of MediaWiki is a plus.

This was the pitch, basically the same as in 2007 and 2008, with even more
experience and data. Goals for MediaWiki localisation by the end of 2010 are
ambitious, but still realistic with the right effort:
* core most used: 170 languages with 98% or more localised
* core: 105 languages with 90% or more localised
* Wikimedia extensions: 65 languages with 90% or more localised
* extensions: 50 languages with 65% or more localised

I would like to wish everyone involved in any aspect of MediaWiki a
wonderful 2010.

Cheers!

Siebrand Mazeland

[1]
http://lists.wikimedia.org/pipermail/translators-l/2007-December/000571.html
[2]
http://translatewiki.net/wiki/User:Siebrand/An_update_on_localisation_in_MediaWiki_%282008%29
[3]
https://bugzilla.wikimedia.org/buglist.cgi?query_format=advanced&component=Internationalization&bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED
[4] http://translatewiki.net/wiki/User:Siebrand#Bugs
[5] als, be-x-old, ckb, crh, de-at, de-ch, de-formal, dk, en-gb, fiu-vro,
gan, got, hif, kk, kk-cn, iu, kk-kz, kk-tr, ko-kp, ku, ku-arab, nb, ruq,
simple, sr, tg, tp, tt, ug, zh, zh-classical, zh-cn, zh-sg, zh-hk,
zh-min-nan, zh-mo, zh-my, zh-tw, zh-yue
[6] als, be-x-old, bh, ckb, ckb-latn, crh, de-at, dk, en-rtl, fiu-vro, gan,
hif, hif-deva, ii, iu, kk, kk-cn, kk-kz, kk-tr, ko-kp, ks, ku, nb, pi, ruq,
simple, st, tg, tp, tt, ug, zh-classical, zh-cn, zh-min-nan, zh-mo, zh-my,
zh-sg, zh-yue
[7] http://www.mediawiki.org/wiki/Extension:LocalisationUpdate
[8]
http://socghop.appspot.com/gsoc/student_project/show/google/gsoc2009/wikimedia/t124025074637
[9] http://laxstrom.name/blag/2009/09/01/gsoc-wrap-up-translate-extension/
[10] http://techblog.wikimedia.org/2009/10/supporting-translatewiki-net/
[11] http://translatewiki.net/wiki/Project_list
[12] http://nl.wikimedia.org
[13] http://www.openprogress.org/Stichting_Open_Progress
[14] http://translatewiki.net/wiki/Project:KDE_Documentation
[15] http://strategy.wikimedia.org/wiki/Localisation
[16] http://stats.wikimedia.org/EN/TablesCurrentStatusVerbose.htm
[17] http://translatewiki.net/wiki/Translating:Group_statistics
[18] http://translatewiki.net/wiki/Translating:Group_statistics_in_time
[19]
http://translatewiki.net/wiki/Project:MediaWiki_localisation_in_the_50_most_spoken_languages