Hi C. Scott,
Information about the APIs to get the list of translations and the parallel
corpora (which include examples of human translation, machine translation
and the corrections people made to them) is available at
https://www.mediawiki.org/wiki/Content_translation/Published_translations
People in the team more familiar with the technical details may provide
more details if needed.
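As a rough illustration only, a request for the list of published
translations between two languages could be built along these lines. This
is a minimal sketch: the list module name (cxpublishedtranslations), the
from/to/limit/offset parameters and the helper function name are
assumptions here and should be checked against the page linked above.

```python
from urllib.parse import urlencode


def published_translations_url(source_lang, target_lang, limit=500, offset=0,
                               api_base="https://en.wikipedia.org/w/api.php"):
    """Build a query URL for the list of published translations.

    Assumption: the list module and parameter names below match what the
    documentation page linked above describes; verify them there first.
    This helper only builds the URL and does not send any request.
    """
    params = {
        "action": "query",
        "list": "cxpublishedtranslations",  # assumed module name
        "from": source_lang,                # source language code
        "to": target_lang,                  # target language code
        "limit": limit,
        "offset": offset,
        "format": "json",
    }
    return api_base + "?" + urlencode(params)


# Example: translations published from English to Spanish.
print(published_translations_url("en", "es"))
```

The resulting URL can be fetched with any HTTP client; paging through
results would mean increasing the offset on each request.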
By 'a candidate "small wiki" that's been wanting to use
ContentTranslation', do you mean a wiki with heavy use of Content
Translation but lacking Machine Translation support? (I'm asking because
Content Translation is available in all wikis, although some lack automatic
translation support). The CX Stats page
<https://en.wikipedia.org/wiki/Special:ContentTranslationStats> can give
you an idea of how much Content Translation has been used for translation
on each wiki, and details of automatic translation support can be found here
<https://www.mediawiki.org/wiki/Content_translation/Machine_Translation>.
--Pau
On Fri, Sep 15, 2017 at 6:14 PM C. Scott Ananian <cananian(a)wikimedia.org>
wrote:
We're tracking source/destination pairs generated by the
ContentTranslation tool, right? Could someone point me to that dataset?
(I'm playing around with some machine translation stuff to see if I can
prototype a suggester tool that would translate edits on wiki A to
corresponding edits on wiki B.)
--scott
PS. There's some cool work being done on "zero-shot translation", aka
bootstrapping translation tools for small languages by pre-training them on
a related language (or even an unrelated language). Apparently that works!
(Cf. https://arxiv.org/pdf/1611.04558.pdf) It can greatly reduce the
amount of data required to build a translation model for the small language.
Is there a candidate "small wiki" that's been wanting to use
ContentTranslation which would be a good candidate for experimentation?
--
(http://cscott.net)
_______________________________________________
Mediawiki-i18n mailing list
Mediawiki-i18n(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-i18n
--
Pau Giner
Senior User Experience Designer
Wikimedia Foundation