[Foundation-l] Google Translate now assists with human translations of Wikipedia articles

Nikola Smolenski smolensk at eunet.yu
Wed Jun 10 11:42:58 UTC 2009


Amir E. Aharoni wrote:
> On Tue, Jun 9, 2009 at 23:42, Brian<Brian.Mingus at colorado.edu> wrote:
>> Google has built in support for using its machine translation technology to
>> help bootstrap human translations of Wikipedia articles.
>>
>> http://translate.google.com/toolkit/docupload
>>
>> The benefit to Google is clear - they need sentence-aligned text in multiple
>> languages in order to bootstrap their automated system.
>>
>> This is a great example of machines helping people help machines help
>> people, etc... I'm sure this is now the most efficient way to produce high
>> quality translations of Wikipedia articles en masse.
>>
>> We should check the ToS to make sure the translated text can be CC-BY-SA
>> licensed.
> 
> OK, after a bit of drama in this discussion, i actually tried this toolkit.
> 
> Then i tried to translate [[Art critic]] from English into Hebrew.
> There were a few pleasant surprises, but on the whole the machine
> translation was bad to the point of being unusable. It is much easier
> to translate it using vi.

I tried translating [[Astronomy]] and [[Eothyrididae]] (at least, the 
part of it that is in English) to Serbian and was pleasantly surprised. 
Sure, literally every sentence needed major corrections, but for me it 
was still much easier to do that than to translate from scratch.

> I *had* to make very deep changes to paragraph structure - not to
> mention sentence structure -, and not just because the Hebrew
> Wikipedia has a different MOS, but because it's the basis of the

This is then apparently the case of English→Hebrew translation working 
worse than English→Serbian (possibly due to Hebrew being a 
non-Indo-European language)? I have never had to make any changes to 
paragraph structure, only occasionally changes to sentence structure 
(I'd say there were about 10% of sentences I had to change the structure 
of and another 10% that had uncommon structure but I let them slide).

> Hebrew language. A text without these changes would be next to
> unreadable. I doubt that a document which is changed so deeply is very

While I would probably delete an article that was dumped straight 
from machine translation, I still find the output fully understandable.

To illustrate:

Then i tried to translate [[Art critic]] from English into Hebrew.
There were a few pleasant surprises, but on the whole the machine
translation was bad to the point of being unusable. It is much easier
to translate it using vi.

translates to:

Tada sam pokušao prevesti [[umetnički kritičar]] sa engleskog na hebrejskom.
Bilo je nekoliko ugodnih iznenađenja, nego na ceo mašina
prevod je loš do tačke da je neupotrebljiva. To je mnogo lakše
prevesti preko VI.

I would retranslate this to broken English like this:

Then i tried to translate [[Art critic]] from English into Hebrew's.
There were a few pleasant surprises, than on entire machine's
translation was bad to the point of being unusably. Much easier 
translated via VI.

and the correct translation would be (I highlighted the changes):

Tada sam pokušao prevesti [[umetnički kritičar]] sa engleskog na 
*hebrejski*.
Bilo je nekoliko ugodnih iznenađenja, *ali u celini* *mašinski*
prevod je loš do tačke da je *neupotrebljiv*. *Mnogo je* lakše
prevesti *ga* *pomoću vi-ja*.


