Let me first try to explain to the list what Google is up to: They're
writing statistical translation software and they need material to
train it. "Statistical" means they look for a word or a phrase in a
piece of English text and try to find a match in a translated version
of the text. Then they see what translation occurred most often and
try to figure out what neighbouring words trigger what translation.
For example, their first attempt may reveal that "tap" can be
translated into two or more Afrikaans words ("kraan" and "tik"). Then
statistical analysis will reveal that the presence of the word "water"
near "tap" will significantly increase the probability that "kraan"
is the correct translation.
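
To make the counting idea concrete, here is a minimal Python sketch. The
observations are invented for illustration, and the function name is my
own; this is not Google's actual method, only the "count what occurred
most often, then condition on a neighbouring word" step described above.

```python
from collections import Counter

# Toy data, invented for illustration: each entry is
# (English words seen near "tap", Afrikaans translation used).
observations = [
    ({"water", "kitchen"}, "kraan"),
    ({"water", "drip"}, "kraan"),
    ({"finger", "screen"}, "tik"),
    ({"shoulder", "finger"}, "tik"),
    ({"water", "cold"}, "kraan"),
    ({"keyboard", "screen"}, "tik"),
]

def translation_probabilities(observations, context_word=None):
    """Relative frequency of each translation of "tap".

    If context_word is given, only observations where that word
    appears near "tap" are counted.
    """
    counts = Counter(
        translation
        for context, translation in observations
        if context_word is None or context_word in context
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: c / total for t, c in counts.items()}

overall = translation_probabilities(observations)
near_water = translation_probabilities(observations, "water")
print(overall)
print(near_water)
```

With these toy data, "kraan" accounts for half of all cases overall, but
for every case where "water" is nearby.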
We must assume that Google is doing it for the money, and they are
spending a lot of it on R&D. But they are not the only ones working
on it (Babelfish, for example). So if they reach a certain level in a
certain year, then their nearest rival may match that level a few
years later. And a decade later, someone may even publish an open
source app that achieves that level. (That is a very short time in the
context of a language that will exist for a millennium.) So if we
create a body (or "corpus") of translated text, it will be used over
and over again.
All the competition and the rapidly falling price of computing power
will mean that the service will never cost more than a few cents per
word. Imagine sending an SMS to your domestic, having it translated
into her language with an ad at the end. If translating stuff on
Wikipedia helps a little bit to speed that up, then I think it's a good
thing.
--
On the link that Achal sent, there is some discussion around the fact
that paying for contributions will reduce the quality. It may for
instance create an incentive for someone to copy from a copyrighted
source, or to start making things up. Fortunately those problems are
not really present when paying for translations, as long as there is
some degree of quality control, e.g. by taking a sample of the result,
translating it back into English and comparing it with the source.
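
The sampling check could be sketched roughly as below, in Python. The
names spot_check and back_translate, the sample size and the 0.6
threshold are all my own assumptions. back_translate stands for whatever
person or tool turns the translation back into English, and plain string
similarity is only a crude stand-in for a human comparing the two texts.

```python
import random
from difflib import SequenceMatcher

def similarity(a, b):
    """Rough 0..1 similarity between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def spot_check(pairs, back_translate, sample_size=2, threshold=0.6, seed=0):
    """Back-translate a random sample of (source, translation) pairs.

    Returns the sampled pairs whose back-translation is less similar
    to the English source than the threshold, with the score.
    """
    rng = random.Random(seed)
    sample = rng.sample(pairs, min(sample_size, len(pairs)))
    flagged = []
    for source, translation in sample:
        score = similarity(source, back_translate(translation))
        if score < threshold:
            flagged.append((source, translation, round(score, 2)))
    return flagged
```

Anything that comes back flagged would still need a human to look at it;
a low score only says the round trip changed the wording a lot, not that
the translation is wrong.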
On Sat, Sep 11, 2010 at 3:16 PM, Achal Prabhala <aprabhala(a)gmail.com> wrote:
Dear Dwayne,
I've followed the work of translate.org.za and congrats on everything
accomplished so far. Since you raised the issue of translation, I wanted
to point you to a robust discussion that happened recently in India
around Google's translation project. You can see archives of the
conversation at:
http://lists.wikimedia.org/pipermail/wikimediaindia-l/2010-April/thread.htm…
(Subj: Philosophical view on Google translated articles).
Reactions to Google's translation project have been mixed. The upsides are
clear, and the downsides (as expressed in that conversation) were:
- a dissonance between volunteer editors' contributions and the translations
- a lack of necessity or specificity to some of the translated articles
(marginal western figures who are unknown in, say, Tamil Nadu, etc.)
- some suspicion as to the motives behind the project (given Google's
involvement)
- some broader questions, in terms of volunteer vs 'paid' editing and
what the spirit of editing Wikipedia is
In general, I think that translation, if cleverly applied in a
customised way, could be useful, and when applied badly could be
terrible - but, regardless, it's for the community to decide. Obviously,
given the mission of translate.org.za, you would come with a degree of
trust and acceptability that a corporation like Google doesn't always
bring (which is not to imply that their project is
unhelpful - at the moment, I believe various groups of
Indian Wikipedians are going ahead with talks and discussions on the
trial). And when you talk about translation, have you had experience
with written material that goes beyond interfaces and templates? (If you
have, that experience might be useful to share.) Also wondering if your
goal is to build a tool (like Google) that is constantly improved
through human interaction and input, or to run the translation exercise
as a collaborative, human-input-based exercise?
Perhaps a good way to think about this is to ask if a particular
language community within the various South African Wikipedias is
interested in taking you up on this. And then run some kind of
identification exercise - perhaps an ongoing project - where community
members deposit articles they'd like to have translated from X language
to Y language in a box. Articles for translation don't necessarily have
to come from a strong Wikipedia (like English) or even an emerging
Wikipedia like Afrikaans - they could well be within several smaller
Wikipedias, e.g. Sotho to Zulu, etc. Finally, in terms of how things are
translated, their quality, and style, I think this is where it is key to
get community members involved to minimise conflicts and maximise
usefulness of the end result.
Cheers,
Achal
_______________________________________________
WikimediaZA mailing list
WikimediaZA(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikimediaza