[Foundation-l] [Wikitech-l] Primary account for single user login

Mark Williamson node.ue at gmail.com
Wed Apr 9 15:43:36 UTC 2008


Gerard, it does support the vast majority of Korean text with no issues.

The reason it does not support Japanese text is that it transliterates
between scripts without any attention to the language that is
represented, so the unified Han characters never get their Japanese
readings.

http://demo.icu-project.org/icu-bin/translit (try selecting "Any" to "Latin")
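
As a rough illustration of why a transform keyed only on script cannot
serve Japanese, here is a minimal sketch. Both tables are hypothetical
and tiny, the readings are written without tone marks or macrons, and
none of this reflects how ICU actually stores its data.

```python
# Hypothetical miniature tables, for illustration only (not ICU data).

# Keyed by script: one reading per character, whatever the language.
BY_SCRIPT = {"東": "dong", "京": "jing"}

# Keyed by (language, word): the same characters can read differently.
BY_LANGUAGE = {
    ("zh", "東京"): "dongjing",
    ("ja", "東京"): "tokyo",
}


def translit_by_script(text):
    """Map each character through the script table; pass unknowns through."""
    return "".join(BY_SCRIPT.get(ch, ch) for ch in text)


def translit_by_language(lang, text):
    """Prefer a language-specific reading, else fall back to the script table."""
    return BY_LANGUAGE.get((lang, text), translit_by_script(text))


print(translit_by_script("東京"))          # same answer for every language
print(translit_by_language("ja", "東京"))  # language-aware answer
```

The script-keyed function has no way to know the text is Japanese, which
is the limitation described above.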

On the other hand, Armenian and Georgian are straightforward alphabets
with easy and direct one-to-one correspondences to the Latin alphabet
(using a few diacritics). Kannada, Bengali, and Amharic can also all
be transliterated fairly easily using a short table of all symbols and
their appropriate conversions, with little context-specific processing
required. (At this point, Ethiopic is not supported by ICU, but I
don't think it would be that difficult to create transliteration
tables for it. In fact, I may already have them on my old computer
from when I was fiddling around with ICU years ago, as I believe I do
for Yi, Burmese, Lao, Ogham, and several other scripts.)
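
A minimal sketch of the "short table" approach, using a handful of
Georgian letters. The table is deliberately partial and the Latin
values are an assumption made for illustration (plain letters, no
diacritics or apostrophes); it is not ICU's table and not a complete
romanization scheme.

```python
# Partial, illustrative table: one Georgian letter -> one Latin letter.
# The Latin values are assumptions for this sketch, not an official scheme.
GEORGIAN_TO_LATIN = str.maketrans({
    "ა": "a",
    "ე": "e",
    "თ": "t",
    "ვ": "v",
    "ლ": "l",
    "ო": "o",
    "რ": "r",
    "ს": "s",
    "ქ": "k",
})


def transliterate(text):
    """Replace each letter found in the table; leave everything else as-is."""
    return text.translate(GEORGIAN_TO_LATIN)


print(transliterate("საქართველო"))
```

Because every letter maps independently of its neighbours, no
context-specific processing is needed; letters missing from the table
simply pass through unchanged.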

The main downside to ICU is that it produces odd transliterations for
some scripts. The title of the main page at the Arabic Wikipedia, for
example, is rendered by ICU as "ạlṣfḥẗ ạlrỷysyẗ". Certainly, abjads
are difficult to transliterate, so a transliteration based on
symbols rather than sounds makes sense, but the symbols ICU uses are
chosen so that everything round-trips rather than so that the result
is readable. I think it would be better to transliterate that as
"alsfht alryysyt", although that does leave open the possibility of
someone registering a name that differs from an existing one by a
single Arabic letter yet looks identical once transliterated without
diacritics (siin and saad, for example, are distinguished in
transliteration by a diacritic only, although they look quite
different in Arabic). I believe the
correct transliteration is "as-safha ar-ra'isih", although that might
be wrong (it has been a few years since I took Arabic classes).
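
One way to get from ICU's output to the plainer form suggested above is
to strip the diacritics afterwards. This is a minimal sketch using only
Python's standard unicodedata module; the function name is made up for
the example, and it is a post-processing step, not something ICU is
being claimed to do itself.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose each character (NFD), then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


icu_output = "ạlṣfḥẗ ạlrỷysyẗ"  # ICU's rendering quoted above
print(strip_diacritics(icu_output))

# The collision risk: two distinct transliterated letters become one.
print(strip_diacritics("s") == strip_diacritics("ṣ"))
```

The second print shows the trade-off: the result is easier to read, but
letters that differed only by a diacritic can no longer be told apart.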



Mark

On 09/04/2008, Gerard Meijssen <gerard.meijssen at gmail.com> wrote:
> Hoi,
>  If it does not even support Japanese or Korean, then it is extremely
>  unlikely to support Armenian, Georgian, Amharic, Kannada, Bengali ... If it
>  does not what is the point ???
>  Thanks,
>       GerardM
>
>  On Wed, Apr 9, 2008 at 2:42 PM, Tim Starling <tstarling at wikimedia.org>
>  wrote:
>
>
>  > Mark Williamson wrote:
>  > >>  Is anyone willing to help find or compile transcription tables for a
>  > >>  wide variety of languages?
>  > >
>  > > See ICU's transform function. http://www.icu-project.org/
>  > >
>  > > I can help compile transcription tables for ones not provided there.
>  >
>  > ICU is probably the best of those available. But it's not perfect.
>  >
>  > * It aims for reversibility, adding odd punctuation to distinguish between
>  > similar source strings. We would prefer readable if somewhat ambiguous
>  > output.
>  > * It doesn't have any support for Japanese or Korean readings of the
>  > unified han, as Mark noted.
>  > * We would have to write a command line wrapper program for that part of
>  > the library.
>  >
>  > -- Tim Starling
>  >
>  >
>  > _______________________________________________
>  > foundation-l mailing list
>  > foundation-l at lists.wikimedia.org
>  > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
>  >
>

