yuanml wrote:
To Mark Williamson:
By the way, since when have I been trying to compare en/jp and tc/sc? I was
merely responding to something somebody else said about SC and TC
users "living in the same universe" or something.
I don't think I have lost my point.
tc/sc users share the same conceptual structure of the universe,
but en/jp, en/tc or en/sc users do not.
For example, the English name of the planet Venus refers to a goddess,
but in both sc and tc the planet's name (金星) is built from the same
concepts: gold and star. In a word, tc/sc is the same language.
This is my point.
tc/sc users share not only the same grammar,
but also most of their knowledge system.
Leaving aside native Chinese knowledge,
such as Chinese history and folklore, let us talk about modern science.
Modern scientific terminology has been introduced into China
since the Ming Dynasty, hundreds of years ago, and grew vastly after 1900.
The Chinese knowledge system evolved into its modern form
just after the New Culture Movement around 1920.
But the tc/sc split happened only around 1956,
so tc/sc users share the same background for their knowledge systems.
From 1949 to the 1980s, tc/sc evolved independently for
lack of communication,
so some newer terminology differs, for example in
computer science.
But after the 1980s, communication between tc/sc users increased considerably.
Disclaimer: I can't read Chinese, so I don't know whether this is
similar to any of the current or proposed solutions, but I have read
some of the literature on the subject. My apologies if I'm going over
old territory.
The best analogy is (I think) the difference between en-us and en-gb:
the differences are mostly "spelling" and idioms. Automatic conversion
is entirely possible, but occasionally imperfect. However, it should be
possible to paraphrase around these problems where they occur and
produce a single text that can be displayed (and edited) in either
language and converted to-and-fro.
Perhaps one way to do it would be as in this fictitious example. Suppose
I have a (say) simplified word that means "fish", but that can be
transformed into either (say) "FISH" or "STONE" in the traditional
script. Suppose we auto-convert this '''into the Wiki source''' at edit
time into markup like
[fish=FISH|STONE]
which would display as "fish" highlighted in some way when the page is
rendered in simplified script to show there is a potential
transliteration problem, and as [FISH|STONE] when rendered in
traditional script.
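A minimal Python sketch of the edit-time conversion and the two renderings described above. It assumes a hypothetical `LEXICON` dictionary mapping each simplified word to its possible traditional forms (just the fictitious "fish"/"FISH"/"STONE" example plus an unambiguous "water" entry), and uses `*...*` as a stand-in for whatever highlighting a real renderer would apply. Words are split with a simple regex, which is also an assumption: real Chinese text has no spaces and would need proper word segmentation.

```python
import re

# Hypothetical lexicon built from the fictitious example: each simplified
# word maps to every traditional form it could convert to.
LEXICON: dict[str, list[str]] = {
    "fish": ["FISH", "STONE"],  # ambiguous (one-to-many)
    "water": ["WATER"],         # unambiguous (one-to-one)
}

# Matches either a [simp=TRAD1|TRAD2] marker or a plain word.
# Assumption: words are space-delimited; real Chinese text would need a
# word segmenter instead of \w+.
TOKEN = re.compile(r"\[([^=\[\]]+)=([^\[\]]+)\]|(\w+)")


def auto_convert(simplified: str) -> str:
    """Edit time: wrap each ambiguous word of plain (unmarked) text in markup."""
    def mark(m: re.Match) -> str:
        word = m.group(0)
        forms = LEXICON.get(word, [])
        return f"[{word}={'|'.join(forms)}]" if len(forms) > 1 else word
    return re.sub(r"\w+", mark, simplified)


def render(source: str, script: str) -> str:
    """Render wiki source as 'simplified' or 'traditional' script."""
    def replace(m: re.Match) -> str:
        word = m.group(3)
        if word is not None:  # plain word outside any marker
            forms = LEXICON.get(word, [])
            if script == "traditional" and len(forms) == 1:
                return forms[0]
            return word
        simp, forms = m.group(1), m.group(2).split("|")
        if script == "simplified":
            # Flag unresolved (one-to-many) markers; *...* stands in for highlighting.
            return f"*{simp}*" if len(forms) > 1 else simp
        return forms[0] if len(forms) == 1 else f"[{'|'.join(forms)}]"
    return TOKEN.sub(replace, source)


src = auto_convert("fish in water")
print(src)                          # [fish=FISH|STONE] in water
print(render(src, "simplified"))    # *fish* in water
print(render(src, "traditional"))   # [FISH|STONE] in WATER
```

Because a one-option marker such as [fish=FISH] renders plainly in simplified script and maps straight to its single form in traditional script, the cleanup step needs no separate code path in this sketch.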
Then it can be cleaned up in markup by writing:
[fish=FISH]
or similar markup, which will force the traditional rendering to the
correct word, and remove the warning flag for simplified rendering,
since there is now a one-to-one mapping. The same would apply in
reverse for ambiguous conversions in the opposite direction. With any
luck, this could be entirely lexicon-driven, and would need no AI
research, because we would be able to find all pages containing ambiguities
automatically, and then harness the copyediting skills of Wikipedians to
find and disambiguate all the problematic text. We could even use
this when idioms or short phrases differ, writing:
[idiom in simplified=IDIOM IN TRADITIONAL]
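Finding the pages that still need a Wikipedian's attention could then be a plain scan for markers with more than one alternative. This Python sketch assumes page sources are available as a hypothetical title-to-text dictionary, and the sample pages are invented for illustration. Multi-word idiom markers fit the same pattern, and resolved one-to-one markers are not flagged.

```python
import re

# Same hypothetical markup as in the example: [simplified=TRAD1|TRAD2|...].
# The simplified side may be a multi-word phrase, so idioms fit too.
MARKUP = re.compile(r"\[([^=\[\]]+)=([^\[\]]+)\]")


def unresolved(source: str) -> list[str]:
    """Return the simplified text of every marker that still has >1 option."""
    return [m.group(1) for m in MARKUP.finditer(source)
            if len(m.group(2).split("|")) > 1]


def pages_needing_cleanup(pages: dict[str, str]) -> list[str]:
    """List page titles whose source still contains ambiguous markers."""
    return sorted(title for title, src in pages.items() if unresolved(src))


# Invented sample pages (hypothetical titles and text).
pages = {
    "Aquarium": "A [fish=FISH|STONE] swims here.",
    "Quarry": "A [fish=STONE] lies here.",  # already cleaned up
    "Proverbs": "[idiom in simplified=IDIOM IN TRADITIONAL]",
}
print(pages_needing_cleanup(pages))  # ['Aquarium']
```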
-- Neil