On Sep 10, 2004, at 5:15 PM, Mark Williamson wrote:
But the difference between the two isn't merely a "difference of
character sets". Rather than converting on the level of the individual
character, which will inevitably produce poor results, it is necessary
to convert documents on the level of lexemes, for which one needs some
sort of artificial intelligence capable of separating Chinese texts
into individual lexemes before conversion.
Having done document conversion, I find the number of cases manageable
here; they belong in the realm of "things that can be searched for and
tagged".
A cumbersome, but not hard, problem.
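To make the "searched for and tagged" point concrete, here is a minimal
sketch of phrase-first (longest-match) conversion with a per-character
fallback. The tables are tiny and purely illustrative, and the names
convert, PHRASE_TABLE and CHAR_TABLE are made up for this example; this
is not how any existing converter is known to work.

```python
# Hypothetical, tiny tables for illustration only; a real converter
# would need far larger ones.
CHAR_TABLE = {"发": "發", "头": "頭", "现": "現", "后": "後"}
# Phrase entries override the per-character default where it would be wrong.
PHRASE_TABLE = {"头发": "頭髮", "皇后": "皇后"}


def convert(text, phrases=PHRASE_TABLE, chars=CHAR_TABLE):
    """Convert simplified text, preferring the longest known phrase."""
    max_len = max((len(p) for p in phrases), default=1)
    out = []
    i = 0
    while i < len(text):
        # Try the longest phrase starting at position i, down to length 2.
        for n in range(min(max_len, len(text) - i), 1, -1):
            chunk = text[i:i + n]
            if chunk in phrases:
                out.append(phrases[chunk])
                i += n
                break
        else:
            # No phrase matched: fall back to the single-character table.
            out.append(chars.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(convert("头发"))              # phrase table applies: 頭髮
print(convert("头发", phrases={}))  # character-only conversion: 頭發
```

The second call shows the poor result Mark describes when converting
character by character; each such case becomes one more phrase entry.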
It is also necessary to
convert names of countries, special terminology (including Wikipedia
terminology:
True, but again an enumerable change, solvable in software, or by
the same process that we have now for proofing documents: namely people
go through and make intelligent changes. The process would be to flip
the toggle, scan the document for problems and then edit the underlying
wiki-material, inserting the metacodes needed. Much as people now scan
documents to find broken, redirected or ambiguous links, spelling
errors and so on.
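The "scan the document for problems" step could be sketched roughly as
follows: list every simplified character that has more than one
traditional counterpart, so a person can review each spot. The table is
an assumed three-entry sample and flag_ambiguous is a made-up name; the
actual metacode syntax is left open here.

```python
# Assumed sample of simplified characters that map to more than one
# traditional character; a real list would be much longer.
AMBIGUOUS = {
    "发": ["發", "髮"],
    "后": ["後", "后"],
    "干": ["乾", "幹", "干"],
}


def flag_ambiguous(text, ambiguous=AMBIGUOUS):
    """Return (index, character, candidates) for each spot needing review."""
    return [
        (i, ch, ambiguous[ch])
        for i, ch in enumerate(text)
        if ch in ambiguous
    ]


for index, ch, candidates in flag_ambiguous("理发后"):
    print(index, ch, "/".join(candidates))
```

An editor would then walk the flagged positions and mark up only the
ones where the default conversion is wrong.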
the first two characters in the Simplified Chinese name
for "wikipedia" would be translated alone into English as the name
"Vicky", which would be converted into Traditional in a specific way,
but the current way to write "wikipedia" in Traditional Chinese is not
like that), etc; also Simplified Chinese is more tolerant of the usage
of English words in the Roman alphabet than is Traditional (except
perhaps in Hong Kong where anglicisms are often even more frequent) as
is exemplified by various article texts.
That's a dialectal, not linguistic, issue.
Some people here are saying that "if I read this text in simplified
aloud, a Taiwanese person can understand it". That is not the issue at
hand. If zh: were in Pinyin, perhaps, that would be the issue, or if
it was a spoken encyclopedia, maybe. But this is a written
encyclopedia. zh-cn: and zh-tw: may be largely the same spoken
language, but they are hardly the same written language.
--Jin Junshu/Mark
The general consensus of linguists is that you are overstating the
differences - that traditional and simplified represent the same
"written" language because the grammar is the same and most of the syntax
is the same. The visual difference is rather like the difference
between using the Latinate Greek characters, the one most people
associate with Greek, and the older characters used in the classical
age. A person who can read one can't read the other, but translation
between the two is mainly a mechanical process that occasionally needs
intervention. While the traditional/simplified problem is a couple of
orders of magnitude more complicated, it isn't more complex in lexical
theory.
Which is not to minimize the differences - if the community consensus
is just "squash this!" then that is a mistake as large as simply
brute-forcing the creation of two versions. There are technical and methodological
hurdles that should be addressed, otherwise someone will reach the same
conclusion that Jin Junshu has - namely that a traditional Wiki is
needed, because there is a user community not well served by the
simplified version. Part of this is based on political forces that are
in operation out there: there is no desire among the Chinese reading
and writing community to break Chinese into separate written languages
- that is, to continue increasing differences until mutual
intelligibility is a difficult hurdle to pass. At the same time, there
is a desire among traditional users to continue to use traditional
characters, and there is a larger corpus of texts, many of them
fundamental texts, which exist as originals in traditional characters,
and which argue for wiki handling traditional characters in an
appropriate way.