Tim, please see yue: (the Cantonese Wikipedia) and the proposed (and "approved") wuu: (Wu).
The "words" that don't exist in Mandarin aren't just words -- there
are grammatical differences as well.
And the differing words run very deep -- the Mandarin word for
"where", for example, uses different characters than the contemporary
Cantonese word.
This isn't an issue for zh.wp because it's written only in Baihuawen
(Mandarin-based written standard).
Before the advent of Baihuawen (it was popularised as the written
standard by the New Culture Movement of the late 1910s), the written
lingua franca of China was Classical Chinese, which differs considerably
from Mandarin in both grammar and vocabulary. The situation was perhaps
comparable to that of Katharevousa and Dhimotiki in Greece.
People might question the value of vernacular Wikipedias, but one could
in a similar vein question the value of the Catalan Wikipedia, since
virtually everyone literate in Catalan is also literate in Spanish or
French (or both).
One person gave the example of a child -- a child in Hong Kong could
understand Cantonese Wikipedia articles read aloud to them, but zh.wp
articles read aloud would be confusing and in many cases
incomprehensible, because children don't learn Baihuawen until primary
school.
There seems to be a common misconception that throughout China,
Baihuawen is read aloud using the character pronunciations of the
reader's native regional variety.
This isn't true, though -- people reading Baihuawen aloud will almost
invariably read it in Mandarin or, in Hong Kong and Macau, Cantonese.
The other varieties are largely oral media. That isn't to say that
nobody writes in Xiang or Wu or Northern Min, or that nobody is literate
in them. Anyone who speaks Xiang and has a rudimentary knowledge of
Chinese characters will be able to read written vernacular Xiang, and
there are certain occasions on which it is written.
Although colloquial writing can be found in every major regional
variety of Chinese, as a popular phenomenon it is mostly limited to
Cantonese, Wu, Minnan, and Hakka, for two reasons: Cantonese and Wu are
both the languages of huge (HUGE) urban centers where people take pride
in their local identity; and Minnan and Hakka are used on Taiwan, which
doesn't currently have a rabid government movement to eradicate local
varieties, so people who want to develop written forms for their native
languages have had quite a bit of success.
Having said that, most people who write Minnan do not write it in
Peh-oe-ji. There is no single agreed-upon orthography; most people use
a mixture of Chinese characters and Roman letters (and occasionally
Japanese characters as well). This is not currently practical for
writing an encyclopaedia, though, because for a large proportion of the
words in the language there is no consensus on which character should
be used to write them.
Mark
On 09/06/06, Tim Starling <t.starling(a)physics.unimelb.edu.au> wrote:
Gerard Meijssen wrote:
These language codes can then have a script associated with them, e.g.
cmn-Hans for Mandarin written in Simplified characters. When there are
dialects within a language, they can be identified as well.
Using zh is imho utterly confusing, and associating the codes with
countries does not help at all. More than one language is spoken in
Taiwan, and therefore zh-hans-tw does not cut it. When you replace
Traditional with Simplified characters within one language, I can
understand what you are doing. However, when you talk about things
having different terminology, are you not in essence moving across
languages? And are you then not trying to provide translations?
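To make the structure of such tags concrete, here is a minimal Python sketch that splits a tag into its language, script and region subtags. It assumes a simplified subset of BCP 47 (currently RFC 5646) and deliberately ignores extended-language, variant, extension and private-use subtags; `parse_tag` and `LanguageTag` are hypothetical names, not part of MediaWiki.

```python
import re
from typing import NamedTuple, Optional

# Simplified subset of the BCP 47 tag grammar (assumption: this sketch only
# supports language ["-" script] ["-" region]):
#   language: 2-3 letters, script: 4 letters, region: 2 letters or 3 digits
_TAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)


class LanguageTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


def parse_tag(tag: str) -> LanguageTag:
    """Split a simple tag into subtags, normalising case as BCP 47 recommends."""
    m = _TAG.fullmatch(tag)
    if m is None:
        raise ValueError(f"unsupported or malformed tag: {tag!r}")
    script = m.group("script")
    region = m.group("region")
    return LanguageTag(
        language=m.group("language").lower(),  # e.g. "cmn", "zh", "yue"
        script=script.title() if script else None,  # e.g. "Hans", "Hant"
        region=region.upper() if region else None,  # e.g. "TW", "HK"
    )


print(parse_tag("cmn-Hans"))    # LanguageTag(language='cmn', script='Hans', region=None)
print(parse_tag("zh-hans-tw"))  # LanguageTag(language='zh', script='Hans', region='TW')
```

Note that the region subtag in zh-Hans-TW only records where the text is used; it says nothing about which of Taiwan's languages the text is in.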
The written language is much more closely standardised than the spoken
language. Speakers of mutually unintelligible varieties of Chinese
write in a common written standard based on Mandarin. Differences in
terminology do occur, and in some cases, such as "Cantonese colloquial",
characters are introduced to write words in the local dialect that do
not exist in Mandarin. However, these differences are much easier to
resolve in software than full translation between languages written
phonetically.
For conversion to zh-hk, MediaWiki uses a dictionary of 211 entries in
addition to the simplified/traditional tables, and for conversion to
zh-sg there are only 15 extra entries.
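A minimal Python sketch of this kind of two-layer conversion, assuming a greedy longest-match strategy: at each position the converter first tries the longest matching entry in a phrase dictionary and otherwise falls back to the character-by-character simplified/traditional table. This is not MediaWiki's actual code; `PHRASES`, `CHARS` and `convert` are hypothetical, and the handful of entries shown are illustrative rather than taken from the real 211-entry table.

```python
# Hypothetical excerpts standing in for the real conversion tables.
# Character layer: simplified -> traditional, one character at a time.
CHARS = {
    "汉": "漢",
    "语": "語",
    "头": "頭",
    "发": "發",  # default mapping; 髮 ("hair") needs the phrase layer
    "车": "車",
}

# Phrase layer (zh-hk): multi-character entries that override the character layer.
PHRASES = {
    "头发": "頭髮",    # "hair": 发 -> 髮, not the default 發
    "出租车": "的士",  # "taxi": Hong Kong uses a different word entirely
}


def convert(text: str, phrases: dict[str, str], chars: dict[str, str]) -> str:
    """Greedy longest-match conversion: phrases first, then single characters."""
    max_len = max(map(len, phrases), default=0)
    out = []
    i = 0
    while i < len(text):
        # Try the longest phrase that could start at position i.
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if candidate in phrases:
                out.append(phrases[candidate])
                i += n
                break
        else:
            # No phrase matched: convert this one character, or keep it as-is.
            out.append(chars.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(convert("汉语", PHRASES, CHARS))    # 漢語
print(convert("头发", PHRASES, CHARS))    # 頭髮
print(convert("发展", PHRASES, CHARS))    # 發展
print(convert("出租车", PHRASES, CHARS))  # 的士
```

The phrase layer covers both ambiguous characters (simplified 发 corresponds to traditional 發 in 发展 but to 髮 in 头发) and regional terminology (出租车 becomes the Hong Kong term 的士), neither of which a character table alone can handle.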
Is it translation? Or is it just a spelling and terminology change,
like conversion from US English to Commonwealth English? Maybe somewhere
in between.
-- Tim Starling
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)wikimedia.org
http://mail.wikipedia.org/mailman/listinfo/wikitech-l