OK, I have a proposal here which I think may solve the problem (I'm
not 100% sure about software implementation though).
It is a mixed solution, but I think it combines the good parts of both solutions.
There would be two separate subdomains,
http://zh-tw.wikipedia.org/
and
http://zh-cn.wikipedia.org/. If you visit the first, you will see
the UI in Traditional, and if you visit the latter the UI will be in
Simplified.
Both of these can share the same database, with automatic conversion occurring.
I propose to store all text in Traditional but convert it to
Simplified (perhaps with some sort of caching so articles do not have
to be re-generated each time) because TC>SC conversion is less
ambiguous than SC>TC conversion. If somebody adds text to an article
but they are typing in SC, it will be converted to TC when it is added
to the database. In the edit window, though, text will appear in the
script of whichever domain you are on. Titles of articles should be
converted too. If a mistake is made in conversion when a Simplified
text is added to the database, eventually somebody browsing at
http://zh-tw.wikipedia.org/ will notice this error and hopefully fix
it. In the meantime this error won't cause any problems on zh-cn
because the stored text will convert back to the same Simplified text
that was typed.
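As a rough illustration of the round trip described above, here is a
minimal Python sketch. The two mapping tables, the function names
(to_simplified, to_traditional, save_edit, render) and the domain
strings are assumptions made up for this example, not part of any
existing software; a real converter would need far larger tables and
context-aware rules. It shows why a wrong guess on the way into the
database is visible on zh-tw but not on zh-cn: 髮 and 發 both simplify
to 发, so a mistaken 發 still displays as the 发 that was typed.

```python
# Hypothetical, tiny mapping tables for illustration only.
# Traditional -> Simplified is many-to-one: both 髮 and 發 become 发.
TRAD_TO_SIMP = {"頭": "头", "髮": "发", "發": "发"}

# Simplified -> Traditional has to pick one default per character,
# so it can guess wrong (here 发 always becomes 發).
SIMP_TO_TRAD_DEFAULT = {"头": "頭", "发": "發"}


def to_simplified(text):
    """Convert character by character; unknown characters pass through."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)


def to_traditional(text):
    """Convert character by character using the single default mapping."""
    return "".join(SIMP_TO_TRAD_DEFAULT.get(ch, ch) for ch in text)


def save_edit(submitted, domain):
    """Return the text to store; storage is always Traditional."""
    return to_traditional(submitted) if domain == "zh-cn" else submitted


def render(stored, domain):
    """Return the text to display for the given domain."""
    return to_simplified(stored) if domain == "zh-cn" else stored


# A zh-cn user types "hair" in Simplified.
stored = save_edit("头发", "zh-cn")   # stored as 頭發 (should be 頭髮)
on_cn = render(stored, "zh-cn")      # 头发, identical to what was typed
on_tw = render(stored, "zh-tw")      # 頭發, the error is visible here
```

Here the stored text is 頭發 rather than the intended 頭髮, which a
zh-tw reader could spot and fix, while zh-cn readers keep seeing 头发
before and after the fix.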
Would it be extremely difficult to have two separate Wikipedias use
the same database, with conversion applied before text is displayed
to the reader and before new text is added to the database?
Of course, there would be a link on the sidebar that always gave you
the option to switch to the other variety, except it would be
displayed separately from normal Interwiki links.
--金俊書/Mark
On Tue, 14 Sep 2004 17:09:43 -0700, Mark Williamson <node.ue(a)gmail.com> wrote:
The "dialect" question is a very difficult one to answer and the
creation of zh-min-nan: has already made ripples in the zh: community.
The difference between Minnan and other "dialects" though is that, as
far as I'm aware, none of the other Chinese dialects/Sinitic languages
has a large movement to switch to a different writing system.
First and foremost this problem could be looked at in terms of Cantonese.
Modern Cantonese actually has two different versions: one is simply
text written for Mandarin speakers but read with Cantonese
pronunciations; the other uses Cantonese grammar and vocabulary that
Cantonese has but Mandarin doesn't.
Until very recently the latter had the higher status in Hong Kong and
Macau; however, upon reunification the former gained the higher status.
Most Cantonese speakers, even if they don't know Mandarin, can read
texts written by a Mandarin speaker with little difficulty, but many
of the sentences are not how they would say them in everyday speech.
Then there is also an issue with Classical Chinese, which is very
different from modern Mandarin. Until very recently any sort of
reference work like an encyclopedia would've been written in Classical
Chinese, which was the literary language.
There may be some movement to start a Classical Chinese Wikipedia but
if there is it must be very small.
However Classical Chinese sentences often seem more natural in
Cantonese or Hakka or other Southern dialects than do the equivalents
in written Mandarin.
Also, if you were to convert zh-min-nan: into Chinese characters, it
would become apparent very quickly that it wasn't Mandarin, especially
because Mandarin uses words such as 的 (de), which many people call
"bastardized classical Chinese" because 的 was originally created
exclusively for writing Mandarin. The character properly used for
Taiwanese and Classical Chinese is 之 (as you can see, 之 is a basic
character, but 的 has two different parts).
--金俊書/Mark
On Tue, 14 Sep 2004 20:43:29 +0100, Rowan Collins
<rowan.collins(a)gmail.com> wrote:
Just a question out of curiosity about how you handle this: what's the
base language, or is there one? Is the primary version of a document in
Simplified, and then there are annotations for how to correctly
translate it to Traditional (i.e. [simplified character|proper
traditional character]), or is it the other way around, or are both
Simplified and Traditional equally base languages?
Glancing at the current test implementation, I gather that neither has
'precedence': to put a manual translation, you say {-zh-cn one version
zh-tw the other-}. Whether the mappings are one-to-many in one
direction, the other, or both, is not a problem: you simply define
what you want displayed, in that particular case, in both versions.
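To make the manual markup concrete, here is a small Python sketch that
picks one side of a {-zh-cn ... zh-tw ...-} block. The regular
expression follows only the form quoted in this message; the actual
test implementation may parse it differently, and resolve_manual is a
hypothetical name chosen for this example.

```python
import re

# Matches the form quoted above: {-zh-cn <one version> zh-tw <the other>-}
# Group 1 is the zh-cn text, group 2 is the zh-tw text.
MANUAL = re.compile(r"\{-zh-cn\s+(.*?)\s+zh-tw\s+(.*?)\s*-\}", re.S)


def resolve_manual(text, variant):
    """Replace each manual block with the side chosen for `variant`."""
    if variant not in ("zh-cn", "zh-tw"):
        raise ValueError("variant must be 'zh-cn' or 'zh-tw'")
    group = 1 if variant == "zh-cn" else 2
    return MANUAL.sub(lambda m: m.group(group), text)


# "Software" is 软件 in mainland usage and 軟體 in Taiwan usage.
source = "A {-zh-cn 软件 zh-tw 軟體-} B"
cn_view = resolve_manual(source, "zh-cn")   # "A 软件 B"
tw_view = resolve_manual(source, "zh-tw")   # "A 軟體 B"
```

Text outside the braces is left untouched by this function, so any
automatic conversion would be a separate step.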
What I'm not clear on, having not looked in any depth, is how the
article is actually *stored*. The explanation seems to imply that the
characters are simply recognised as being either:
a) in the desired writing system; no action needed
b) in the non-desired writing system; automatic translation required
or c) marked up as a special case; version chosen to match preference
as per special syntax
I may be wrong, but if I'm right this is obviously no use to the more
general case of languages/dialects. For that, you'd probably need to
store in the database which language the 'original' was in, and then
convert based on that. Although then you'd have the problem of changes
that weren't easily translatable back to that base, wouldn't you? i.e.
base is LangA, a LangB user makes a change, but that change is
ambiguous in LangA; how is that change recorded? Similarly, if a naive
LangB user "corrects" the automated translation, they may end up
creating an error in the LangA document, because they overwrote the
original rather than adding special syntax. Ouch. It's more
complicated than I expected, unless that's just cos I'm hungry... ;)
--
Rowan Collins BSc
[IMSoP]
_______________________________________________
Wikipedia-l mailing list
Wikipedia-l(a)Wikimedia.org
http://mail.wikipedia.org/mailman/listinfo/wikipedia-l