[Wikipedia-l] Re: One Chinese Wikipedia

Mark Williamson node.ue at gmail.com
Wed Sep 15 03:31:11 UTC 2004


OK, I have a proposal here which I think may solve the problem (I'm
not 100% sure about software implementation though).

It is a mixed solution, but I think it shares the good parts of both solutions.

There would be two separate subdomains, http://zh-tw.wikipedia.org/
and http://zh-cn.wikipedia.org/. If you visit the first, you will see
the UI in Traditional, and if you visit the latter the UI will be in
Simplified.

Both of these can share the same database, with automatic conversion occurring.

I propose to store all text in Traditional and convert it to
Simplified for display (perhaps with some sort of caching so articles
do not have to be regenerated each time), because TC>SC conversion is
less ambiguous than SC>TC conversion. If somebody adds text to an
article while typing in SC, it will be converted to TC when it is
added to the database. In the edit window, though, text will appear in
the script of whichever domain you are at. Titles of articles should
be converted too. If a mistake is made in conversion when Simplified
text is added to the database, eventually somebody browsing at
http://zh-tw.wikipedia.org/ will notice the error and hopefully fix
it. In the meantime the error won't cause any problems on zh-cn,
because it will convert back the same way.
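
To make the round trip concrete, here is a rough sketch in Python. It
is only an illustration under assumptions: the three-character mapping
tables and the names save() and render() are made up for this example,
and a real converter would need full tables and probably word-level
rules.

```python
# Hypothetical, tiny mapping tables for illustration only.
# TC>SC is many-to-one: both 發 and 髮 simplify to 发.
T2S = {"頭": "头", "髮": "发", "發": "发"}
# SC>TC is ambiguous, so a naive table has to pick one answer for 发.
S2T = {"头": "頭", "发": "發"}


def to_simplified(text):
    """Convert character by character, leaving unknown characters alone."""
    return "".join(T2S.get(ch, ch) for ch in text)


def to_traditional(text):
    """Naive SC>TC conversion; it can pick the wrong Traditional character."""
    return "".join(S2T.get(ch, ch) for ch in text)


def save(text, domain):
    """Text submitted on zh-cn is converted to Traditional before storage."""
    return to_traditional(text) if domain == "zh-cn" else text


def render(stored, domain):
    """Stored text is Traditional; convert only when serving zh-cn."""
    return to_simplified(stored) if domain == "zh-cn" else stored


# A zh-cn user types 头发 (hair). The naive table stores 頭發, not 頭髮.
stored = save("头发", "zh-cn")
print(stored)                    # what a zh-tw reader sees and can fix
print(render(stored, "zh-cn"))   # what a zh-cn reader sees
```

With these tables the stored text is wrong from the zh-tw point of
view, but on zh-cn it converts back to exactly what was typed, and it
stays the same on zh-cn after somebody at zh-tw corrects it to 頭髮.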

Would it be extremely difficult to have two separate Wikipedias use
the same database but use conversion before displaying on the
client-side and before new text is added to the database?

Of course, there would be a link on the sidebar that always gave you
the option to switch to the other variety, except it would be
displayed separately from normal Interwiki links.

--金俊書/Mark

On Tue, 14 Sep 2004 17:09:43 -0700, Mark Williamson <node.ue at gmail.com> wrote:
> The "dialect" question is a very difficult one to answer and the
> creation of zh-min-nan: has already made ripples in the zh: community.
> The difference between Minnan and other "dialects" though is that, as
> far as I'm aware, none of the other Chinese dialects/Sinitic languages
> has a large movement to switch to a different writing system.
> 
> First and foremost this problem could be looked at in terms of Cantonese.
> 
> Modern Cantonese actually has two different versions: one is just
> text written for Mandarin speakers but read with Cantonese
> readings; the other uses Cantonese grammar and vocabulary words
> that Cantonese has but Mandarin doesn't.
> 
> Until very recently the latter had the higher status in Hong Kong and
> Macau; however, upon reunification the former gained the higher status.
> 
> Most Cantonese speakers, even if they don't know Mandarin, can read
> texts written by a Mandarin speaker with little difficulty, but many
> of the sentences are not how they would say them in everyday speech.
> 
> Then there is also an issue with Classical Chinese which is very
> different from modern Mandarin. Until very recently any sort of
> reference work like an encyclopedia would've been written in Classical
> Chinese which was the literary language.
> 
> There may be some movement to start a Classical Chinese Wikipedia but
> if there is it must be very small.
> 
> However Classical Chinese sentences often seem more natural in
> Cantonese or Hakka or other Southern dialects than do the equivalents
> in written Mandarin.
> 
> Also, if you were to convert zh-min-nan: into Chinese characters it
> would become apparent very quickly that it wasn't Mandarin, especially
> because Mandarin uses such words as 的 (de), which many people say is
> "bastardized classical Chinese" because originally 的 was created
> exclusively for writing Mandarin. The character properly used for
> Taiwanese and Classical Chinese is 之 (as you can see, 之 is a basic
> character, but 的 has two different parts).
> 
> --金俊書/Mark
> 
> On Tue, 14 Sep 2004 20:43:29 +0100, Rowan Collins <rowan.collins at gmail.com> wrote:
> > > Just a question out of curiosity about how you handle this: what's the
> > > base language, or is there one?  Is the primary version of a document in
> > > Simplified, and then there are annotations for how to correctly
> > > translate it to Traditional (i.e. [simplified character|proper
> > > traditional character]), or is it the other way around, or are both
> > > Simplified and Traditional equally base languages?
> >
> > Glancing at the current test implementation, I gather that neither has
> > 'precedence': to put a manual translation, you say {-zh-cn one version
> > zh-tw the other-}. Whether the mappings are one-to-many in one
> > direction, the other, or both, is not a problem: you simply define
> > what you want displayed, in that particular case, in both versions.
> >
> > What I'm not clear on, having not looked in any depth, is how the
> > article is actually *stored*. The explanation seems to imply that the
> > characters are simply recognised as being either:
> > a) in the desired writing system; no action needed
> > b) in the non-desired writing system; automatic translation required
> > or c) marked up as a special case; version chosen to match preference
> > as per special syntax
> >
> > I may be wrong, but if I'm right this is obviously no use to the more
> > general case of languages/dialects. For that, you'd probably need to
> > store which language the 'original' was in in the database, and then
> > convert based on that. Although then you'd have the problem of changes
> > that weren't easily translatable back to that base, wouldn't you? i.e.
> > base is LangA, a LangB user makes a change, but that change is
> > ambiguous in LangA; how is that change recorded? Similarly, if a naive
> > LangB user "corrects" the automated translation, they may end up
> > creating an error in the LangA document, because they overwrote the
> > original rather than adding special syntax. Ouch. It's more
> > complicated than I expected, unless that's just cos I'm hungry... ;)
> >
> > --
> > Rowan Collins BSc
> > [IMSoP]
> >
> >
> > _______________________________________________
> > Wikipedia-l mailing list
> > Wikipedia-l at Wikimedia.org
> > http://mail.wikipedia.org/mailman/listinfo/wikipedia-l
> >
>


