Re: [Wikipedia-l] Re: One Chinese Wikipedia

15 Sep 2004

OK, I have a proposal here which I think may solve the problem (I'm
not 100% sure about software implementation though).

It is a mixed solution, but I think it shares the good parts of both solutions.

There would be two separate subdomains, http://zh-tw.wikipedia.org/
and http://zh-cn.wikipedia.org/. If you visit the first, you will see
the UI in Traditional, and if you visit the latter the UI will be in
Simplified.

Both of these can share the same database, with automatic conversion occurring.

I propose to store all text in Traditional but convert it to
Simplified for display (perhaps with some sort of caching so articles
do not have to be re-generated each time), because TC>SC conversion is
less ambiguous than SC>TC conversion. If somebody adds text to an
article but types it in SC, it will be converted to TC when it is added
to the database. In the edit window, however, text will appear in the
script of whichever domain you are at. Titles of articles should be
converted too. If a mistake is made in conversion when Simplified text
is added to the database, eventually somebody browsing at
http://zh-tw.wikipedia.org/ will notice the error and hopefully fix
it. In the meantime the error won't cause any problems on zh-cn,
because the wrongly chosen Traditional character converts back to the
same Simplified one.

Would it be extremely difficult to have two separate Wikipedias use
the same database, but apply conversion before pages are displayed to
the client and before new text is added to the database?

Of course, there would be a link on the sidebar that always gives you
the option to switch to the other variety, though it would be
displayed separately from normal Interwiki links.

--金俊書/Mark

On Tue, 14 Sep 2004 17:09:43 -0700, Mark Williamson <node.ue(a)gmail.com> wrote:
...
  The "dialect" question is a very difficult one to answer, and the
 creation of zh-min-nan: has already made ripples in the zh: community.
 The difference between Minnan and the other "dialects", though, is
 that, as far as I'm aware, none of the other Chinese dialects/Sinitic
 languages has a large movement to switch to a different writing system.

 First and foremost, this problem could be looked at in terms of Cantonese.

 Modern written Cantonese actually has two different versions: one is
 just text written for Mandarin speakers but read with Cantonese
 pronunciations, and the other uses the grammar and vocabulary that
 Cantonese has but Mandarin doesn't.

 Until very recently the latter had the higher status in Hong Kong and
 Macau; upon reunification, however, the former gained the higher status.

 Most Cantonese speakers, even if they don't know Mandarin, can read
 texts written by a Mandarin speaker with little difficulty, but many
 of the sentences are not how they would say them in everyday speech.

 Then there is also an issue with Classical Chinese, which is very
 different from modern Mandarin. Until very recently, any sort of
 reference work like an encyclopedia would have been written in
 Classical Chinese, which was the literary language.

 There may be some movement to start a Classical Chinese Wikipedia but
 if there is it must be very small.

 However, Classical Chinese sentences often seem more natural in
 Cantonese, Hakka, or other Southern dialects than do their equivalents
 in written Mandarin.

 Also, if you were to convert zh-min-nan: into Chinese characters, it
 would become apparent very quickly that it wasn't Mandarin, especially
 because Mandarin uses words such as 的 (de), which many people call
 "bastardized Classical Chinese", because 的 was originally created
 exclusively for writing Mandarin; the character properly used for
 Taiwanese and Classical Chinese is 之 (as you can see, 之 is a basic
 character, whereas 的 has two different parts).

 --金俊書/Mark

 On Tue, 14 Sep 2004 20:43:29 +0100, Rowan Collins <rowan.collins(a)gmail.com> wrote:
   Just a question out of curiosity about how you handle this: what's
 the base language, or is there one? Is the primary version of a
 document in Simplified, and then there are annotations for how to
 correctly translate it to Traditional (i.e. [simplified character|proper
 traditional character]), or is it the other way around, or are both
 Simplified and Traditional equally base languages?
 Glancing at the current test implementation, I gather that neither has
 'precedence': to put in a manual translation, you write {-zh-cn one
 version zh-tw the other-}. Whether the mappings are one-to-many in one
 direction, the other, or both is not a problem: you simply define
 what you want displayed, in that particular case, in both versions.
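 A rough sketch of how such a manual override might be resolved at
 render time. The message does not give the exact delimiters, so the
 form {-zh-cn:...;zh-tw:...-}, with ':' and ';' as separators, is an
 assumption made only for illustration; resolve_overrides is a
 hypothetical name.

```python
import re

# Assumed markup: {-zh-cn:<text>;zh-tw:<text>-}. The ':' and ';' separators
# are a guess; the message only sketches "{-zh-cn ... zh-tw ...-}".
OVERRIDE = re.compile(r"\{-(.*?)-\}")


def resolve_overrides(text: str, variant: str) -> str:
    """Replace each override block with the version given for `variant`."""
    def pick(match):
        versions = {}
        for part in match.group(1).split(";"):
            code, _, value = part.partition(":")
            versions[code.strip()] = value
        # If the block defines nothing for this variant, leave it untouched.
        return versions.get(variant, match.group(0))

    return OVERRIDE.sub(pick, text)
```

 With this, "剪{-zh-cn:头发;zh-tw:頭髮-}" renders as 剪头发 on zh-cn and
 剪頭髮 on zh-tw; the author decides both outputs, so the direction of any
 one-to-many mapping never matters.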

 What I'm not clear on, having not looked in any depth, is how the
 article is actually *stored*. The explanation seems to imply that the
 characters are simply recognised as being either:
 a) in the desired writing system; no action needed
 b) in the non-desired writing system; automatic translation required
 or c) marked up as a special case; version chosen to match preference
 as per special syntax
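 If the stored text really is handled per character like this, the
 classification could look roughly like the sketch below. The character
 sets are tiny hypothetical samples, the override syntax is assumed as
 {-...-}, and characters shared by both scripts simply count as case a.
 Note that nothing about the text's original script is stored.

```python
import re

# Hypothetical sample sets; a real system would use full conversion tables.
SIMPLIFIED_ONLY = {"头", "发", "国"}
TRADITIONAL_ONLY = {"頭", "發", "髮", "國"}
# Assumed override syntax; the capturing group keeps blocks in re.split output.
OVERRIDE = re.compile(r"(\{-.*?-\})")


def classify(text, target):
    """Label pieces of `text` as case 'a', 'b' or 'c' for target 'zh-cn' or 'zh-tw'."""
    foreign = TRADITIONAL_ONLY if target == "zh-cn" else SIMPLIFIED_ONLY
    result = []
    for piece in OVERRIDE.split(text):
        if not piece:
            continue
        if OVERRIDE.fullmatch(piece):
            result.append((piece, "c"))  # special syntax: choose per preference
            continue
        for ch in piece:
            # a: already usable in the target script; b: needs conversion
            result.append((ch, "b" if ch in foreign else "a"))
    return result
```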

 I may be wrong, but if I'm right, this is obviously of no use for the
 more general case of languages/dialects. For that, you'd probably need
 to store in the database which language the 'original' was written in,
 and then convert based on that. Although then you'd have the problem of
 changes that weren't easily translatable back to that base, wouldn't
 you? I.e. the base is LangA, a LangB user makes a change, but that
 change is ambiguous in LangA; how is that change recorded? Similarly, if
 a naive LangB user "corrects" the automated translation, they may end up
 creating an error in the LangA document, because they overwrote the
 original rather than adding special syntax. Ouch. It's more
 complicated than I expected, unless that's just cos I'm hungry... ;)
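 One way the recording problem above might surface in software, sketched
 under assumptions: take Traditional as the stored LangA and Simplified as
 LangB, and before saving a LangB edit, flag every character whose mapping
 back to the base is one-to-many, so the editor could be asked to add
 special syntax instead of the system guessing. The table is a small
 hypothetical sample and ambiguous_positions is an illustrative name.

```python
# Hypothetical one-to-many table: one Simplified form, possible Traditional forms.
SC_TO_TC_CANDIDATES = {
    "发": ["發", "髮"],  # 發 "send/emit", 髮 "hair"
    "后": ["後", "后"],  # 後 "after", 后 "empress"
    "头": ["頭"],        # unambiguous
}


def ambiguous_positions(edit):
    """Return (index, char, candidates) for characters with more than one base form."""
    return [
        (i, ch, SC_TO_TC_CANDIDATES[ch])
        for i, ch in enumerate(edit)
        if len(SC_TO_TC_CANDIDATES.get(ch, [])) > 1
    ]
```

 For the edit 头发, only position 1 (发) is flagged, with 發 and 髮 as
 candidates; 头 converts back without any choice.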

 --
 Rowan Collins BSc
 [IMSoP]

 _______________________________________________
 Wikipedia-l mailing list
 Wikipedia-l(a)Wikimedia.org
 http://mail.wikipedia.org/mailman/listinfo/wikipedia-l
