[Wikipedia-l] Re: One Chinese Wikipedia

yuanml yuanml at pku.org.cn
Fri Sep 10 16:34:40 UTC 2004


Henry H. Tan-Tenn wrote:
> No technical solution will, of course, address the differences in 
> vocabulary among Taiwan, PRC-Hong Kong, PRC-Mainland (and God forbid, 
> Singapore and PRC-Macau).  The differences are sometimes considerable in 
> certain technical and pop cultural fields, though they should not be 
> exaggerated.

In fact, some members of the Chinese Wikipedia community are working hard to solve 
this problem. We have been discussing it since the beginning of the community. 
I don't think it is a difficult problem; we can introduce new markup to solve it easily.
Someone has already set up a wiki to test this idea, and it worked; 
please visit http://fengzz.net/wiki/ to try it.

I think it is really important not to split the Chinese Wikipedia community.
We share the same language, and we should not need to write the same thing twice.

Automatic conversion between SC and TC is very important for the Chinese Wikipedia.
We have been discussing this problem for a long time. From our discussions, you can see that:

* The CJK Unified Ideographs and CJK Compatibility Ideographs in the Unicode charts contain
about thirty thousand Chinese characters, both simplified and traditional.
* Simplified Chinese covers only 2235 frequently used characters.
* Most of the 2235 simplified characters can be mapped to their traditional versions
in a one-to-one way.
* Only about 100 simplified-traditional pairs are not one-to-one.
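
The counts above suggest a table-driven approach: convert the one-to-one characters
automatically and flag the few ambiguous ones. Here is a minimal sketch in Python. The
tables hold only a handful of sample entries, and the names ONE_TO_ONE, ONE_TO_MANY and
to_traditional are made up for illustration; this is not the code running on the test wiki.

```python
# Sample entries only: a real table would list all mapped characters.
ONE_TO_ONE = {
    "国": "國",
    "学": "學",
    "书": "書",
}

# Simplified characters that have more than one traditional counterpart.
ONE_TO_MANY = {
    "发": ("發", "髮"),  # 發 (to send out) vs. 髮 (hair)
    "后": ("后", "後"),  # 后 (queen) vs. 後 (behind, after)
}


def to_traditional(text):
    """Convert simplified text; return the result and the unresolved characters."""
    converted = []
    ambiguous = []
    for index, char in enumerate(text):
        if char in ONE_TO_MANY:
            # Leave the character untouched and report it for manual handling.
            ambiguous.append((index, char, ONE_TO_MANY[char]))
            converted.append(char)
        else:
            # Characters missing from the table pass through unchanged.
            converted.append(ONE_TO_ONE.get(char, char))
    return "".join(converted), ambiguous
```

Calling to_traditional on a string returns the converted text together with a list of
(position, character, candidates) entries that still need a human decision.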

In fact, completely automatic conversion between SC and TC is really difficult,
but we can handle the difficult part manually by introducing some markup.
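
To make the markup idea concrete, here is a rough sketch in Python. The syntax
-{simplified|traditional}- is invented for this example, as are the names MANUAL,
SC_TO_TC, convert and render; it also assumes the article source is written in
simplified characters. The markup actually used on the test wiki may well differ.

```python
import re

# Hypothetical markup: -{simplified text|traditional text}-
MANUAL = re.compile(r"-\{([^|{}]*)\|([^|{}]*)\}-")

# Sample one-to-one entries only; a real table would be much larger.
SC_TO_TC = {"国": "國", "学": "學", "头": "頭"}


def convert(text, variant):
    """Apply the automatic character mapping to unmarked text."""
    if variant == "sc":
        return text  # the source is assumed to be simplified already
    return "".join(SC_TO_TC.get(char, char) for char in text)


def render(source, variant):
    """Render the source as 'sc' or 'tc', honouring manual markup."""
    if variant not in ("sc", "tc"):
        raise ValueError("variant must be 'sc' or 'tc'")
    pieces = []
    position = 0
    for match in MANUAL.finditer(source):
        # Text before the markup is converted automatically.
        pieces.append(convert(source[position:match.start()], variant))
        # Inside the markup, the editor's choice wins over the table.
        pieces.append(match.group(1) if variant == "sc" else match.group(2))
        position = match.end()
    pieces.append(convert(source[position:], variant))
    return "".join(pieces)
```

With this sketch, an editor would write 头-{发|髮}- once, and render would produce
the simplified or the traditional form of the word depending on the variant requested.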

I really want to solve this problem, and I think I have the ability,
but right now I simply have no time. The other difficulty for me is that
I don't have a thorough understanding of the Squid cache. 

I think that, to solve this problem, we must design a solution that conforms to the
URL, database and Squid cache requirements. The Chinese Wikipedia community has discussed these problems
for a long time. The real problem so far is that no programmer has taken the first real step.

I plan to solve it next year, once I have finished my graduate thesis, if no one else has taken the real step by then. 
Anyone who is interested in the problem and willing to work hard on it is very welcome. 








