[Wikitech-l] Re: zh-min-nan for Holopedia at Wikipedia, please! [no forking]

Henry H. Tan-Tenn share2002nov at lomaji.com
Wed Jul 7 17:45:39 UTC 2004


Tim Starling wrote:
> The reason for choosing zh-cfr can be summarised in one word: Aromanian. 
> A wiki in the Aromanian dialect was requested, and it was agreed that we 
> should have one, but there was a problem with choosing the subdomain. 
> Aromanian does not have a distinct ISO 639 code, however it is listed in 
> the Ethnologue under the code RUP. We had used only ISO 639 codes up to 
> that time.

I understand the line of thought re Aromanian.  My impression, though, 
was that "tokipona" preceded the roa-rup construction in not following 
ISO 639.  Perhaps there was/is a separate policy for conlangs, I don't 
know -- or maybe there was a transitional (ongoing?) state of confusion.

> The solution we came up with was to use the group ISO 639 code followed 
> by the SIL code. This allows us to specify most languages in the 
> Ethnologue without conflicting with the ISO standard, since ISO 639 
> codes do not contain hyphens.

My understanding is that RFC 3066 is backward compatible with ISO-639. 
It much prefers ISO-639 whenever a tag exists.  I quote as an example:

"4. When a language has both an IANA-registered tag (i-something) and
       a tag derived from an ISO registered code, you MUST use the ISO
       tag.  NOTE: When such a situation is discovered, the IANA-
       registered tag SHOULD be deprecated as soon as possible."
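
To make that rule concrete, here is a minimal Python sketch of "prefer the
ISO-derived tag when one exists". The table is a small illustrative subset
of i- tags that were superseded by ISO 639 codes, not a full registry, and
the names ISO_PREFERRED and preferred_tag are my own, hypothetical ones.

```python
# Illustrative subset only: IANA-registered "i-" tags that were later
# superseded by ISO 639 codes. A real implementation would consult the
# full IANA registry rather than this hand-written table.
ISO_PREFERRED = {
    "i-klingon": "tlh",
    "i-lux": "lb",
    "i-navajo": "nv",
}


def preferred_tag(tag):
    """Return the ISO-derived tag when one is known, else the tag unchanged.

    Tags are compared case-insensitively and returned in lowercase.
    """
    normalized = tag.lower()
    return ISO_PREFERRED.get(normalized, normalized)
```

A tag with no ISO equivalent in the table, such as zh-min-nan, simply
passes through unchanged.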

Some related discussion on meta:
http://meta.wikipedia.org/wiki/Language_codes#RFC_3066

Now of course Wikipedia *could* stick to ISO-639 but it's so...um, 
limiting, as we've discovered.

> Using the hyphenated language tags assigned by IANA, such as zh-min-nan, 
> would conflict with this scheme. For example if we used zh-yue, it would 
> be difficult to know what "yue" refers to. Is it an SIL code or an 
> assigned code?

Well, zh-yue has been defined 
(http://www.iana.org/assignments/lang-tags/zh-yue) so I'm not sure why 
it'd be confusing to find out.  Even without a lookup, on the face of it 
it's telling:  the zh (which is ISO-639) says it's some kind of Sinitic 
variant.  The "yue" part is a well-known word for Cantonese; it's found 
in words such as "粵語" (Yue Speech/Language), "粵劇" (Yue Opera). 
Although "yue" alone is not part of any standard I know of, the meaning 
is clear.
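
As a rough sketch of how the "is it SIL or assigned?" question could be
settled mechanically, the Python below checks a tag against a list of
IANA-registered tags first and only then falls back to the group-plus-SIL
reading. The set holds just the two tags cited in this thread, and the
function name and the labels it returns are hypothetical.

```python
# Only the tags this thread cites as IANA-registered; not a full registry.
IANA_REGISTERED = {"zh-yue", "zh-min-nan"}


def classify(tag):
    """Return (primary subtag, scheme label) for a hyphenated tag.

    The registered-tag lookup runs first, so a tag like zh-yue is never
    misread as an ISO group code followed by a three-letter SIL code.
    """
    tag = tag.lower()
    primary, *rest = tag.split("-")
    if tag in IANA_REGISTERED:
        return primary, "iana-registered"
    # Assumed shape of the group-plus-SIL scheme: exactly one extra
    # subtag made of three letters, as in roa-rup or zh-cfr.
    if len(rest) == 1 and len(rest[0]) == 3 and rest[0].isalpha():
        return primary, "group-plus-sil"
    return primary, "unknown"
```

Under these assumptions zh-yue and zh-min-nan come out as registered,
zh-cfr and roa-rup as group-plus-SIL, and tokipona as unknown.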

To further argue from examples:  "zh-min-nan" is more intuitive than 
"zh-cfr".  "min-nan" is again the transliteration of a common word in 
Chinese.  Although one *could* look up "cfr", the former can be deduced 
based on knowledge common to Internet-using speakers of the said language.

To sum up:  I think the IANA-assigned codes are (1) ISO-639 compatible 
and (2) apparently easier to decipher than SIL tags (though there may be 
counterexamples).  Unlike "tokipona" or "minnan" alone, IANA tags appear 
to make use of ISO-639 to suggest genetic affiliation.  That's useful, 
in my opinion.

As for the use of common names, that's something to consider.  I'd 
prefer the Minnan Wikipedia not be used as a guinea pig, though :)

> If I understand correctly, Shizhao's problem is that holopedia.net, and 
> by extension minnan.wikipedia.org, is written in a script peculiar to 
> Taiwan. The writing there is thus not representative of min-nan 
> generally. So wouldn't it be better to use the RFC 3066 code specific to 
> Taiwanese, namely zh-min-nan-TW? Or indeed, in keeping with my earlier 
> point about such language codes being cryptic and unnecessarily lengthy, 
> why not use ho-lo-oe.wikipedia.org?

I have not seen Shizhao describe the issue, certainly not to Minnan 
Wikipedians.  Now, I'm personally not opposed to "zh-min-nan-TW" (the 
matter has not been discussed in Minnan), though as you know, Wikipedia 
generally prefers to put same-language content together.  The most 
prominent example of this is the zh Wikipedia (where Simplified and 
Traditional scripts coexist).  Various other languages also have 
Latin/Cyrillic/Arabic di-/trigraphia.  (And I think the issue has not 
been more prominent because those Wikipedias are fairly inactive or 
merely reserved.)

To keep discussion short I'll just say, for now, that most variants of 
Minnan are generally and historically *not* written, and when they are 
written, it is generally without reference to a standard (lots of words 
are written inconsistently).  The genres have also been limited to songs 
and other poetic forms.  So it's difficult to talk about representativeness in 
that context.  The current Minnan Wikipedia *is* the "state of the art" 
to the extent that Minnan has been written and read historically.

~~~~







