[Wikipedia-l] Wider help needed for future software on sr, sh (and probably bs)

Mon Jun 27 16:23:00 UTC 2005

I have seen that some people are interested in the Serbo-Croatian/
Serbian/Croatian/Bosnian (SC/S/C/B) issue, and I would like to ask all
of them for some attention.

0. For those who are not familiar with the issue: the Serbian
language is written in two alphabets (Cyrillic and Latin) and has
two standard variants (Ekavian and Iyekavian). Cyrillic and Latin are
not geographically specific (in Belgrade, Podgorica or Banja Luka you
can find both), but Ekavian and Iyekavian are. There are around 8
million Ekavian speakers and around 2 million Iyekavian speakers.
Ekavian is used in Serbia (although officially both standard variants
are equal in Serbia); Iyekavian is used in Republika Srpska (a part of
Bosnia and Herzegovina, where officially both standard variants are
equal) and in Montenegro (where only the Iyekavian standard is
official). All of this can also be implemented for sh:, while bs: can
implement only Latin<->Cyrillic conversion.
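Of these conversions, Cyrillic->Latin is the simplest, because each
Serbian Cyrillic letter has exactly one Latin counterpart (only љ, њ
and џ become the two-letter digraphs lj, nj and dž). A minimal Python
sketch of that direction, assuming Unicode input; the name
`cyr_to_lat` is hypothetical and not taken from Zhengzhu's software:

```python
# Minimal sketch: Serbian Cyrillic -> Latin (Gaj's alphabet).
# Each Cyrillic letter maps to one fixed Latin string; only љ, њ, џ
# map to two-letter digraphs.
LOWER = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}

# Capital letters map to title-case Latin ("Љ" -> "Lj"). Text written
# entirely in capitals would need "LJ"; this sketch does not handle that.
TABLE = str.maketrans(
    {**LOWER, **{cyr.upper(): lat.capitalize() for cyr, lat in LOWER.items()}}
)


def cyr_to_lat(text: str) -> str:
    """Transliterate Serbian Cyrillic to Latin; other characters pass through."""
    return text.translate(TABLE)
```

For example, `cyr_to_lat("Бања Лука")` gives `Banja Luka`. The opposite
direction is not a plain table lookup, because Latin lj, nj and dž are
occasionally two separate letters.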

Zhengzhu has done a lot of work so far, and we are waiting for the
first implementation of his software on sr:. The software is based on
his previous work on the Chinese problem.

1. Zhengzhu will implement the basic part of the software for sr:
(which would also be used on sh:, and maybe on bs:). However, this is
just the beginning of the work, and I think the whole issue will need
help from people (both contributors and developers) who are
interested in linguistics.

2. The first version of the software (on sr:) should be deployed
within a month or two, as far as I know. This implementation assumes:

a) Keeping the sr: policy that articles should be written in Cyrillic,
using Cyrillic-based syntax (in the sense of the source alphabet).

b) Writing in Ekavian and/or using a specific syntax for marking
Ekavian-Iyekavian variants. In addition, an Ekavian-Iyekavian
dictionary would be used for automatic conversion, and admins would
be able to update the dictionary.

c) General conversion would work in both directions, but we do not
want to mix Latin, Cyrillic, Ekavian and Iyekavian (that would be
chaotic, confusing for the average user, and non-standard).

d) All conversion happens at the read level; there would be no change
at the write level in MediaWiki.
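Points b) to d) can be pictured as a read-only converter: the stored
Ekavian Cyrillic text is never modified, and each page view is rendered
entirely in one requested variant, so alphabets and variants are not
mixed. The Python sketch below assumes a plain word-to-word dictionary;
the entries and the names `EK_TO_IJ` and `render` are hypothetical, and
a real Ekavian-Iyekavian dictionary would also have to cover inflected
and ambiguous forms.

```python
import re

# Serbian Cyrillic -> Latin, used only when a reader asks for Latin output.
LOWER = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}
CYR_TO_LAT = str.maketrans(
    {**LOWER, **{cyr.upper(): lat.capitalize() for cyr, lat in LOWER.items()}}
)

# Hypothetical admin-editable Ekavian -> Iyekavian dictionary (Cyrillic).
EK_TO_IJ = {
    "река": "ријека",
    "млеко": "млијеко",
    "девојка": "дјевојка",
    "лето": "љето",
}

WORD = re.compile(r"\w+")


def to_iyekavian(text: str, dictionary: dict) -> str:
    """Replace whole words found in the dictionary, keeping an initial capital."""
    def swap(match):
        word = match.group(0)
        repl = dictionary.get(word.lower())
        if repl is None:
            return word
        return repl[0].upper() + repl[1:] if word[0].isupper() else repl
    return WORD.sub(swap, text)


def render(stored: str, dialect: str = "ekavian", script: str = "cyrl") -> str:
    """Read-level conversion: build one display variant; `stored` is untouched."""
    text = to_iyekavian(stored, EK_TO_IJ) if dialect == "iyekavian" else stored
    return text.translate(CYR_TO_LAT) if script == "latn" else text
```

With this sketch, `render("Река и лето.", "iyekavian", "latn")` yields
`Rijeka i ljeto.`, and an admin update amounts to adding an entry such
as `EK_TO_IJ["бело"] = "бијело"`.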

3. The "classic" implementation of Zhengzhu's software would be the
next step, and I think it could be finished within the next couple of
months. This implementation assumes:

a) The possibility of writing in different alphabets and variants.

b) Conversion would be implemented at both the write and the read
level. The database would store text in Ekavian Cyrillic with markup;
when a contributor writes something in Iyekavian or in Ekavian Latin,
it would be converted into Ekavian Cyrillic.
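Write-level conversion means normalizing input before it is stored.
Latin->Cyrillic is harder than the reverse, because lj, nj and dž are
usually, but not always, single letters. The Python sketch below uses
greedy longest-match transliteration followed by an Iyekavian->Ekavian
dictionary lookup; the names and the tiny dictionary are hypothetical
and only illustrate the idea.

```python
import re
import unicodedata

# Latin -> Serbian Cyrillic (lower case). The digraphs lj, nj, dž are
# usually single letters (љ, њ, џ), so they must be tried before their parts.
LAT_TO_CYR = {
    "lj": "љ", "nj": "њ", "dž": "џ",
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д", "đ": "ђ",
    "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и", "j": "ј", "k": "к",
    "l": "л", "m": "м", "n": "н", "o": "о", "p": "п", "r": "р", "s": "с",
    "š": "ш", "t": "т", "u": "у", "v": "в", "z": "з", "ž": "ж",
}
# Longest alternatives first, so "lj" wins over "l" (greedy longest match).
LATIN_LETTER = re.compile(
    "|".join(sorted(map(re.escape, LAT_TO_CYR), key=len, reverse=True)),
    re.IGNORECASE,
)

# Hypothetical Iyekavian -> Ekavian dictionary (Cyrillic), applied after
# transliteration so that one dictionary serves both scripts.
IJ_TO_EK = {
    "ријека": "река",
    "млијеко": "млеко",
    "дјевојка": "девојка",
    "љето": "лето",
}
WORD = re.compile(r"\w+")


def lat_to_cyr(text: str) -> str:
    def repl(match):
        latin = match.group(0)
        cyr = LAT_TO_CYR.get(latin.lower())
        if cyr is None:  # odd case-insensitive matches: leave unchanged
            return latin
        return cyr.upper() if latin[0].isupper() else cyr
    return LATIN_LETTER.sub(repl, text)


def to_ekavian(text: str) -> str:
    def swap(match):
        word = match.group(0)
        ek = IJ_TO_EK.get(word.lower())
        if ek is None:
            return word
        return ek[0].upper() + ek[1:] if word[0].isupper() else ek
    return WORD.sub(swap, text)


def normalize_for_storage(text: str) -> str:
    """Write-level step: convert Latin or Cyrillic input to Ekavian Cyrillic."""
    text = unicodedata.normalize("NFC", text)  # e.g. z + combining caron -> ž
    return to_ekavian(lat_to_cyr(text))
```

Greedy matching gets some words wrong: `lat_to_cyr("nadživeti")`
returns наџивети instead of надживети, because d and ž belong to
different parts of the word there. A real implementation would need an
exception list or explicit markup for such cases.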

4. The next step is the Serbo-Croatian Wikipedia, where more complex
(but linguistically more interesting) rules would have to be added.

I think almost everyone on these lists knows that the Serbian,
Croatian, Bosnian and Serbo-Croatian standards have minimal linguistic
differences. Most of the differences are cultural and political, so
we should be very careful with any decision related to this problem.
In fact, sr:, hr: and bs: should never be forced to become one
Wikipedia.

But we can work on sh:, with a lot of care.

First of all, an extended version of Zhengzhu's software should be
implemented on sh:, one that would handle the different standard
variants (four Serbian, two Bosnian and one Croatian).
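The count of seven variants can be made explicit as data. The Python
sketch below assumes, in line with the points above, that the four
Serbian variants are every combination of Ekavian/Iyekavian and
Cyrillic/Latin, that Bosnian is Iyekavian in both scripts, and that
Croatian is Iyekavian Latin. The `Variant` type, its labels and
`conversion_steps` are hypothetical illustrations, not part of any
existing software.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Variant:
    standard: str  # "sr" (Serbian), "bs" (Bosnian), "hr" (Croatian)
    reflex: str    # "ekavian" or "iyekavian"
    script: str    # "cyrl" or "latn"


# Four Serbian variants: every combination of reflex and script.
SERBIAN = [Variant("sr", r, s)
           for r, s in product(("ekavian", "iyekavian"), ("cyrl", "latn"))]
# Two Bosnian variants: Iyekavian only, in both scripts.
BOSNIAN = [Variant("bs", "iyekavian", s) for s in ("latn", "cyrl")]
# One Croatian variant: Iyekavian Latin.
CROATIAN = [Variant("hr", "iyekavian", "latn")]

SH_VARIANTS = SERBIAN + BOSNIAN + CROATIAN


def conversion_steps(target: Variant,
                     source: Variant = Variant("sr", "ekavian", "cyrl")) -> list:
    """List the hypothetical converter stages needed from source to target."""
    steps = []
    if target.standard != source.standard:
        steps.append(f"lexicon {source.standard}->{target.standard}")
    if target.reflex != source.reflex:
        steps.append(f"reflex {source.reflex}->{target.reflex}")
    if target.script != source.script:
        steps.append(f"script {source.script}->{target.script}")
    return steps
```

Starting from Ekavian Cyrillic storage, the Croatian variant needs all
three stages (lexicon, reflex, script), while Serbian Ekavian Latin
needs only the script stage.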

Implementing S<->C<->B dictionaries is the less complex part. Work on
syntactic (and perhaps stylistic) differences is more complex; that
step would require help from people trained in linguistics.

Also, the database should not stay in Ekavian Cyrillic (an exclusively
Serbian standard). We should create some kind of meta-alphabet and
meta-orthography for storing data in the database.
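One possible shape for such a meta-orthography, purely as an
assumption: store Latin text in which lj, nj and dž are written with
the single Unicode code points ǉ, ǌ and ǆ (so the mapping to Cyrillic
is one-to-one), and mark the two reflexes of the old yat vowel with
hypothetical markers, ě for long yat and ĕ for short yat. Ekavian
renders both as e; Iyekavian renders long yat as ije and short yat as
je, and a short yat after l or n merges into lj/nj (lĕto becomes
ljeto, Cyrillic љето). The real reflex rules have exceptions that this
Python sketch ignores.

```python
# Hypothetical meta-alphabet: Latin letters, with lj/nj/dž stored as the
# single code points U+01C9, U+01CC, U+01C6, plus two yat markers.
LJ, NJ, DZH = "\u01c9", "\u01cc", "\u01c6"  # ǉ ǌ ǆ
LONG_YAT, SHORT_YAT = "\u011b", "\u0115"    # ě ĕ (markers, not real letters)

LOWER_CYR = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д", DZH: "џ",
    "đ": "ђ", "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и", "j": "ј",
    "k": "к", "l": "л", LJ: "љ", "m": "м", "n": "н", NJ: "њ", "o": "о",
    "p": "п", "r": "р", "s": "с", "š": "ш", "t": "т", "u": "у", "v": "в",
    "z": "з", "ž": "ж",
}
# Upper-case and title-case digraph letters (ǅ, ǈ, ǋ) map to capitals.
TO_CYRILLIC = str.maketrans({
    **LOWER_CYR,
    **{k.upper(): v.upper() for k, v in LOWER_CYR.items()},
    "\u01c5": "Џ", "\u01c8": "Љ", "\u01cb": "Њ",
})
# Latin output spells the single-code-point digraphs as ordinary letters.
TO_LATIN = str.maketrans({
    LJ: "lj", NJ: "nj", DZH: "dž",
    "\u01c7": "LJ", "\u01ca": "NJ", "\u01c4": "DŽ",
    "\u01c8": "Lj", "\u01cb": "Nj", "\u01c5": "Dž",
})


def resolve_yat(meta: str, dialect: str) -> str:
    """Replace yat markers with the Ekavian or Iyekavian reflex (simplified)."""
    if dialect == "ekavian":
        return meta.replace(LONG_YAT, "e").replace(SHORT_YAT, "e")
    # Iyekavian: short yat after l/n merges with it into lj/nj.
    text = meta.replace("l" + SHORT_YAT, LJ + "e").replace("n" + SHORT_YAT, NJ + "e")
    return text.replace(LONG_YAT, "ije").replace(SHORT_YAT, "je")


def render(meta: str, dialect: str, script: str) -> str:
    """Render meta-orthography text in one dialect ('ekavian'/'iyekavian') and script."""
    text = resolve_yat(meta, dialect)
    return text.translate(TO_CYRILLIC if script == "cyrl" else TO_LATIN)
```

In this scheme a single stored form such as `"ml" + LONG_YAT + "ko"`
renders as mleko, mlijeko, млеко or млијеко, so no variant or alphabet
is privileged in the database itself.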

The last problem I have noticed is naming conventions. Would names be
in Latin? In the Serbian variant? In Iyekavian? Would they be...?
This set of problems requires good political solutions.

It would not be good to impose any kind of majorization (forcing the
majority's choice on everyone). We can say that most Serbs, Croats,
Bosniaks and Montenegrins write in the Latin alphabet (around 50% of
Serbs and Montenegrins, 90% of Bosniaks and 100% of Croats), but it
would be very bad to implement sh: interwiki links etc. in the Latin
alphabet, because around 1/3 of speakers would see that as
majorization. Maybe 60% of all speakers are Ekavians, but all Croats,
Bosniaks and Montenegrins are Iyekavians. The language policy of
former Yugoslavia failed on the principle "Ekavian and Latin
Serbo-Croatian language for all people in Yugoslavia" (note that
Slovenians and Macedonians have different languages!). Only the
military partially implemented that principle.

5. When it comes to linguistics and technical implementation, I have
a clear solution. Any cultural or political problem that can be
solved by those means can be handled easily.

For example, we can name Serbo-Croatian after its linguistic base:
Shtokavian; even the two-letter ISO code (sh) fits :) We have many
naming problems if we want to name the language correctly: the
correct name in English translation is "Serbo-Croatian,
Croato-Serbian, Croatian or Serbian, Serbian or Croatian" (because the
Serbian construction was "Serbo-Croatian" and the Croatian
construction was "Croatian or Serbian"). But where are the Bosniaks
and Montenegrins in that name?

What I want to say is that we can use small, clever tricks for a
number of problems, but there is a large field of other cultural and
political problems. If people here think that we are strong enough to
work on those problems, I will need a lot of help.

I think the first step toward such a solution is to form a working
group of Wikipedians who are interested in working on these problems.
The focus of that group should not be any (N)POV question, nor the
question of whether sr:, hr: and bs: should exist, but only finding a
solution that allows people from sr:, hr: and bs: to work together.