[Foundation-l] On Arabic and sub-language proposals.

Milos Rancic millosh at gmail.com
Tue Oct 7 12:12:05 UTC 2008


There are two separate issues in relation to standard language
creation: ethnic/political-based and language-based. In the first
group are the South Slavic standards based on the Neo-Shtokavian dialect
(Serbian, Croatian, Bosnian, Montenegrin), the two Norwegian standard
languages, the two Belorussian standards, the Romanian-Moldovan case and,
possibly, a number of other cases. In the second group are most of
the cases all over the world: from the around 800 non-standardized
languages of Papua New Guinea to languages of Europe which have never
had a chance to get the status of "a language". The Arabic languages
belong to the second group.

Our responsibility for the languages of the first group is to find the
best solution for the new project. Sometimes it is possible to make a
conversion engine between standards, sometimes it is not, but even then
it is reasonable not to create a new project and instead to let both
(close) standards be written on one. But if neither option is possible,
our responsibility is to give them a separate project.

What is not our responsibility in such cases is helping those
people to form a new standard. Wikipedia and other Wikimedia
projects are about education, not about nation formation. While it is
possible to see some exceptions (including already existing projects),
I would be very strict here: (1) no ISO code -- no Wikipedia; (2)
if a conversion engine is possible -- new languages go on the existing
projects; etc.

However, I do think that we have a responsibility in relation
to the second group of languages. Compared with all the relevant
international institutions which deal with languages, the Wikimedian
community is the most relevant. There are a few organizations which
are willing to help in the standardization of some language, but I don't
know of any which is willing to do that if it is not about
translation of the Bible (if anyone knows of such an organization
without such an agenda, please let me know!). The Wikimedian community
has grown up enough to deal with such things.

So, even if it needs extra effort, I think that we should do it.
Because if we don't do it, no one will.

On Tue, Oct 7, 2008 at 9:42 AM, Muhammad Alsebaey <shipmaster at gmail.com> wrote:
> Hi Milos,
>
> Thank you for the fascinating insight, linguists are like the
> anthropologists of culture :) .
>
> Anyway, my opinion is simple, we may or may not be undergoing a process
> where our language is morphing and forming, and we may even in a decade (or
> less) see our version of Arabic as written (I do have reservations, Standard
> or formal Arabic is not dead, as it is the language of the religious text in
> a heavily religious part of the world, among other factors). However, let me
> stress my point again: is it the WMF's place to take a stand so as to accelerate
> such an adoption of the spoken language as written? I don't think so. There
> has never been a published text in Masry in history, politics, science or
> any non-fiction topic AFAIK. The Masry Wikipedia will be the first to have
> such text, and so probably will the Lebanese, Sudanese, and Moroccan (and
> Gerard, you were saying worry over Lebanese is an over-reaction; how about
> Moroccan? Its approval is under way as far as I see). I still strongly see
> that as 'original research' and a stand by the WMF to actually support the
> adoption of those languages as written (as opposed to leaving that to be
> resolved by the respective community). So it may well be that those
> languages will become adopted as written at some time in the future, and it
> may well be that the partially formed standard for Masry that you speak of
> will come to light and somehow get adopted by the respective population, but
> until then, I think the WMF should stand on the sideline IMHO.
>
> 2008/10/5 Milos Rancic <millosh at gmail.com>
>
>> On Sun, Oct 5, 2008 at 10:04 AM, Muhammad Alsebaey <shipmaster at gmail.com> wrote:
>> > Hi everyone,
>> >
>> > The following is my belated, rather long, 2 cents regarding the
>> > creation of Wikipedias for
>> > languages/dialects/whatever-you-want-to-call-them that stem from
>> > Arabic. This is mainly relevant to the creation of the Masry
>> > (Egyptian) Wikipedia and the Masry Wiktionary proposals (by virtue of
>> > the fact that I am Egyptian, and thus I can relate to those two
>> > projects with a better degree of confidence), but it is probably
>> > still relevant for the proposals that subsequently followed for
>> > Moroccan, Lebanese and Sudanese, and more will come, I am sure.
>>
>> Thanks for your email, it is a great one! Its content may be used as an
>> example at universities: what an educated non-linguist thinks
>> about the situation when new standard languages are in the process of
>> creation. I'll write a short paper/essay around your email. (Not here,
>> even my email is long :) )
>>
>> I see the relation between classical Arabic and the regional
>> languages as very similar to the situation when the Romance standard
>> languages were born. A few steps behind that is the situation with the
>> English languages (yes, plural); however, a morphological orthography
>> very close to the logographic type (like Chinese; but, instead of
>> lines, letters are used) prevents orthographic diversification to
>> some extent. But such a situation can't last for a long time.
>> Actually, Scots is already treated as a separate language.
>>
>> First, I would suppose that, for example, even Libyan and Egyptian
>> spoken Arabic are not mutually intelligible. But if a Libyan can
>> understand an Egyptian, it may be comparable with the situation
>> where a Portuguese can understand a Spaniard up to some level.
>>
>> I would say that the processes which are ongoing in Arab countries --
>> are natural. Learning a foreign language to be basically educated is
>> not an advantage. It is an advantage at some higher level, but such
>> situation leaves many people without the basic education (because they
>> are not able or not willing to learn a foreign language). It is much
>> easier to learn to write a native language.
>>
>> Linguistic standardization is very strongly connected with politics.
>> Mostly, it is connected because contemporary linguistics is a 19th
>> century invention from Europe; and this was the time of Romanticism, when
>> the ideology based on the premise "one language, one folk, one state[,
>> one leader]" was dominant.
>>
>> While it is possible to find different examples (Irish nation which
>> uses English; Swiss nation which uses four languages), it is true that
>> wherever European civilization came -- states are trying to make their
>> own ethnicity and their own language.
>>
>> On the other side, at the time when language standardization was not
>> forced, "natural" processes of language separation were dominant.
>> Separated by natural barriers or by the borders of feudal states, people
>> developed separate languages.
>>
>> In Europe, especially in Germany and Italy, where small feudal
>> countries existed for a long time, a lot of separate language
>> varieties exist in the areas of the former fiefs. For example, I think
>> that the areas of Nuremberg and Hamburg have varieties distinctly
>> separate from those of the areas around those cities, without a
>> dialect continuum [1].
>>
>> So, there are two separate social (and, just because of this, linguistic
>> too) processes. A wide, not well connected area with one culture
>> (as was the case with Roman and is the case with Arabic) tends to
>> separate into different societies, states, cultures and languages. If a
>> lot of different societies and cultures exist in a smaller and well
>> connected area, they tend to merge. Of course, opposite historical
>> examples may be found: Andorra, Liechtenstein, Monaco, San Marino etc.
>> are still separate states, while China is still one.
>>
>> > Let me state first though, that even though it will be obvious from
>> > my concerns below that I am against such a division (slightly oppose,
>> > to be precise), I have no opinion as to whether those languages or
>> > dialects (as proponents and opponents would call them) are really
>> > separate languages or not. I have some issues and worries, which is
>> > what I will expand on below, but ultimately, I don't know if what I
>> > speak is actually classified as a separate language or a dialect
>> > (yeah I am that ignorant :P ) so from the specific rules-based
>> > linguistic-jargon point of view, I am sadly out of my league.
>>
>> It is hard to give a clear linguistic answer as to what one language is,
>> even if we remove all political reasons. There are some obvious cases,
>> like the distinction between Arabic and English. However, there are a
>> lot of cases where it is not possible to give a clear answer.
>>
>> A classic example for comparison of this kind is that spoken languages
>> in Germany are (or, at least, they were in 19th century) more
>> different than all Slavic languages between themselves. But, if we
>> remove political reasons (one German state; a number of Slavic states)
>> and try to give "a linguistic answer" what are the languages, we
>> couldn't do that.
>>
>> Simply, the question "is this a separate language?" is a question of
>> the type "is the color [in RGB notation] #00xxxx blue or green?". We
>> are sure that #00FF00 is green and that #0000FF is blue and that they
>> are separate colors. We may be sure that even #00FF22 is green, while
>> #0022FF is blue. However, we can't be so sure when we move numbers
>> closer. Giving a discrete answer to a question which is a product of
>> our [whichever] bias is sometimes impossible.
>>
>> > I have read most of the (rather heated) arguments for and against the
>> > proposals. Here is what I understand (from a layman's point of view)
>> > about my language: I speak Egyptian, which is a form of Arabic. It is
>> > not the same as 'formal' Arabic; however, it is only spoken in most
>> > cases. I think the majority of the body of literature written by
>> > Egyptians is written in formal Arabic. I come to this conclusion
>> > simply because, as an avid reader, I must have come across only one or
>> > two literary pieces written in Egyptian Arabic, as 'pioneering
>> > experimental' works (as one author called his stuff). Also, the way of
>> > writing is not agreed upon by Egyptians themselves. For example, for
>> > words that contain the letter Qaaf (ق), I saw that some of the authors
>> > who tried writing such a word in 'Masry' would keep it as is, while
>> > other people would convert it to 'Hamza', as it is actually
>> > pronounced, which is rather foreign to read. I can safely assume that
>> > almost all literate Egyptians who read and write in formal Arabic
>> > (actually that *is* the definition of being literate in Egypt) will
>> > find reading their own everyday spoken language rather alien (kind of
>> > ridiculous, but it is the case IMHO). The point I am trying to make
>> > here is: for a language/dialect that has only been spoken till now
>> > for the most part, Wikipedia turning it into a written language would
>> > be 'original research', and this is what I actually observed in
>> > Wikipedia Masry: people write as they please, and the result is
>> > sometimes palatable and sometimes very foreign and alienating (as a
>> > method of delivering information). I suspect the same would be the
>> > case for at least the Lebanese and Sudanese proposals, for example;
>> > ditto if there is ever a proposal for the Gulf dialects (Saudi,
>> > Yemeni, etc.), the Egyptian Sai'di (Upper Egypt dialect), etc...
>>
>> My father is from the area of Serbia where a distinctive language is
>> spoken, Torlak or Shop [2]. Unlike other geographical
>> varieties in the South Slavic area, Torlak is not moribund; it is
>> a really living language, and its speakers are actively adopting Serbian
>> and Bulgarian words onto the substratum of a highly Balkanized (see Balkan
>> sprachbund [3]; it's a separate, actually opposite, term from the
>> political Balkanization) mixture of Vulgar Latin [4], Thracian and
>> dominantly Slavic languages (of course, Serbian, Bulgarian and
>> Macedonian are Slavic languages, but, in the present situation, the
>> substratum is not based on the Serbian, Bulgarian and Macedonian
>> standards). It has no written literature (there are some "examples",
>> but they are examples of that language being used for dialogs inside
>> dramas written in standard Serbian); the situation is analogous to that in
>> Egypt. A literate inhabitant of Southern and Eastern Serbia has to
>> know the Serbian standard, a literate inhabitant of Western Bulgaria has
>> to know the Bulgarian standard, while a literate inhabitant of Northern
>> Macedonia has to know the Macedonian standard.
>>
>> When I was talking with one of the rare people who works on language
>> there (a local one), we came to the question why inhabitants (even
>> very educated; even professors of Serbian language) are using a
>> dialect in all kinds of their communications in school except the most
>> formal ones (lectures to high school students). The answer was:
>> "Because it is easier to us, we don't need to care about rules."
>>
>> This is interesting for two reasons. First, they do care about
>> rules, even if they don't think so. It is the basic characteristic of all
>> communication systems: participants have to follow some rules to be
>> able to send information and understand each other. Second, it
>> shows how hard one language system is for speakers of a different
>> one.
>>
>> But the main difference between the situation in Egypt and in
>> Southern and Eastern Serbia is the number of inhabitants. There are
>> somewhere between 200,000 and 500,000 people who speak Torlak
>> (compared with 76+ million Egyptians) inside three very strong
>> educational systems (95%+ compared with 70%+ in Egypt). Speakers of
>> Torlak are surrounded by speakers of standard Serbian, Bulgarian and
>> Macedonian, while, AFAIK, there is no place where standard Arabic
>> is a common spoken language.
>>
>> In other words, Masri has reached the point where it is no longer in the
>> position of "a dialect of a language". It is now a spoken language
>> with all the cultural attributes of a language except a normalized
>> standard (AFAIK, some kind of standard exists, but it is not finished
>> yet).
>>
>> A situation where people are able to choose how they want to
>> write is not a stable one. Sooner or later some [more precise]
>> standard will start to be followed.
>>
>> > My second concern is that I am worried about duplicating efforts in
>> > the name of language separation. Granted, I speak something that is
>> > maybe not similar to formal Arabic etymology-wise. However, there is
>> > not one literate Arabic-speaking person who can claim he understands
>> > written Egyptian/Lebanese/etc. and does not understand formal Arabic
>> > (by virtue of the above argument that my language is mostly spoken,
>> > and that what is taught in schools and used in everyday written
>> > communication is formal Arabic). I don't know if it is good, given the
>> > already low participation level in my area of the world, to let
>> > people have Egyptian/Lebanese/Saudi/Yemeni mini-wiki projects,
>> > keeping in mind that all users of those will be perfectly comfortable
>> > reading the information in the corresponding Arabic project.
>>
>> How distant are standard Arabic and Masri? Is it possible to make a
>> conversion engine between those two languages? If you don't think so,
>> what are the reasons?
>>
>> I believe (I say that I believe because I didn't prove it :) ) that it
>> is possible to make very good conversion engines between similar
>> languages (a conversion engine between Bokmål and Nynorsk exists, but I
>> don't know how good it is). And it is worth the effort. In the case of
>> the "Arabance" languages and Arabic, such efforts may be very well funded.
>>
>> If it is not possible, note that the Arabic language has a base of more
>> than 1 billion people (including all other Muslim countries), and
>> Masri has a base of 76+ million people. Masri is in a better
>> position than, let's say, Italian. So the right way to think
>> about this issue is to concentrate on efforts to spread education
>> and the Internet in Egypt and other Arab countries.
>>
>> > Finally, I think the division is not purely language related; there
>> > are a lot of socio-political issues at work. Taking the Egyptian
>> > Wikipedia again as an example, there has been considerable debate in
>> > Egypt about getting the Egyptian language adopted writing-wise (and
>> > making the grammar more solid so that it would overcome the current
>> > problems in writing) to bolster the national identity of Egypt. While
>> > this proposal is currently going nowhere, it won't be hard to imagine
>> > groups interested in promoting this canvassing just to prove their
>> > point. Do we want to get involved in such an argument? Is it
>> > Wikipedia's place to? Isn't such a statement already made by
>> > Wikimedia creating one of the first bodies of written text in the
>> > language?
>>
>> :) As I explained before, every language (in the common sense of the
>> meaning of the word "language") is a matter of politics, not
>> linguistics. Even when you don't realize that as an obvious fact.
>> Arabic is a matter of politics, English is a matter of politics,
>> German is a matter of politics, French is a matter of politics,
>> Russian, Italian, Serbian, Croatian, Japanese, Yoruba, Zulu, Mayan...
>>
>> Linguists are a small minority of the inhabitants of any country. They
>> do not have the political weight to demand a new language for a new
>> nation. Also, they do not have the political weight to demand the
>> preservation of an old language. If a linguist says one of those things,
>> he is not led by linguistics, but by political motives (no matter how
>> positive or negative those motives may be). While language
>> standardization is a matter of sociolinguistics, again, it is more about
>> description than about active involvement in political processes.
>>
>> > I understand that it may be too late for the Egyptian Wikipedia; the
>> > decision is apparently already in. But I am currently seeing a slew
>> > of similar proposals, so I thought there should be some kind of
>> > discussion regarding the broader topic, not restricted to the
>> > proposal pages. I hope I haven't spammed this list with this
>> > email :).
>>
>> Before our eyes the Arabic language is developing into the "Arabance"
>> languages, as Latin did between the first centuries of the first
>> millennium and the 19th century, and Slavic during the first centuries
>> of the second millennium. The conditions are now very different. There
>> are the Internet, railroads, highways... You have a lot of possibilities
>> to keep the good things that come from the fact that most educated
>> people in the Muslim world know standard Arabic fluently, and you should
>> build your new local languages to make education more achievable for
>> more people.
>>
>> And, to say it again, your email is a great one. You described very well
>> the situation in which your society now finds itself because of the
>> birth of a new language.
>>
>> [1] - http://en.wikipedia.org/wiki/Dialect_continuum
>> [2] - http://en.wikipedia.org/wiki/Torlak_dialect
>> [3] - http://en.wikipedia.org/wiki/Balkan_sprachbund
>> [4] - http://en.wikipedia.org/wiki/Vulgar_Latin
>> _______________________________________________
>> foundation-l mailing list
>> foundation-l at lists.wikimedia.org
>> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
>>
>
>
>
> --
> Best Regards,
> Muhammad Alsebaey