Re: [Wikitech-l] Re: Case sensitivity on Afrikaans Wiktionary

21 Jul 2005

Mark Williamson wrote:

...
 Hi Gerard,

Signed languages are completely independent systems, and are separate
languages from spoken languages, different in grammar, syntax, and the
like. Unfortunately, there is no universally-agreed-upon method for
transcribing signed languages.

There are a few possibilities here.

1) Choose a particular transcription system. Top of the list would be
Stokoe notation and Sutton SignWriting; HamNoSys is also possible but is
mostly used by linguists, while the other two are used more widely by
people who use a signed language as their everyday language.

 It being the Ultimate Wiktionary, I would prefer not to make a choice
and to have them all.

...
 2) Use multimedia. We can upload videos of people signing a particular
word. Note that some signed languages also have conjugations and
inflections. However, this will leave a problem of headwords -- how do
you look up a word in a signed language? Which leads to the third
option,

 When you look at the ERD, the Conju/Decli-Word table explicitly
indicates what the Headword is. This is done for every language, which
means that it is by practice and not by design that in most languages
the infinitive will be the Headword. Consequently, when one word is
found, the others will also be shown, in a format that may need some
screen design per type of conjugation or declension.
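
 As a rough illustration of the headword relation described above, here
is a minimal sketch in Python. It is not the UW schema: the class name
Lexicon, its methods, and the English sample words are all hypothetical,
and it ignores homographs (one form belonging to several headwords).

```python
from dataclasses import dataclass, field


@dataclass
class Lexicon:
    """Toy stand-in for the Word and Conju/Decli tables (hypothetical names)."""

    # Maps every known form to its headword; a headword maps to itself.
    headword_of: dict = field(default_factory=dict)
    # Maps a headword to its (form, label) pairs, in insertion order.
    forms_of: dict = field(default_factory=dict)

    def add_form(self, headword, form, label):
        """Record that `form` is an inflection of `headword`."""
        self.headword_of.setdefault(headword, headword)
        self.headword_of[form] = headword
        self.forms_of.setdefault(headword, []).append((form, label))

    def lookup(self, word):
        """Return the headword and all recorded forms for any known form."""
        headword = self.headword_of[word]
        return headword, self.forms_of.get(headword, [])


lex = Lexicon()
lex.add_form("walk", "walks", "third person singular present")
lex.add_form("walk", "walked", "past tense")
headword, forms = lex.lookup("walked")
```

 Looking up any form returns the headword together with the other
forms, which is the behaviour the paragraph above describes; which form
counts as the headword is simply whatever was entered as such.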

...
 3) Introduce our own notation system. This is impractical and unlikely
to work well. I suggest that we instead adopt HamNoSys for lookup
purposes; although it is not represented in Unicode, we can try an
ASCII implementation.

 Introducing our own notation system is not an option. To a very large
extent I want deaf people involved, and they can sort it out. The only
involvement I may have is to see whether it is possible to implement it
within the confines of UW.

...
 Regarding "dialects" of Chinese and Arabic, that is very simple. Treat
them as separate languages. While certainly most often people write in
"Standard Arabic" or "Standard Chinese", it is also possible to write
in the local vernacular. This tends to be done more with Arabic, but
is possible with either. With Chinese, you only see it very often with
Cantonese; other varieties are occasionally written, but you are more
likely to find a Bible translation in them than a newspaper.

 One way I am thinking is that there are often transcriptions of these
dialects / languages. I am happy to have these as well, as long as I can
quote an authority who did the transcribing. Bible translations are one
of the most important resources for rare languages; at some stage we
will have a lot of Bible texts that we will analyse for their content.
This may help us create large translation memories and translation
glossaries.

...
 I hope very much that you will not restrict languages to those which
appear on the ISO 639-3 list. It has many shortcomings and is very,
very, very disappointing -- it would not allow for separate entries
for Yavapai, Hualapai, and Havasupai (it has only one code for them
all), even though they are very much different languages, and it by no
means includes all the languages of the world. It also distinguishes
between Moroccan, Tunisian, and Algerian Arabic, when they're really
nearly identical.

 With the design of the UW I have no technical restriction on which
languages and dialects I add. I would insert Valkenburgs (a Limburgish
dialect) as readily as any other language. If there is interest and if
it is not original research (i.e. not a newly created language), I am
happy to have it. When people can distinguish Moroccan from Tunisian
etc., I am happy to include them, if only to have the different
pronunciations. The only thing I need is an explicit code for each
language/dialect that the WMF agrees on. This code will be very much
internal, but it needs to exist and be agreed upon. I am also thinking
of a project that would let us hear many dialects through translations
of one written text of about one A4 page. The thing is to find an
interesting text, preferably a short story, and have it spoken and
recorded in the dialects / languages that we are going to include.
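
 To make the idea of an internal code plus a self-referencing Language
table a little more concrete, here is a small Python sketch. Everything
in it is an assumption for illustration: the codes such as
"uw:limburgish" are invented placeholders, not agreed WMF or ISO codes,
and the names Language and ancestry are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Language:
    code: str                     # internal code (invented placeholder values)
    name: str
    parent: Optional[str] = None  # code of the language this is a dialect of


def ancestry(languages, code):
    """Return the names from a dialect up through its parent languages."""
    chain = []
    seen = set()
    current = code
    while current is not None:
        if current in seen:
            # Guard against a dialect that (wrongly) refers back to itself.
            raise ValueError("cycle in language hierarchy at " + current)
        seen.add(current)
        language = languages[current]
        chain.append(language.name)
        current = language.parent
    return chain


languages = {
    "uw:limburgish": Language("uw:limburgish", "Limburgish"),
    "uw:valkenburgs": Language("uw:valkenburgs", "Valkenburgs",
                               parent="uw:limburgish"),
}
chain = ancestry(languages, "uw:valkenburgs")
```

 Because a dialect is just another row that points at its parent, adding
a new dialect needs no change to the structure, only a new agreed code.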

Thanks Mark,
    Gerard

...
 Mark

On 20/07/05, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:

>Timwi wrote:
>
>>Gerard Meijssen wrote:
>>
>>>I would welcome your comments about the ERD that I posted here
>>>http://commons.wikimedia.org/wiki/Image:ERD.jpg
>>>
>>Looks interesting, but is extremely bare. It would do well with a bit
>>of documentation. For much of it, the purpose isn't entirely clear.
>>I'm particularly confused as to why "Language", "Word" and "Meaning"
>>are each duplicated.
>>
>Hoi,
>There is some documentation here:
>http://meta.wikimedia.org/wiki/Ultimate_Wiktionary_data_design and
>http://meta.wikimedia.org/wiki/Ultimate_Wiktionary_decisions_on_its_usage
>The duplication reflects that each of these tables has two relations
>to the same table. Language refers to itself for dialects, Word refers
>through Conju/Decli (conjugations or declensions) to a headword and
>derived words, and Meaning is related to itself through "Relations";
>this is to allow for thesaurus-like structures.
>
>One reason it is not as well documented as I would like is that
>I am still working on the structure. At this moment I am thinking hard
>about how to include signed languages and the spoken dialects of the
>Chinese and Arabic written languages.
>
>Thanks,
>    GerardM
> 
