New subject: Wiktionary-l Digest, Vol 4, Issue 3

5 Sep 2004

Hi,
I'm Marika. I have subscribed to your mailing list because I'm interested in
dictionaries.
Thank you
Marìka

----- Original Message ----- 
From: <wiktionary-l-request(a)Wikipedia.org>
To: <wiktionary-l(a)Wikipedia.org>
Sent: Sunday, September 05, 2004 3:31 AM
Subject: Wiktionary-l Digest, Vol 4, Issue 3

...
  Send Wiktionary-l mailing list submissions to
 wiktionary-l(a)Wikipedia.org

 To subscribe or unsubscribe via the World Wide Web, visit
 http://mail.wikipedia.org/mailman/listinfo/wiktionary-l
 or, via email, send a message with subject or body 'help' to
 wiktionary-l-request(a)Wikipedia.org

 You can reach the person managing the list at
 wiktionary-l-owner(a)Wikipedia.org

 When replying, please edit your Subject line so it is more specific
 than "Re: Contents of Wiktionary-l digest..."

 Today's Topics:

    1. ISO-639 + Glossaries / vocabulary lists / thematical lists
       (Sabine Cretella)
    2. Re: ISO-639 + Glossaries / vocabulary lists / thematical
       lists (Gerard Meijssen)
    3. Re: The need for XML in a wiktionary context (Sabine Cretella)
    4. Re: ISO-639 + Glossaries / vocabulary lists / thematical
       lists (Sabine Cretella)
    5. Re: The need for XML in a wiktionary context (Ray Saintonge)
    6. Re: The need for XML in a wiktionary context (Gerard Meijssen)
    7. Re: The need for XML in a wiktionary context (Andrew Dunbar)

 ----------------------------------------------------------------------

 Message: 1
 Date: Sat, 04 Sep 2004 11:19:56 +0200
 From: Sabine Cretella <sabine_cretella(a)yahoo.it>
 Subject: [Wiktionary-l] ISO-639 + Glossaries / vocabulary lists /
 thematical lists
 To: Gerard Meijssen <gerardm(a)myrealbox.com>
 Cc: wiktionary-l(a)Wikipedia.org
 Message-ID: <413988BC.4020004(a)yahoo.it>
 Content-Type: text/plain; charset=windows-1252; format=flowed

 Hi Gerard and all of you,

 While thinking about the codes, I was considering a few points.

 What I noted on the page you gave me for the Italian version of the
 ISO codes is that you use a mixed scheme for language identifiers: the
 two-letter code and, where there is no two-letter code, the three-letter
 code. Is this correct? I also noted that not all languages are present
 in the ISO three-letter codes, so they are standardised, but not
 completely. This would obviously lead to a Wiktionary-specific standard.

 I am asking because I thought about compiling a list of the language
 codes used for Wiktionary and then adding the translations of the
 language names, asking friends and colleagues to complete the list.
 Normally the two-letter code is used in the translation world.

 I'll then add the list to my SourceForge project (wsi-glossary:
 http://sourceforge.net/projects/wsi-glossary/). You can see who is
 currently contributing additions to the lists here:
 http://wiki.wesolveitnet.com/wakka.php?wakka=WsiGlossaryContributors.

 I should change the licensing to the GNU FDL (up to now mine was the
 same as the one used for the OmegaT manual). I have to check whether this
 is possible without problems on sourceforge.net. I am too new to
 OpenContent to know all about this, so it is another thing to be done
 immediately.

 If you are working on a multilingual list, e.g. of trees, birds,
 vegetables etc., please seriously consider letting other people add to
 these lists as well, and make them available somewhere for download or
 just integrate them into wsi-glossary. Certain kinds of work can even be
 done by schools in language lessons: e.g. the Italian Thesaurus for
 OpenOffice.org was created with the help of a school where the teachers
 were the team leaders and during the classes the pupils did something
 that made "sense" to them. Having them work directly in Wiktionary
 online is impossible for most schools, as the computers don't have
 Internet access (or only a few of them do), so working on tables is much
 easier.

 If you prefer not to hand out the list, give out single terms or groups
 of terms like this:

 I need these term(s)
 house
 cat
 mouse
 etc.

 in the following languages:
 German
 French
 Italian
 etc.

 I can then either publish these parts on my portal or send the request
 to different lists of translators, so step by step it is possible to
 add to the lists and improve them.

 Best wishes from Italy,

 Sabine

 -- 
 Sabine Cretella
 s.cretella(a)wordsandmore.it
 www.wordsandmore.it
 Meetingplace for translators
 www.wesolveitnet.com

 ------------------------------

 Message: 2
 Date: Sat, 04 Sep 2004 11:45:16 +0200
 From: Gerard Meijssen <gerardm(a)myrealbox.com>
 Subject: [Wiktionary-l] Re: ISO-639 + Glossaries / vocabulary lists /
 thematical lists
 To: Sabine Cretella <sabine_cretella(a)yahoo.it>
 Cc: wiktionary-l(a)Wikipedia.org
 Message-ID: <41398EAC.1000801(a)myrealbox.com>
 Content-Type: text/plain; charset=windows-1252; format=flowed

 Sabine Cretella wrote:

 > What I noted on the page you gave me for the Italian version of the
 > ISO-code is that you use a mixed version for language identifiers -
 > the two letter code and where there's no two letter code the three
 > letter code - is this correct? I also noted that not all languages are
 > present in the ISO-3-letter-code - so they are standardised, but not
 > completely. This would obviously lead to an own wiktionary standard.
 > ...

 Wikimedia does use two-letter ISO 639 codes, and where they do not exist
 it uses the three-letter codes. There are missing ISO codes. There are
 also the SIL codes, but personally I think mixing these three code sets
 makes a mess. Preferably ISO would add the missing codes for languages.
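
The fallback rule described here (use the two-letter ISO 639 code when one exists, otherwise the three-letter code) can be sketched in Python. This is only an illustration: the small lookup table and the function name `wiki_language_code` are assumptions made for this example, not Wikimedia's actual implementation.

```python
# Illustrative subset only; a real table would cover all of ISO 639.
# Maps three-letter codes to the two-letter code, where one exists.
TWO_LETTER = {
    "deu": "de",
    "fra": "fr",
    "ita": "it",
    "nld": "nl",
}


def wiki_language_code(three_letter: str) -> str:
    """Return the two-letter code if one exists, else the three-letter code."""
    code = three_letter.lower()
    return TWO_LETTER.get(code, code)


print(wiki_language_code("ita"))  # has a two-letter code
print(wiki_language_code("nap"))  # Neapolitan: three-letter code only
```

A language with neither kind of code falls outside this sketch, which is exactly the gap discussed in this thread.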

 For cooperation to work best, things like XML can be considered. GEMET
 uses it, and they have people who are knowledgeable about thesauri, XML
 and open content.
...

 When you have an application that can import and export XML data, you
 can work offline locally and export the data at the end of the day. A
 starting point might be to take the Italian OpenOffice list, add
 definitions in Italian, export it and share it with the world. An even
 better start might be words in another wikipedia with an Italian
 translation; the translations TO Italian are then already known.

 The most important thing is to prevent duplicate work and the repeated
 checking of material that is already available. Start with producing
 definitions for all the languages. Many translations are available on
 the nl:wiktionary. The articles can be copied to it:wiktionary; just add
 content to some templates. They do need checking as well... :)

 Thanks,
   Gerard

 ------------------------------

 Message: 3
 Date: Sat, 04 Sep 2004 11:58:10 +0200
 From: Sabine Cretella <sabine_cretella(a)yahoo.it>
 Subject: Re: [Wiktionary-l] The need for XML in a wiktionary context
 To: wiktionary-l(a)Wikipedia.org
 Message-ID: <413991B2.20806(a)yahoo.it>
 Content-Type: text/plain; charset=ISO-8859-1; format=flowed

 Geerd, please have a look at this:

 *********************************************************
 <glossword>
   <line>
     <term t1="A" t2="AT" id="4">attuatore</term>
     <defn>
       <trns lang="106">Trieb</trns>
       <trns lang="106">Arbeitszylinder</trns>
       Trieb ->azionatore (comandi idraulici); Arbeitshylinder -> ölhydraulisch
     </defn>
   </line>
   <line>
     <term t1="B" t2="BA" id="62">batteriostatico</term>
     <defn>
       <abbr lang="027"/>
       <trns lang="106">bakteriostatisch</trns>
     </defn>
   </line>
   <line>
     <term t1="B" t2="BO" id="54">bovino</term>
     <defn>
       <abbr lang="027"/>
       <trns lang="106">bovin</trns>
     </defn>
   </line>
   <line>
     <term t1="C" t2="CE" id="33">cellule ematiche</term>
     <defn>
       <abbr lang="027"/>
       <trns lang="106">Blutzellen</trns>
     </defn>
   </line>
   <line>
     <term t1="C" t2="CO" id="37">concentrazione serica</term>
     <defn>
       <abbr lang="027"/>
       <trns lang="106">Serumkonzentration</trns>
     </defn>
   </line>
   <line>
     <term t1="E" t2="EM" id="27">ematico</term>
     <defn>
       <abbr lang="027"/>
       <trns lang="106">Hämato-</trns>
       <trns lang="106">Blut-</trns>
     </defn>
   </line>
   <line>
     <term t1="E" t2="EN" id="8">endovenoso</term>
     <defn>
       <abbr lang="027"/>
       <trns lang="106">intravenös</trns>
     </defn>
   </line>
   <line>
     <term t1="E" t2="ET" id="5">etambutolo</term>
     <defn>
       <trns lang="106">Ethambutol</trns>
       <trns lang="103">ethambutol</trns>
       <src>http://www.gesundheit.de/roche/ro10000/r10785.html</src>
     </defn>
   </line>
   <line>
     <term t1="F" t2="FL" id="24">fleboclisi</term>
     <defn>
       <abbr lang="027"/>
       <trns lang="106">intravenöse Infusion</trns>
     </defn>
   </line>
   <line>
     <term t1="I" t2="IN" id="10">iniezione endovenosa</term>
     <defn>
       <abbr lang="027"/>
       <trns lang="106">intravenöse Injektion</trns>
     </defn>
   </line>
   <line>
     <term t1="L" t2="LE" id="77">legno impiallacciato</term>
     <defn>
       <abbr lang="025"/>
       <trns lang="106">furniertes Holz</trns>
       <src>eurodicautom</src>
     </defn>
   </line>
   <line>
     <term t1="L" t2="LE" id="71">legno tamburato</term>
     <defn>
       <abbr lang="025"/>
       <trns lang="106">furniertes Holz</trns>
       <src>eurodicautom</src>
     </defn>
   </line>
   <line>
     <term t1="M" t2="MA" id="44">mass media</term>
     <defn>
       <trns lang="106">Massenmedien</trns>
     </defn>
   </line>
   <line>
     <term t1="O" t2="OC" id="51">ocratossina</term>
     <defn>
       <abbr lang="054"/>
       <trns lang="106">Ochratoxin</trns>
       <src>
       http://www.verbraucherministerium.de/forschungsreport/rep2-99/ochra.htm
       </src>
     <src>
 Schimmelpilze sind in der Lage, unter sehr unterschiedlichen Bedingungen
 giftige Substanzen (Mykotoxine) zu bilden, die in der gesamten
 Nahrungskette vorkommen können. Zu den bekanntesten Vertretern zählt das
 Ochratoxin A, das von bestimmten Penicillium- und Aspergillus-Arten
 gebildet wird. Dieses Toxin kann die Nieren und das Immunsystem
 schädigen und zeigt im Tierversuch eine kanzerogene Wirkung. Der Aufgabe
 des Gesetzgebers, einen umfassenden Schutz der Verbraucher zu
 gewährleisten, ist das Bundesministerium für Gesundheit (BMG)
 nachgekommen und hat eine wissenschaftliche Studie über Ochratoxin A
 initiiert und gefördert, die 1996 begonnen und 1999 abgeschlossen wurde.
 Das Projekt hatte zum Ziel, Verzehrsdaten auf epidemiologischer
 Grundlage zu ermitteln und Ochratoxin A in relevanten Lebensmitteln
 sowie im Blutserum von Probanden zu bestimmen. Aus der Verknüpfung der
 Ergebnisse von Verzehrsdaten, Lebensmitteluntersuchungen und
 Blutserum-Analysen können eine fundierte Beurteilung der tatsächlichen
 Exposition der Bevölkerung in Deutschland erstellt und Empfehlungen für
 eine Höchstmengenregelung abgeleitet werden. (weiteres auf der Website)
 </src>
 </defn>
 </line>
   <line>
     <term t1="P" t2="PE" id="17">per via endovenosa</term>
     <defn>
       <abbr lang="027"/>
       <trns lang="106">intravenös</trns>
     </defn>
   </line>
   <line>
     <term t1="S" t2="SO" id="68">soluzione fisiologica</term>
     <defn>
       <abbr lang="027"/>
       <trns lang="106">Ringer Lösung</trns>
     </defn>
   </line>
 </glossword>

 ****************************************************
 This was the original glossary project, where I tried to ask people for
 co-operation, but online I had just two members trying to help ...
 That's why I switched to .csv tables, which can then easily be converted
 into XML in a second stage (most people don't like working online even
 if they are connected 24 hrs a day ...). Glossword (www.glossword.info)
 is an OpenSource project; its SourceForge page is:
 http://sourceforge.net/projects/glossword/
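
A second-stage conversion from a .csv table to XML could look roughly like the Python sketch below. The column names `term`, `lang` and `translation` are assumptions made for this example; the element names are borrowed from the Glossword sample above, but the output is not claimed to be a complete or valid Glossword file.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_glossary_xml(csv_text: str) -> str:
    """Convert rows of (term, lang, translation) into glossary-style XML."""
    root = ET.Element("glossword")
    defn_by_term = {}  # term -> its <defn> element, so translations are grouped
    for row in csv.DictReader(io.StringIO(csv_text)):
        defn = defn_by_term.get(row["term"])
        if defn is None:
            line = ET.SubElement(root, "line")
            ET.SubElement(line, "term").text = row["term"]
            defn = ET.SubElement(line, "defn")
            defn_by_term[row["term"]] = defn
        trns = ET.SubElement(defn, "trns", lang=row["lang"])
        trns.text = row["translation"]
    return ET.tostring(root, encoding="unicode")


sample = "term,lang,translation\nbovino,de,bovin\nematico,de,Blut-\n"
print(csv_to_glossary_xml(sample))
```

Contributors can then keep working in a spreadsheet, and only the person publishing the glossary needs to run the conversion.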

 Instead of ISO language codes, Glossword uses numbers to identify
 languages. This is a minor problem, as it is easy to change them to ISO
 codes using search and replace.
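
As a slightly safer alternative to a plain search and replace, the numbers could be swapped for ISO codes with a small Python script. The mapping below is an assumption inferred only from the sample above (106 appears on German translations, 103 on an English one) and would need checking against Glossword itself; the function name is made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Assumed mapping, inferred from the sample only; verify before real use.
NUMERIC_TO_ISO = {"106": "de", "103": "en"}


def relabel_languages(xml_text: str) -> str:
    """Replace numeric lang attributes on <trns> elements with ISO codes."""
    root = ET.fromstring(xml_text)
    for trns in root.iter("trns"):
        number = trns.get("lang")
        if number is not None:
            # Unknown numbers are left untouched rather than guessed.
            trns.set("lang", NUMERIC_TO_ISO.get(number, number))
    return ET.tostring(root, encoding="unicode")


sample = (
    '<glossword><line><term t1="E" t2="ET" id="5">etambutolo</term>'
    '<defn><trns lang="106">Ethambutol</trns>'
    '<trns lang="103">ethambutol</trns></defn></line></glossword>'
)
print(relabel_languages(sample))
```

Only the `trns` elements are touched, because the numbers on `abbr` elements do not obviously identify languages.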

 My installation of the software can be found here:
 www.dict.wesolveitnet.com. If you'd like to try it out there, just let
 me know and I'll create a user account with access to all dictionaries.
 Really, I am not sure if this can be useful ... maybe just for
 conversion issues?

 Ciao, Sabine

 *************

 Sabine Cretella
 s.cretella(a)wordsandmore.it
 www.wordsandmore.it
 Meetingplace for translators
 www.wesolveitnet.com

 ------------------------------

 Message: 4
 Date: Sat, 04 Sep 2004 12:16:42 +0200
 From: Sabine Cretella <sabine_cretella(a)yahoo.it>
 Subject: [Wiktionary-l] Re: ISO-639 + Glossaries / vocabulary lists /
 thematical lists
 To: Marc Prior <mail(a)marcprior.de>, wiktionary-l(a)Wikipedia.org
 Message-ID: <4139960A.9080909(a)yahoo.it>
 Content-Type: text/plain; charset=us-ascii; format=flowed

 Hi Gerard,

 > Wikimedia does use two letter ISO 639 codes and when they do not exist
 > they do use the three letter codes. There are missing ISO codes. There
 > are also the SIL codes but personally I think mixing these three codes
 > makes a mess. Preferably ISO adds missing codes for languages.

 I completely agree with this.

 > For cooperation to work best, things like XML can be considered. GEMET
 > uses it, they have people knowledgeable regarding thesauri XML open
 > content.

 XML is one of the best solutions, as the standards for CAT (computer-aided
 translation) software are TMX for translation memories and TBX for
 glossaries, and these are nothing other than defined XML formats.

 > When you have an application that can import and export XML data, you
 > can work off line locally and export the data at the end of the day.
 > The start might be the Italian Open Office list and add definitions in
 > Italian export it and share it with the world. An even better start
 > might be words in another wikipedia with an Italian translation; the
 > translations TO Italian are then already known.

 Hmmm ... the data needs to be multilingual, doesn't it? Or at least it
 must be identified by language tags. To edit bilingual data we could use
 OmegaT (which adds language tags to the TMX 1.1 file); then we could use
 the TMX file to import data. When new terms are added to a list, the
 entries already in the old TMX file will automatically be shown as
 translated, so the translator just needs to translate the missing part,
 which is stored in a new translation memory file in TMX format. OmegaT
 is Java-based, therefore platform-independent, and Open Source ... the
 files created from single "words" could, on the other hand, then be used
 as glossary files. Maybe we could try this out with the translations of
 the ISO languages table.

 > The most important thing is to prevent double work and the continued
 > checking of the stuff that is available. Start with producing
 > definitions for all the Languages. Many translations are available on
 > the nl:wiktionary. The articles can be copied to it:wiktionary just
 > add content to some templates. They do need checking as well... :)

 I'll do that right now - I have just created a table with the codes and
 am now inserting the missing English names.

 Ciao, Sabine

 ------------------------------

 Message: 5
 Date: Sat, 04 Sep 2004 03:57:01 -0700
 From: Ray Saintonge <saintonge(a)telus.net>
 Subject: Re: [Wiktionary-l] The need for XML in a wiktionary context
 To: wiktionary-l(a)Wikipedia.org
 Message-ID: <41399F7D.5020507(a)telus.net>
 Content-Type: text/plain; charset=ISO-8859-1; format=flowed

 The current format is adequate.  Your proposal makes no mention of the
 possible sacrifices in terms of ease of editing, a key feature in all
 the wikis.  How will flexibility of format be maintained?

 Ec

 Gerard Meijssen wrote:

 > There is a need for using XML with wiktionary.
 >
 > The definitions of a word in wiktionary can be structured in a fixed
 > way. For each word/phrase you have a:
 > *Indication what language it is in
 > *Name of the word/phrase
 > *Definition of the word/phrase
 > *Translations
 > *Pronunciation
 > *Synonyms
 > *Antonyms etc
 >
 > I do not try to be complete here, but my point is, the data is
 > structured. Other organisations that work with words already structure
 > their data using XML, for instance GEMET. When Wiktionary is structured
 > explicitly, the result will be that import into and export from
 > Wiktionary become possible, and it will become possible for other
 > dictionary/glossary projects to get changed content in an XML format
 > that is specific for working with words. This will enhance the
 > importance of wiktionary and it will help achieve our aim, which is
 > openly accessible dictionary content.
 >
 > The flip side of the coin is that we can get (changed) content from
 > other dictionary/glossary projects for inclusion in wiktionary.
 >
 > The GEMET data is available as XML data, and it would be great to
 > import it straight in from XML.
 >
 > *Issues:
 > #Using XML standards for dictionary content.
 > #Importing data / Exporting data using the current wiktionary content.
 > #Structuring wiktionary using MySQL tables.
 > #Importing data / Exporting data using the future wiktionary structure.
 ------------------------------

 Message: 6
 Date: Sat, 04 Sep 2004 13:45:56 +0200
 From: Gerard Meijssen <gerardm(a)myrealbox.com>
 Subject: Re: [Wiktionary-l] The need for XML in a wiktionary context
 To: wiktionary-l(a)Wikipedia.org
 Message-ID: <4139AAF4.3080906(a)myrealbox.com>
 Content-Type: text/plain; charset=ISO-8859-1; format=flowed

 Ray Saintonge wrote:

 > The current format is adequate.  Your proposal makes no mention of the
 > possible sacrifices in terms of ease of editing, a key feature in all
 > the wikis.  How will flexibility of format be maintained?

 XML is not to be edited by hand. You are absolutely right about that.
 However, the current format is not without its problems. At this moment
 an English word cannot easily be re-used in other languages. Things are
 free-form at the moment. It would be a good thing if we started
 thinking about creating some database structures for use within
 wiktionary. It would rid us of these dratted templates like {{en}} and
 {{-en-}}. They work, and they are the best thing around, but they are ugly.

 What I propose at this time is to get us thinking about importing and
 exporting in an XML format, and to consider changes that enhance the
 functionality within all wiktionaries and the functionality offered to
 the outside world.

 One of the aims of wikimedia is to create open content. By keeping our
 data in our own proprietary format, we do not achieve what can be achieved.

 Thanks,
     Gerard

 ------------------------------

 Message: 7
 Date: Sun, 5 Sep 2004 02:30:57 +0100 (BST)
 From: Andrew Dunbar <hippietrail(a)yahoo.com>
 Subject: Re: [Wiktionary-l] The need for XML in a wiktionary context
 To: wiktionary-l(a)Wikipedia.org
 Message-ID: <20040905013057.52937.qmail(a)web53702.mail.yahoo.com>
 Content-Type: text/plain; charset=iso-8859-1

  --- Gerard Meijssen <gerardm(a)myrealbox.com> wrote:
 > There is a need for using XML with wiktionary.

 I agree.

 > The definitions of a word in wiktionary, can be
 > structured in a fixed way.

 I disagree. But it depends on *how much* structure you
 want.

 > For each word/phrase you have a:
 > *Indication what language it is in
 > *Name of the word/phrase
 > *Definition of the word/phrase
 > *Translations
 > *Pronunciation
 > *Synonyms
 > *Antonyms etc

 Actually, if we only wanted to structure these parts it
 would work OK. Many other properties of words and
 phrases are a lot more difficult, such as
 part-of-speech.

 > I do not try to be complete here, but my point is,
 > the data is structured. Other organisations that
 > work with words already structure their data using
 > XML for instance GEMET.
 > When Wiktionary is structured explicitly, the result
 > will be that the import and export from Wiktionary
 > becomes possible and it will become possible for
 > other dictionary/ glossary project to get a changed
 > content in XML format that is specific for working
 > with words. This will enhance the importance of
 > wiktionary and it will help achieve our aim which is
 > open accessible dictionary content.
 >
 > The flip side of the coin is that we can get
 > (changed) content from other dictionary/ glossary
 > projects for inclusion in wiktionary.
 >
 > The GEMET data is available as XML data and, it
 > would be great to import it straight in from XML.
 >
 > *Issues:
 > #Using XML standards for dictionary content.
 > #Importing data / Exporting data using the current
 > wiktionary content.
 > #Structuring wiktionary using MySQL tables.
 > #Importing data / Exporting data using the future
 > wiktionary structure.
 >
 > NB I have posted this on META as well

 I do think a dictionary requires structure which an
 encyclopedia does not. A very loose structure like you
 have described would be a benefit for Wiktionary.
 The problems I see are these:
 1. Once we have some structure people will push for
    more structure such as part-of-speech, not realizing
    how difficult that is to get right in a multilingual
    dictionary.
 2. To work with the wiki software we can have a tool/
    script/routine which maps from internal XML into
    wiki/HTML so it can be displayed.
 3. People will have to input XML, or we need a friendly
    interface which can take input from non-expert users
    and turn it into correct XML.

 Number 3 would mean a *lot* of work for developers.
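
The mapping routine mentioned in point 2 could be as small as the Python sketch below. Both the XML element names and the wiki markup layout it produces are assumptions made for illustration; neither is an agreed Wiktionary format.

```python
import xml.etree.ElementTree as ET


def xml_to_wikitext(xml_text: str) -> str:
    """Render a (hypothetical) internal XML entry as simple wiki markup."""
    root = ET.fromstring(xml_text)
    lines = [
        f"=={root.get('lang')}==",
        f"'''{root.findtext('word')}'''",
        f"# {root.findtext('definition')}",
    ]
    translations = root.findall("translation")
    if translations:
        lines.append("===Translations===")
        for t in translations:
            lines.append(f"*{t.get('lang')}: [[{t.text}]]")
    return "\n".join(lines)


sample = (
    '<entry lang="English"><word>cat</word>'
    "<definition>A small domesticated feline.</definition>"
    '<translation lang="Italian">gatto</translation></entry>'
)
print(xml_to_wikitext(sample))
```

The reverse direction, turning free-form input from non-expert users into correct XML as in point 3, is the hard part and is not attempted here.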

 Andrew (hippietrail).

 > Thanks,
 >     GerardM

 =====
 http://linguaphile.sf.net/cgi-bin/translator.pl 
http://www.abisource.com

 ------------------------------

 End of Wiktionary-l Digest, Vol 4, Issue 3
 ******************************************
