Re: [Wikitech-l] Re: Spell checking in MediaWiki

31 Aug 2005

Heiko Evermann wrote:

...
 Hi Gerard,

Thank you for your answer.

 The German situation is a bit difficult. In fact there are two
orthographies, because two Bundesländer did not pass into law that the
new spelling would apply there as well. The consequence is that both the
old spelling and the new spelling are valid. Normally, the words whose
spelling has changed would become dated and eventually obsolete. From a
practical point of view I would include only the changed words and the
new words, and treat them as if these two Bundesländer had voted in
favour. For lookup purposes, the difference comes down to a condition in
the SELECT statement of the query.

 So you do not want to include the old spelling? From what I understood, for
Low Saxon you also wanted to include historic spellings. But I may have
misunderstood that.

 Sorry, good try but no cigar. The words that are spelled differently 
will both be in there. They will both have a record in ValidExpression 
where the old spelling will have a value in the ValidUntil field and the 
new spelling will have a value in the ValidFrom field.
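A minimal sketch of how such validity fields could drive a spell-check lookup, assuming a SQLite table. Only the names ValidExpression, ValidFrom and ValidUntil come from the message above; the other column names, the sample words and the dates are hypothetical and purely illustrative, not an authoritative record of the spelling reform. Overlapping validity periods reproduce the situation described earlier, where old and new spellings are both valid at the same time.

```python
import sqlite3

# In-memory database; the schema is a hypothetical simplification of ValidExpression.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE ValidExpression (
           Spelling   TEXT NOT NULL,
           Language   TEXT NOT NULL,
           ValidFrom  TEXT,  -- ISO date; NULL means no known start
           ValidUntil TEXT   -- ISO date; NULL means still valid
       )"""
)
conn.executemany(
    "INSERT INTO ValidExpression VALUES (?, ?, ?, ?)",
    [
        ("daß", "de", None, "2005-07-31"),   # old spelling (illustrative end date)
        ("dass", "de", "1998-08-01", None),  # new spelling (illustrative start date)
    ],
)


def is_valid(spelling: str, language: str, on_date: str) -> bool:
    """Return True if the spelling is valid in the language on the given ISO date."""
    # ISO dates compare correctly as strings, so plain <= / >= works here.
    row = conn.execute(
        """SELECT 1 FROM ValidExpression
           WHERE Spelling = ? AND Language = ?
             AND (ValidFrom IS NULL OR ValidFrom <= ?)
             AND (ValidUntil IS NULL OR ValidUntil >= ?)""",
        (spelling, language, on_date, on_date),
    ).fetchone()
    return row is not None
```

With these sample rows, both "daß" and "dass" are accepted for 2000-01-01, while only "dass" is accepted for 2006-01-01.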

There is room for historic orthographies; they may prove instructive in
demonstrating the ongoing Germanisation of Lower Saxon.

...

 The reason all words have to be explicitly identified as belonging
to an orthography is that it allows us to do other things than just
producing lexicological information from the Internet. What you perceive
as a "multiplication of entries" is in fact no such thing; an expression
is registered only once for each language, dialect or orthography.
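One way to read "registered only once for each language, dialect or orthography" is as a uniqueness constraint. The sketch below assumes SQLite and a hypothetical Expression table; the column names and sample values are illustrative, not the actual UW schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: at most one row per spelling per (language, dialect, orthography).
conn.execute(
    """CREATE TABLE Expression (
           Spelling    TEXT NOT NULL,
           Language    TEXT NOT NULL,
           Dialect     TEXT NOT NULL,
           Orthography TEXT NOT NULL,
           UNIQUE (Spelling, Language, Dialect, Orthography)
       )"""
)


def register(spelling: str, language: str, dialect: str, orthography: str) -> bool:
    """Insert an expression; return False if it is already registered."""
    try:
        with conn:  # commits on success, rolls back if the insert fails
            conn.execute(
                "INSERT INTO Expression VALUES (?, ?, ?, ?)",
                (spelling, language, dialect, orthography),
            )
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected a second registration.
        return False
```

For example, calling `register("Bus", "de", "de_DE", "reformed")` twice returns True and then False, while registering the same spelling for `"de_CH"` creates a second row. That second row is exactly the kind of duplication the question below is about.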

 So number of entries = (number of languages) x (number of dialects) x
(number of orthographies)?

What are you planning to do with American English vs. British English?

You would have two entries:
1)
title=color
lang=EN
dialect=EN_US
orthography=USA-official
2)
title=colour
lang=EN
dialect=EN_GB
orthography=GB-official

That is fine. But what about "bus"? Would you have two entries?
1)
title=bus
lang=EN
dialect=EN_US
orthography=USA-official
2) 
title=bus
lang=EN
dialect=EN_GB
orthography=GB-official

That (to my understanding) would double the entries for English, wouldn't it?
And the translation of de:Bus would list en_US:bus, en_GB:bus?

Kind regards,

Heiko Evermann
 First of all, I am not a specialist when it comes to the spelling of
American English or British English. Depending on whether there is an
official body that identifies correctly spelled English, a spelling can
be validated either by one organisation or by two. When a spelling is
validated by both, there is no need for duplication. This functionality
is implicit in the data design.
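A spelling validated by one organisation or by two, without duplicating the spelling itself, could be modelled with a link table. This is a minimal sketch assuming SQLite; the table names, the placeholder bodies "US body" and "GB body", and the sample words are hypothetical, since the message leaves open whether such official bodies exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Expression (
        Id       INTEGER PRIMARY KEY,
        Spelling TEXT NOT NULL,
        Language TEXT NOT NULL,
        UNIQUE (Spelling, Language)
    );
    -- Hypothetical link table: which body validates which expression.
    CREATE TABLE Validation (
        ExpressionId INTEGER NOT NULL REFERENCES Expression(Id),
        Body         TEXT NOT NULL,
        PRIMARY KEY (ExpressionId, Body)
    );
    """
)


def add(spelling: str, language: str, *bodies: str) -> None:
    """Register a spelling once and attach every body that validates it."""
    cur = conn.execute(
        "INSERT INTO Expression (Spelling, Language) VALUES (?, ?)",
        (spelling, language),
    )
    conn.executemany(
        "INSERT INTO Validation VALUES (?, ?)",
        [(cur.lastrowid, body) for body in bodies],
    )


add("bus", "en", "US body", "GB body")  # shared spelling: one row, two validations
add("color", "en", "US body")
add("colour", "en", "GB body")


def spellings_for(body: str) -> list[str]:
    """Return the spellings a given body validates, sorted alphabetically."""
    rows = conn.execute(
        """SELECT e.Spelling FROM Expression e
           JOIN Validation v ON v.ExpressionId = e.Id
           WHERE v.Body = ? ORDER BY e.Spelling""",
        (body,),
    )
    return [r[0] for r in rows]
```

Here "bus" is stored once, yet it appears in the word list of both placeholder bodies.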

The examples that you show bear no relation to what UW will look like,
nor to how the edit screens will look, I am happy to say :) There is a
big difference between the way Lower Saxon treats its orthographies and
the way Sicilian or Neapolitan orthographies are treated. The Lower
Saxons seem really eager to have only one orthography, and therefore a
mix of the different spellings is not likely to be much appreciated by
many of them.

The duplication of words that are spelled the same in different dialects
or orthographies is inherent in the database design. This is essential
if you want to have definitions and etymology in these dialects or
orthographies. If you are willing to accept that definitions and
etymology can be written in orthographies other than Sass, there could be
a solution, but as nds.wikipedia also has to standardise on Sass, I
think this is a rather unlikely scenario.

Thanks,
    GerardM
