Heiko Evermann wrote:
Hi Gerard,
Thank you for your answer.
The German situation is a bit difficult. In actual fact there are only
two orthographies because two Bundesländer did not pass a law making the
new spelling apply there as well. The consequence is that both old
spelling and new spelling are valid. In a typical situation, the words
that have been changed would get dated and be outdated. From a practical
point of view I would only have the changed words and the new words
included and I would treat them as if these two Bundeslander had voted
in favour. For lookup purposes the difference is a clause in the SELECT
statement of the query.
So you do not want to include the old spelling? From what I understood for Low
Saxon you also wanted to include historic spellings. But I may have
misunderstood that.
Sorry, good try but no cigar. The words that are spelled differently
will both be in there. They will both have a record in ValidExpression
where the old spelling will have a value in the ValidUntil field and the
new spelling will have a value in the ValidFrom field.
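A minimal sketch of how such a date-based lookup might work, using Python's built-in sqlite3 module. Only the names ValidExpression, ValidFrom and ValidUntil come from the description above; the Spelling column, the sample rows, the dates and the helper valid_on are assumptions made for illustration, not the actual schema.

```python
import sqlite3

# Hypothetical, simplified schema: only the table name ValidExpression and
# the fields ValidFrom / ValidUntil are taken from the discussion.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE ValidExpression (
        Spelling   TEXT NOT NULL,
        ValidFrom  TEXT,   -- ISO date; NULL means no recorded start
        ValidUntil TEXT    -- ISO date; NULL means still valid
    )
    """
)
conn.executemany(
    "INSERT INTO ValidExpression VALUES (?, ?, ?)",
    [
        ("daß", None, "2005-07-31"),   # old spelling; end date is illustrative
        ("dass", "1998-08-01", None),  # new spelling; start date is illustrative
        ("Bus", None, None),           # spelling not affected by the reform
    ],
)


def valid_on(day):
    """Return the set of spellings that are valid on the given ISO date."""
    rows = conn.execute(
        """
        SELECT Spelling FROM ValidExpression
        WHERE (ValidFrom  IS NULL OR ValidFrom  <= ?)
          AND (ValidUntil IS NULL OR ValidUntil >= ?)
        """,
        (day, day),
    ).fetchall()
    return {row[0] for row in rows}
```

With this sample data, valid_on("2000-01-01") returns both the old and the new spelling, because the two validity ranges overlap during the transition period.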
There is room for historic orthographies, it may prove instructive in
demonstrating the ongoing Germanisation of Lower Saxon.
The argument why all words have to be explicitly identified as belonging
to an orthography is because it allows us to do other things than just
producing lexicological information from the Internet. What in your
perception is a "multiplication of entries" is in actual fact no such
thing; an expression is registered only once for each language, dialect
or orthography.
So number of entries = (number of languages) x (number of dialects) x
(number of orthographies)?
What are you planning to do with American English vs. British English?
You would have two entries:
1)
title=color
lang=EN
dialect=EN_US
orthography=USA-official
2)
title=colour
lang=EN
dialect=EN_GB
orthography=GB official
That is fine. But what about "bus"? Would you have two entries?
1)
title=bus
lang=EN
dialect=EN_US
orthography=USA-official
2)
title=bus
lang=EN
dialect=EN_GB
orthography=GB official
That (to my understanding) would double the entries for English, wouldn't it?
And the translation of de:Bus would list en_US: bus, en_GB:bus?
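To make the counting concern concrete, here is a small sketch with made-up word lists. The lists, the variable names and the one-record-per-(title, dialect) layout are assumptions for illustration only, not the actual design.

```python
# Made-up word lists, for illustration only.
us_words = {"color", "center", "bus"}
gb_words = {"colour", "centre", "bus"}

# One record per (title, dialect) pair, as in the two "bus" entries above.
records = [(w, "EN_US") for w in sorted(us_words)]
records += [(w, "EN_GB") for w in sorted(gb_words)]

# Spellings that occur in either dialect, and those shared by both.
distinct_spellings = us_words | gb_words
duplicated = us_words & gb_words

print(len(records))             # 6 records in total
print(len(distinct_spellings))  # 5 distinct spellings
print(sorted(duplicated))       # ['bus']
```

In this toy case only the shared word is stored twice; the entries would fully double only if every word were spelled the same in both dialects.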
Kind regards,
Heiko Evermann
First of all I am not a specialist when it comes to the spelling of
American English or British English. Depending on there being an
official body that identifies correctly spelled English, a spelling can
be either validated by one organisation or by two organisations. When
this is the case, there is no need for duplication. This functionality
is implicit in the data design.
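A minimal sketch of that idea: one expression record that can carry a validation from one organisation or from several. The class, the field names and the organisation names ("Authority A", "Authority B") are hypothetical and are not taken from the actual data design.

```python
from dataclasses import dataclass, field


@dataclass
class Expression:
    """One spelling, stored once; every name here is hypothetical."""
    spelling: str
    language: str
    validated_by: set = field(default_factory=set)


# Maps (spelling, language) to its single Expression record.
registry = {}


def validate(spelling, language, organisation):
    """Record that an organisation accepts a spelling, reusing any existing record."""
    key = (spelling, language)
    expr = registry.setdefault(key, Expression(spelling, language))
    expr.validated_by.add(organisation)
    return expr


validate("bus", "EN", "Authority A")
validate("bus", "EN", "Authority B")
validate("colour", "EN", "Authority B")
```

Here "bus" exists as a single record validated by both organisations, while "colour" is validated by only one, so the registry holds two records rather than three.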
The examples that you show bear no relation to what UW will look like,
nor to how the edit screens will look, I am happy to say :) There is
a big difference between the way Lower Saxon is treating its
orthographies and the way Sicilian or Neapolitan orthographies are
treated. The Lower Saxons seem really eager to have only one orthography
and therefore a mix of the different spellings is not likely to find
much appreciation with many.
The duplication of words that are spelled the same in different dialects
or orthographies is inherent in the database design. This is essential
if you want to have definitions and etymology in these dialects or
orthographies. If you are willing to accept that definitions and
etymology can be spelled in orthographies other than Sass there could be
a solution but as the nds.wikipedia also has to standardise on Sass, I
think this is a rather unlikely scenario.
Thanks,
GerardM