On Monday 25 July 2005 09:02, Andrew Dunbar wrote:
On 7/24/05, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:
accent according to the official orthographical rules, Russian (and some
other Cyrillic-script languages) can optionally indicate where the stress
is and in some contexts it is the norm. With Hebrew and most
Yes, and one of the places where it is the norm is in a dictionary ;)
Regardless of how this is resolved in the end, it would make sense to build in
at least some ability to determine such things automatically. You don't
want to duplicate the entire Russian corpus (with inflections, it could easily
rise to ten million words) just so that each word is stored both with and
without diacritics. It makes sense to have only canonical spellings in the
dictionary, and a bit of code to offer the nearest match when someone tries to
retrieve a word spelled in a different way.
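To make the idea concrete, here is a minimal sketch in Python of such a lookup. It assumes that the canonical spelling is the stressed dictionary form and that stress is written with the combining acute accent (U+0301). The names strip_stress, build_index and lookup are made up for illustration and are not part of any UW design. Only the canonical forms are stored; the unstressed key is computed on the fly.

```python
import unicodedata

# Assumption: stress is marked with the combining acute accent.
COMBINING_ACUTE = "\u0301"


def strip_stress(word: str) -> str:
    """Remove stress marks but keep real letters such as й and ё intact."""
    decomposed = unicodedata.normalize("NFD", word)
    stripped = decomposed.replace(COMBINING_ACUTE, "")
    return unicodedata.normalize("NFC", stripped)


def build_index(canonical_words):
    """Map each unstressed key to the canonical spellings that share it."""
    index = {}
    for word in canonical_words:
        index.setdefault(strip_stress(word), []).append(word)
    return index


def lookup(index, query: str):
    """Return the exact canonical match if there is one, else all near matches."""
    candidates = index.get(strip_stress(query), [])
    normalized = unicodedata.normalize("NFC", query)
    exact = [c for c in candidates
             if unicodedata.normalize("NFC", c) == normalized]
    return exact or list(candidates)


# Hypothetical sample data: two words that differ only in stress, plus one more.
CANONICAL = ["за\u0301мок", "замо\u0301к", "молоко\u0301"]
INDEX = build_index(CANONICAL)
```

A query typed without stress marks, such as "замок", then returns both canonical entries, and the user picks one. The same approach would extend to other optional diacritics by widening the set of characters that strip_stress removes.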
One crucial decision is that only correct spelling is allowed. This
means that all incorrect spellings will be amended or deleted. As
Ultimate Wiktionary is a database, it does not cater for things like
redirects. I urge you to have a look at both the design criteria and the
design itself because this is the time when it is relatively easy to
make changes. Once Erik starts coding the UW database, having finished
Wikidata and the GEMET implementation, the moment has passed us by.
Please list which of the above points are and are not considered
a correct spelling as far as Ultimate Wiktionary is concerned. Please then
indicate whether every correct spelling is also suitable as a headword/
article title/lemma or whatever you wish to call it.
The way I see it, this decision is a political and not a technical one. Each
word could have several spellings, each of which is related to a spelling
authority. If you want common misspellings in the dictionary, simply have
"Common misspelling" as a spelling authority. Similarly, nothing prevents you
from having several different spellings of the same word attributed to a single
spelling authority, which solves all the problems you mentioned above.
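A rough sketch of that data model in Python, purely as an illustration: the class and method names (Spelling, Word, by_authority, is_accepted) and the authority labels "Authority A" and "Authority B" are invented here and do not come from the UW design. The point is only that "correct" becomes a question of which authorities you choose to accept.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Spelling:
    text: str
    authority: str  # e.g. an orthography standard, or "Common misspelling"


@dataclass
class Word:
    lemma_id: str
    spellings: List[Spelling] = field(default_factory=list)

    def add(self, text: str, authority: str) -> None:
        """Attach a spelling to an authority; duplicates are ignored."""
        spelling = Spelling(text, authority)
        if spelling not in self.spellings:
            self.spellings.append(spelling)

    def by_authority(self, authority: str) -> List[str]:
        """All spellings a single authority recognises (may be several)."""
        return [s.text for s in self.spellings if s.authority == authority]

    def authorities_for(self, text: str) -> List[str]:
        """All authorities that recognise a given spelling."""
        return [s.authority for s in self.spellings if s.text == text]

    def is_accepted(self, text: str, accepted: Set[str]) -> bool:
        """A spelling counts as correct if an accepted authority lists it."""
        return any(a in accepted for a in self.authorities_for(text))


# Hypothetical example data.
word = Word("judgement")
word.add("judgement", "Authority A")
word.add("judgment", "Authority A")   # two spellings, one authority
word.add("judgment", "Authority B")
word.add("judgemant", "Common misspelling")
```

With this layout, whether "judgemant" is shown at all is decided by whether "Common misspelling" is in the accepted set, so the policy can change without touching the stored data.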