Hi Gerard,
Actual work on UW itself is underway. Here you can find the data design:
http://meta.wikimedia.org/wiki/Ultimate_Wiktionary_data_design
This design is very much open for comments, and I am happy to say that
many of the comments given have led to changes. To name but a few
changes that came about this way: Can sign languages be included? Now
they can. Can attestations be included? Now they can.
I want to propose (again) one important change:
I think it is important that an entry can be tagged as being correct
according to several orthographies within one language. From what I have
understood so far, the word de: "ist" (English: "(he) is") must be
entered twice, once for the new German spelling and once for the old one
(from before the recent reform), even though this word was not affected
by the spelling reform. The same applies to 95% of all German words, and
each of them gets complete translation coverage into all languages. This
is also a problem for Low Saxon (with our wide range of possible
spellings). You tried to make your current design plausible to me when
we talked about it recently, but I was not convinced that this huge
multiplication of entries is a good idea. Maybe I misunderstood you
somehow, but I still do not understand it.
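
To make the proposal concrete, here is a minimal sketch of an entry that
carries a set of orthography tags instead of being duplicated once per
orthography. It is only an illustration and not the UW data design: the
Entry class, its field names and the valid_in helper are hypothetical,
and the tags "de-1901" and "de-1996" are used here simply as labels for
the old and the reformed German spelling.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical entry: one spelling, tagged with every orthography it is valid in."""
    language: str
    spelling: str
    orthographies: set = field(default_factory=set)
    translations: dict = field(default_factory=dict)


def valid_in(entries, orthography):
    """Return the spellings that are correct under the given orthography tag."""
    return [e.spelling for e in entries if orthography in e.orthographies]


# "ist" is stored once and tagged for both orthographies; only the word
# that the reform actually changed ("daß" became "dass") needs two entries.
entries = [
    Entry("de", "ist", {"de-1901", "de-1996"}, {"en": ["is"]}),
    Entry("de", "daß", {"de-1901"}, {"en": ["that"]}),
    Entry("de", "dass", {"de-1996"}, {"en": ["that"]}),
]

print(valid_in(entries, "de-1901"))
print(valid_in(entries, "de-1996"))
```

With such a layout the translations of "ist" would be stored once, and a
word list for either orthography is still just a filter over the tags.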
Then again, if we create a word count of the Wikipedia content and run
it against a spellchecker, the resulting list should be spelled
correctly and could be included in UW. Particularly for our biggest
Wikipedias, given the number of topics covered, the list might come
close to the size of what Aspell has. We will also have a long list of
words missing in Aspell. We will, however, not get a spellchecker for
British or American English in this way.
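
The procedure described above could be sketched roughly as follows. This
is only an illustration under stated assumptions: a plain Python set
(known_words) stands in for the spellchecker's dictionary rather than a
real Aspell call, the sample text stands in for Wikipedia content, and
the function names are hypothetical.

```python
import re
from collections import Counter


def word_counts(text):
    """Count lower-cased words; the pattern matches runs of Unicode letters."""
    return Counter(re.findall(r"[^\W\d_]+", text.lower()))


def split_by_dictionary(counts, known_words):
    """Split the counts into words the dictionary knows and words it lacks."""
    known = {w: n for w, n in counts.items() if w in known_words}
    missing = {w: n for w, n in counts.items() if w not in known_words}
    return known, missing


# Assumption: a tiny stand-in dictionary instead of a real spellchecker.
known_words = {"the", "of", "is", "color", "colour"}
sample = "The colour of the wiki is the color of the wiki"

counts = word_counts(sample)
known, missing = split_by_dictionary(counts, known_words)

print(sorted(known))    # candidates for inclusion
print(sorted(missing))  # words the dictionary does not have
```

Both "colour" and "color" end up in the accepted list here, which is why
a list built this way cannot tell the British and American spellings
apart.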
Does that mean that you are thinking about importing huge numbers of
words without definitions and without any translations?
Heiko