On 7/24/05, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:
Hoi,
I had an interesting conversation with Brion. We do not agree on
everything. One of the things we do not agree on is redirects.
In my opinion, Wiktionary should not have redirects. A word is either
spelled correctly and it will have its lemma or it is not and there will
not be a lemma with the incorrect spelling.
In Brion's opinion there are links to lemmas and, as we need to ensure
that these links remain ok, we need redirects to make this possible.
In a Wikipedia context I am 100% with Brion. In a Wiktionary context it
is a different matter. As only correctly spelled words should be in a
Wiktionary, errors should be deleted. Some of our Wiktionaries for
historical reasons are capitalising their articles.
"Historical reasons" is surely not the only reason a Wiktionary uses
first-character capitalisation, turning off first-character capitalisation is
not the only way to achieve correctly spelled article titles, and having
correctly spelled article titles has been denied as a reason for turning
off first-character capitalisation by some.
Don't forget that capitalisation of the first letter is only one issue as
regards correct orthography in article titles. Another thing to watch out
for is the three variations for making English compounds: two words
with a space, two words with hyphenation, and one compound word.
For any term, any one, two or three of these variants may be
considered correct.
Other test cases which have recently met strong resistance are Latin
words correctly spelled in all capitals as was the only possible spelling
while Latin was a living language, and using the correct non-ambiguous
apostrophe character which has been widely available on home
computers for 20 years.
Other languages also have optional or compatibility spellings:
In French it is officially correct to indicate accents on capital
letters but there is a de facto rule to leave them out. German
specifically allows ä, ö, and ü to be spelled as ae, oe, and ue.
In Switzerland there is no letter ß, the correct spelling being ss
instead. This is also specifically considered correct in the other
German-speaking countries. Latin and Old English often have
macrons to show long vowels and rarely have breves to show
short vowels.
Old English and Middle English also had various fashions but
no official spelling, with various exotic letters being used at
different times and under various circumstances, resulting in
varied spellings of many words. For example, ð and þ were
mostly interchangeable.
Ancient Greek and Modern Greek have different accent marks
which look quite similar but have different names and different
places in Unicode. But the Modern Greek accents are still much
more common in Ancient Greek text on the Internet.
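To make the Greek case concrete, here is a minimal Python sketch using
only the standard unicodedata module. The two code points are the
precomposed alpha with tonos and alpha with oxia; the helper name
same_after_nfc is just something I made up for illustration. Unicode
treats the pair as canonically equivalent, so NFC normalisation folds
the oxia form into the tonos form:

```python
import unicodedata

# Two precomposed letters that look alike but are distinct code points.
TONOS = "\u03ac"  # GREEK SMALL LETTER ALPHA WITH TONOS (Modern Greek)
OXIA = "\u1f71"   # GREEK SMALL LETTER ALPHA WITH OXIA (Ancient Greek)


def same_after_nfc(a: str, b: str) -> bool:
    """Return True if two strings are equal once both are NFC-normalised."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    for ch in (TONOS, OXIA):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    print("raw strings equal:", TONOS == OXIA)
    print("equal after NFC:  ", same_after_nfc(TONOS, OXIA))
```

So two article titles differing only in tonos versus oxia would be
different strings byte for byte, yet identical after normalisation.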
Hebrew geresh is often represented by the ASCII apostrophe, Hebrew
gershayim by the ASCII double quote, Hebrew maqaf by the ASCII
hyphen, and Hawaiian okina by the ASCII apostrophe. Turkish long
vowels (actually more complicated than this) can be indicated by the
circumflex accent according to the official orthographical rules.
Russian (and some other Cyrillic-script languages) can optionally
indicate where the stress is, and in some contexts this is the norm.
With Hebrew and most languages in Arabic script, all short vowels are
optional, as are a number of other "letters" such as dagesh, shadda,
sukun, and a host of more exotic ones.
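As a rough sketch of what "optional" means in practice, the Python
below strips a hand-picked set of optional marks so that a pointed or
stressed spelling and the bare spelling compare equal. The set
OPTIONAL_MARKS and the function strip_optional_marks are my own
illustrative names, not anything in MediaWiki, and the set is far from
complete. It is also language-blind: removing U+0301 would be wrong
for, say, French, so a real tool would need one set per language.

```python
import unicodedata

# Assumed, incomplete set of marks that are optional in some orthographies.
OPTIONAL_MARKS = (
    {"\u0301"}                                  # combining acute (Russian stress)
    | {chr(c) for c in range(0x05B0, 0x05BD)}   # Hebrew points, sheva..dagesh
    | {"\u05C1", "\u05C2"}                      # Hebrew shin dot and sin dot
    | {chr(c) for c in range(0x064B, 0x0653)}   # Arabic fathatan..sukun, incl. shadda
)


def strip_optional_marks(text: str) -> str:
    """Decompose, drop the optional marks listed above, then recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in OPTIONAL_MARKS)
    # Recomposing keeps letters such as Russian й (и + breve) intact.
    return unicodedata.normalize("NFC", kept)


if __name__ == "__main__":
    print(strip_optional_marks("за\u0301мок"))  # Russian with a stress mark
    print(strip_optional_marks("\u05e9\u05c1\u05b8\u05dc\u05d5\u05b9\u05dd"))
```

Only the listed marks are removed; the breve that is part of й is
deliberately left alone.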
Hebrew also has accents which only occur in religious works plus
there are plene and defective spellings and both have vowels etc as
optional extras on top.
In some Polynesian languages, macrons and glottal stops are
optional; in others they are compulsory.
Chinese, Japanese, and Korean have written variants of many
characters which have the same meaning and sound with all being
correct. They also have variants which exist only due to computer
encodings and quirks in how various fonts were designed.
For some languages different optional features of orthography can
interact to form many combinations and permutations all of which
are correct spellings.
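To show how quickly the combinations multiply, here is a small Python
sketch that enumerates every spelling of a word given a table of
per-letter alternatives. The table GERMAN_ALTERNATIVES (lowercase
letters only) and the function spelling_variants are names I invented
for this example; the alternatives themselves are the German ones
mentioned above.

```python
from itertools import product

# Assumed table: each letter maps to the spellings allowed for it.
GERMAN_ALTERNATIVES = {
    "ä": ["ä", "ae"],
    "ö": ["ö", "oe"],
    "ü": ["ü", "ue"],
    "ß": ["ß", "ss"],
}


def spelling_variants(word: str, alternatives: dict) -> list:
    """Return every spelling obtained by choosing one alternative per letter."""
    choices = [alternatives.get(ch, [ch]) for ch in word]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    for variant in spelling_variants("Füße", GERMAN_ALTERNATIVES):
        print(variant)
```

A word with two such letters already has four spellings, and each
further optional feature doubles the count again.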
There are surely quite a few more examples I haven't even become
aware of yet.
In essence this means that from a spelling point of view the names of
the lemmas are
irrelevant. However, many people assume that the name of the article
indicates that a word is spelled correctly. To remedy this, more and
more wiktionaries are moving away from first character capitalisation
and make it possible to have correctly spelled words as a lemma.
Or they are moving due to rhetoric like this email rather than for any
good reason. Remember that in print dictionaries the norm is to
include different meanings and parts of speech, and even derivatives -
regardless of capitalisation - into one article or at least on the one page.
English Wiktionary still considers only first letter capitalisation and
ASCII apostrophes and Russian without stress marks to be correct
enough to be titles, in the last case even as redirects! (if this is the
meaning of "lemma" you mean). What do other Wiktionaries do?
When a wiktionary has made this move away from first character
capitalisation, the interwiki and interproject links within the
Wikimedia projects need to be fixed. After this, the redirects can in my
opinion be removed. I think this is appropriate because users expect
that an application behaves in certain ways. When new content is added
to a non-capitalised Wiktionary, the word foo will not have a redirect
in Foo and consequently it behaves differently from the content
predating the move to non-capitalisation. Also words like Kinder and
kinder are not related at all.
Don't you mean that "not all words like Kinder and kinder are related"?
This is almost the opposite meaning. Also many words are related.
Even in German it is common for a noun and another part of speech
to be intimately related and share an identical spelling apart from
capitalisation.
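For anyone unsure what is at stake with Kinder and kinder, here is a
toy Python model of first-character capitalisation. It is only a
sketch of the behaviour being discussed, not MediaWiki's actual code,
and page_title with its capitalise_first flag is a hypothetical name.

```python
def page_title(word: str, capitalise_first: bool) -> str:
    """Return the title a word would be stored under in this toy model."""
    if capitalise_first and word:
        # With capitalisation on, only the first character is forced upper case.
        return word[0].upper() + word[1:]
    return word


if __name__ == "__main__":
    for setting in (True, False):
        titles = {page_title(w, setting) for w in ("Kinder", "kinder")}
        print(f"capitalise_first={setting}: {sorted(titles)}")
```

With capitalisation on, both words land on the one page; with it off,
they get separate pages and any link between them has to be made by
hand or by a redirect.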
The redirect at Kinder will be replaced
at some stage breaking the existing redirect and consequently not
providing the continuance that Brion holds dear.
For the Ultimate Wiktionary I have documented some of the design
criteria. It can be found here:
http://meta.wikimedia.org/wiki/Ultimate_Wiktionary_decisions_on_its_usage
The Data design can be found here:
http://meta.wikimedia.org/wiki/Ultimate_Wiktionary_data_design
One crucial decision is that only correct spelling is allowed. This
means that all incorrect spelling will be amended or deleted. As
Ultimate Wiktionary is a database, it does not cater for things like
redirects. I urge you to have a look at both the design criteria and the
design itself because this is the time when it is relatively easy to
make changes. Once Erik starts coding the UW database, having finished
Wikidata and the GEMET implementation, the moment has passed us by.
Please list which of the above points are and are not considered
correct spellings as far as Ultimate Wiktionary is concerned. Please then
indicate whether every correct spelling is also suitable as a headword/
article title/lemma or whatever you wish to call it.
Hippietrail.
Thanks,
GerardM
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)wikimedia.org
http://mail.wikipedia.org/mailman/listinfo/wikitech-l
--
http://linguaphile.sf.net