[Wiktionary-l] Re: De-capitalisation-isation

Andrew Dunbar hippietrail at yahoo.com
Wed Jun 23 02:51:44 UTC 2004


--- Timwi <timwi at gmx.net> wrote:
> Timwi wrote:
> 
> > last time we discussed this, I think there was
> > consensus to switch the English Wiktionary to make
> > it stop capitalising article titles, right?
> 
> Guys! Please keep this on the mailing list, and
> don't reply to me personally. Thanks!

Sorry, I'm so used to mailing lists having the Reply-To
set that I didn't notice.

> Brion wrote:
> 
> > This idea relies on the existence of the script
> > for step (2). :)
> 
> The idea was that I would write it. (Where else
> would my involvement in this be? :-p)
> 
> > Note that there may be an effect on usernames
> 
> I'll leave the usernames capitalised, I guess.
> 
> Ray Saintonge wrote:
> 
> > Whatever process we choose will require some
> > manual fix-up.  The process that you suggest is as
> > good as any.
> 
> My suggestion will require significantly less manual
> fix-up than moving 400,000 articles to lower-case
> versions manually.
> 
> > How can we show capitalization when a linked word
> > begins a sentence?
> 
> The same way as everywhere else, [[until|Until]].
> 
> > Is there some way to integrate upper cased, lower
> > cased and accented characters in generated lists?
> 
> I'm afraid I don't understand what you mean by this.

I think he means that there are times when case-folding
is very handy, though I'm not sure of his exact context.
He's also talking about accent-folding - which is
not possible without lots of messy hacking.

> > I don't remember it being a consensus at all. I
> > seem to recall some being quite opposed to it.
> 
> Really? Can you provide links? I only remember
> people emphasising that they don't mind because they
> think the current work-around works perfectly for
> them.

I seem to remember Polyglot being opposed, probably in
the Beer Parlour. Not all of us are subscribed to this
list. I stayed off because I had no space in my email
account for mailing lists until a few days ago (Thanks
Yahoo!).

> I seriously don't see how anyone can seriously
> be opposed to having a dictionary with correct
> spellings. :/

Now this statement is pure rhetoric. I'm in favour of
correct spellings and I'm sure everybody is. But
there's more than one way to solve a problem and I'd
like everybody to think this through and consider all
possible ways to fix it before jumping in and making
major changes.

The choices:
1. Case fold first letter only to uppercase. (current)
2. Case fold nothing. (Timwi)
3. Case fold all letters to lowercase. (Hippietrail)
4. A new directive to supply a prominent "Headword". (Hippietrail)

Against 1:
 * Headwords are prominently displayed uppercase when
   most should be lowercase.
 * Words which differ only by the case of the first
   letter must share a page. (common nouns vs. proper
   nouns)
 * Words which differ only by the case of
   subsequent letters must be on separate pages.
   (abbreviations & acronyms)
Against 2:
 * People are going to add duplicates thinking their
   word is not in the dictionary.
 * Quite a large number of entries will have to be
   changed back to uppercase after the script is run.
 * Words which differ only by the case of any letter
   must be on separate pages. (proper nouns vs. common
   nouns vs. abbreviations & acronyms)
Against 3:
 * All headwords would be prominently displayed
   lowercase when some should be uppercase.
 * Quite a large number of entries will have to be
   changed back to uppercase after the script is run.
 * Words which differ only by the case of any letter
   must share a page.

Choice 4 is special. It can be implemented with or without
any of the others. It will solve one issue nicely,
but I feel there are really two issues.

Issue 1
 Prominently displayed headwords are misleading
 because they are also the name of the article.

 Because the name of the article currently always
 starts with an uppercase letter, all the headings
 generated by the Wiki software are capitalised - which
 is very unprofessional for a dictionary. (This is why
 Polyglot started doing manual headwords below the
 part-of-speech section.)

Issue 2
 What should and should not be on the same page.

 Currently we case-fold the first letter, which means
 "bob" and "Bob" are both on a page titled "Bob", but
 "us" and "US" are on separate pages titled "Us" and
 "US".
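
To make the page-sharing consequences concrete, here is a minimal
Python sketch of choices 1 to 3 written as title-folding functions.
The function names are made up for illustration and this is not how
the Wiki software is actually implemented; it only shows which
spellings would land on the same page under each choice.

```python
from collections import defaultdict

def title_first_upper(word):
    """Choice 1 (current): fold only the first letter to uppercase."""
    return word[:1].upper() + word[1:]

def title_as_typed(word):
    """Choice 2: fold nothing, the title is exactly what was typed."""
    return word

def title_all_lower(word):
    """Choice 3: fold every letter to lowercase."""
    return word.lower()

def group_pages(words, policy):
    """Map each resulting page title to the spellings that share it."""
    pages = defaultdict(list)
    for w in words:
        pages[policy(w)].append(w)
    return dict(pages)

words = ["bob", "Bob", "us", "US", "pH"]
for policy in (title_first_upper, title_as_typed, title_all_lower):
    print(policy.__name__, group_pages(words, policy))
```

Under the first policy "bob" and "Bob" share the page "Bob", "us"
and "US" stay apart, and "pH" ends up titled "PH". Under the third,
"us" and "US" share the page "us".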

I've read some people's opinions that they would
prefer one entry per page, but this is never going to
work because of homographs anyway.

Another issue is other languages. In some languages,
certain diacritics (accents) are optional. Arabic,
Hebrew, and Latin spring to mind. Some languages have
marks which dictionaries use to show stress, which are
not used elsewhere. Japanese and Russian are examples.

If we rely on the article name to be the headword,
these will never fit. Using Polyglot's system they
fit in perfectly. Words which differ only in case,
optional diacritics, or stress marks can all be on one
page.
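
As a rough illustration of that separation, the Python sketch below
computes a folded page key while keeping the exact headwords for
display. The names page_key and build_pages are hypothetical, and
stripping every combining mark is an assumption made for the sake of
the example, not a description of Polyglot's system or of the Wiki
software.

```python
import unicodedata

def page_key(headword):
    """Fold case and drop combining marks to get a shared page key."""
    # NFD separates a base letter from its combining marks.
    decomposed = unicodedata.normalize("NFD", headword)
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return bare.casefold()

def build_pages(headwords):
    """Group exact headwords (as they should be displayed) by page key."""
    pages = {}
    for hw in headwords:
        pages.setdefault(page_key(hw), []).append(hw)
    return pages

# "am\u014d" is "amo" with a macron on the final letter.
pages = build_pages(["bob", "Bob", "us", "US", "am\u014d", "amo"])
for key, shown in pages.items():
    print(key, "->", shown)
```

A blanket strip like this is too aggressive for real use: in Russian,
for example, the letter "й" decomposes into "и" plus a combining
breve, so it would be folded into a different letter. A real
implementation would need per-language rules, which is the messy part.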

Conclusion:
I am in favour of showing correct spelling, case, and
diacritical marks for all entries.

I am in favour of words which differ only in minor
ways sharing a single page.

If we can break the connection between the name of the
page and the headword, we won't have to see "PH" in
big embarrassing letters at the top of the article on
"pH".

Having a directive which allows us to set the title to
"pH" is one way.
Having Wiktionary simply not display the page title,
and instead rely on "Polyglot-style" headwords alone,
is another. In fact this may even be possible just by
changing the stylesheet.

Let the discussion proceed...

Hippietrail.

> Timwi
> 

=====
http://linguaphile.sf.net/cgi-bin/translator.pl    http://www.abisource.com