On 26/07/05, Lars Aronsson <lars(a)aronsson.se> wrote:
> There are two approaches to dictionaries: (1) the encyclopedic
> approach, trying to find (define, spellcheck, explain, ...) "all"
> words (and their inflections), or (2) the statistics-based
> approach, trying to find the most commonly used words. I think
> the OED is of the first kind, while many dictionaries in recent
> decades (built with the help of computers, extracting word
> frequency statistics from large text corpora) have been of the
> latter kind. Some would call (1) a 19th-century approach.
>
> The real difference is their handling of the least common words.
> The encyclopedic approach sees every missing word as a failure,
> while the statistics-based approach recognizes that there is an
> infinite number of words anyway (new ones are created every day)
> and some might be too uncommon to deserve a mention.
>
> As a consequence, spellchecking in the statistics-based approach
> can never say that a spelling is "wrong" when it is missing from
> the dictionary, only that it is probably "uncommon" and thus
> suspect. The remedy for this is a statistics-based dictionary of
> common misspellings. Wikipedia article history can be used as a
> source for this: just find all edits that changed one word, e.g.
> speling -> spelling, and you will have a fine dictionary of common
> spelling mistakes.
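The single-word-diff idea could be sketched like this in Python. The helper names are my own, and the revision texts would in practice come from a Wikipedia history dump (not shown here); this only illustrates detecting edits that replaced exactly one word:

```python
import difflib
from collections import Counter

def single_word_changes(old_text, new_text):
    """Return (before, after) if the edit replaced exactly one word, else None."""
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words)
    replacements = [
        (old_words[i1:i2], new_words[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag == "replace"
    ]
    # Keep only revisions where a single word became a single word
    if len(replacements) == 1:
        before, after = replacements[0]
        if len(before) == 1 and len(after) == 1:
            return before[0], after[0]
    return None

def misspelling_dictionary(revision_pairs):
    """Count (misspelling, correction) pairs across many revisions."""
    counts = Counter()
    for old, new in revision_pairs:
        change = single_word_changes(old, new)
        if change:
            counts[change] += 1
    return counts

# e.g. single_word_changes("a speling mistake", "a spelling mistake")
# → ("speling", "spelling")
```

Aggregating these pairs over a whole edit history, and keeping only the pairs that recur often, would give the "dictionary of common spelling mistakes" described above.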
> From a database point of view a Word has one Spelling.
This would be an example of the encyclopedic approach.
It is clear to me that the approach we want to take is the
"encyclopedic" one, simply because we can handle it. The Oxford
dictionary on paper cannot handle it "elegantly": it becomes
unwieldy and spans a whole shelf. A good database can.
It is unacceptable for a Word to have only one Spelling, for reasons
described previously (German a-with-umlaut, Hebrew niqqud and optional
vowels, etc.), but I am unable to find out who originally wrote that.
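A minimal sketch of that one-to-many relationship, with hypothetical `Word` and `Spelling` classes (the names, fields, and example data are my own, not from the thread):

```python
from dataclasses import dataclass, field

@dataclass
class Spelling:
    text: str
    note: str = ""  # e.g. "standard orthography", "umlaut transliterated"

@dataclass
class Word:
    lemma: str
    spellings: list = field(default_factory=list)  # one Word, many Spellings

def is_known_form(word, form):
    # A surface form is accepted if it matches any recorded spelling
    return any(s.text == form for s in word.spellings)

# German umlaut example: "ä" may be written "ae" where umlauts are unavailable
maedchen = Word("Mädchen", [
    Spelling("Mädchen", "standard orthography"),
    Spelling("Maedchen", "umlaut transliterated as 'ae'"),
])
```

Under this model a spellchecker accepts any listed variant, while a word-to-one-spelling schema would be forced to reject "Maedchen" outright.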