[Wikipedia-l] Common typos (was: Tucci528)

Ray Saintonge saintonge at telus.net
Sat Sep 14 18:00:46 UTC 2002


Andre Engels wrote:

>>It would be a safe assumption that words that are used the most 
>>frequently are probably spelled correctly.  True, "recieve" may be a very 
>>common misspelling, but there are probably a lot more occurrences of 
>>"receive".  So the flip side of this is that words that are used rarely are 
>>probably spelled wrong.  Now we don't want to go off blindly replacing 
>>these words (mostly because we wouldn't know what with) but they are good 
>>words to look at as candidates for replacing.
>>
>I don't think this would work - I think there will be thousands, if not tens
>of thousands of words that are used exactly once. The great majority of these
>will not be mis-spellings, but (parts of) proper names and geographical
>names that happen to occur exactly once, words from other languages and
>reasonable neologisms.
>
I agree with Andre.  I suspect that the distribution of mis-spellings 
and typos would be Zipfian, as would the finding and correcting of 
such words.  "Antibacterial" occurs 4 times in Wikipedia.  No amount of 
direct searching would have found "actibacterial".
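
To make the point concrete, here is a minimal sketch of the "rare words" 
approach, assuming the article text is available as a plain string.  The 
function names and the sample sentence are made up for illustration and are 
not part of the Wikipedia software.  Even in two short sentences nearly 
every word occurs exactly once, so the real typo is buried among perfectly 
good words.

```python
import re
from collections import Counter


def word_counts(text):
    """Count lowercase alphabetic words in a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def words_used_once(counts):
    """Return, in alphabetical order, the words that occur exactly once."""
    return sorted(word for word, n in counts.items() if n == 1)


# Hypothetical sample text; a real run would read the article database.
sample = (
    "Alexander Fleming studied antibacterial agents. "
    "An actibacterial substance was found by Fleming."
)

counts = word_counts(sample)
once = words_used_once(counts)
print(once)
```

The list mixes the typo "actibacterial" with a proper name and ordinary 
words, which is Andre's objection in miniature.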

One feature that would help for finding some common errors would be a 
search option that matches parts of words.  Thus far we have spoken of 
"recieve" and "recieved", but being able to search for the partial word 
"reciev" would also catch "recieves", "reciever", "recievers", 
"recieving" and maybe other less common words with this root.
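
A part-word search of this kind could be sketched as follows.  This is only 
an illustration using Python's standard "re" module on a plain string; the 
function name and the sample sentence are hypothetical, and it says nothing 
about how the actual Wikipedia search is or should be implemented.

```python
import re


def find_words_containing(text, part):
    """Return the distinct words in text that contain the fragment `part`."""
    # \w* on either side extends the match to the whole surrounding word.
    pattern = re.compile(r"\b\w*" + re.escape(part) + r"\w*\b", re.IGNORECASE)
    return sorted({m.group(0).lower() for m in pattern.finditer(text)})


# Hypothetical sample text for illustration only.
sample = "They recieve mail; the reciever was recieving it. She will receive it."

matches = find_words_containing(sample, "reciev")
print(matches)
```

The correctly spelled "receive" is not matched, since it does not contain 
the fragment "reciev".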

>Alexandre Fleming
> actibacterial - actual typo (*)
>
Perhaps the article should be redirected to "Alexander Fleming" ;-) .

>Apu Nahasapeempatilon
> octuplet - logical neologism or normal word
> punchcard - might actually be considered a misspelling (*)
>
Octuplet is a perfectly normal word.  
"Punchcard" is an acceptable (and IMHO preferable) variant of "punch 
card".  It appears that way in the Oxford Dictionary.  If I were to use 
the term in an article and someone else "corrected" it to two words, I 
would change it right back.
Who will ever have the enthusiasm to check the spelling of Apu's surname?

Eclecticology



