[Wikipedia-l] Categories considered harmful

apogr info at apogr.hu
Sun Jun 20 09:22:58 UTC 2004


1. The list
Wikipedia, other encyclopedias, dictionaries, etc., as we know them, have the following in common: each is a list of usually single words (entry words, items, index words, etc.) associated with longer texts (articles), and it is for this connection, whether explicit or implicit, that the whole collection is consulted. The list grows in one of two ways: an author adds a new item that he or she knows well enough, or a curious reader who is not knowledgeable enough and wants to know more adds a new word as a stub. Each entry word is first found somewhere out there (in its context) and is then decontextualised, lemmatised, etc., to be included in that list. If there are homonyms, the original context of each occurrence is partially reconstructed and marked by disambiguation.

Context, however, should be considered as introducing additional relevant points (sometimes the originator, for example). If you go on without specifying exactly which relations a given context reflects, you need many more contexts shown in a structured fashion than are normally found in dictionaries and/or encyclopedias (compare: categories, senses, semantic web, etc.).

Given that Wikipedia and its sister projects are word-centered devices, you can hardly go beyond basic grammar terms and the alphabet, which are used a) to bring an entry word to a common form (noun, single word), and b) to sort the entries according to a single ("senseless") criterion. The result is always an alphabetic index, which is then the only thing to rely on in searching.
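
To make the structure described in this section concrete, here is a minimal sketch of such a list in Python. Everything in it is an assumption made for illustration: the class name EntryList, its method names, and the use of lowercasing as a crude stand-in for real lemmatisation. It is not how MediaWiki stores articles.

```python
from collections import defaultdict


class EntryList:
    """Toy model of a word-centered reference work (hypothetical names)."""

    def __init__(self):
        # headword -> {context label: article text}
        self._senses = defaultdict(dict)

    @staticmethod
    def _common_form(headword):
        # Assumption: lowercasing and trimming stands in for lemmatisation.
        return headword.strip().lower()

    def add(self, headword, article="", context=""):
        # An empty article is a stub left by a curious reader.
        self._senses[self._common_form(headword)][context] = article

    def lookup(self, headword):
        # Return a copy of all contexts recorded for this entry word.
        return dict(self._senses.get(self._common_form(headword), {}))

    def is_ambiguous(self, headword):
        # Homonyms are entry words with more than one recorded context.
        return len(self.lookup(headword)) > 1

    def index(self):
        # The single "senseless" criterion: alphabetical order.
        return sorted(self._senses)
```

The context label is the only trace of where the word was found, which is the point made above: once the entry is in the list, the alphabet is all that is left to order it by.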

  

2. The text/article

 

Since, in learning about the world and/or words, all of us progress from known items toward unknown items by establishing various new connections and sorting the input individually, it may be desirable to include a ring of pointers that a) take you from scratch (trivia, dictionary entry) to the latest advances connected to an entry word, and b) allow you to check, step by step, that you have covered the complete process of acquiring knowledge in a given subject. Context therefore should not be dispersed around the free-floating text. In other words, knowledge associated with an item should be graded and presented accordingly. Using a hypertext structure is fine, but with so many computers at hand it is hard to understand why other text processing tools, such as concordance (KWIC/KWOC) programs, are not used to recreate and/or remodel textual information.
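
As an illustration of the kind of tool meant here, the following is a minimal keyword-in-context (KWIC) sketch in Python. The function name kwic and its width parameter are assumptions chosen for this example. It works on plain text only and is not an existing Wikipedia or MediaWiki feature.

```python
import re


def kwic(text, keyword, width=30):
    """Return one context line per whole-word occurrence of keyword.

    Matching is case-insensitive; width is the number of characters
    of context kept on each side of the keyword.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}".rstrip())
    return lines


if __name__ == "__main__":
    sample = "Context matters. Each context adds a relevant point."
    for line in kwic(sample, "context", width=12):
        print(line)
```

Run over the full text of the articles, a listing like this puts every occurrence of an entry word back next to its surroundings, instead of leaving that context scattered through free-floating text.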

 

 apogr



----- Original Message ----- 
From: "Ray Saintonge" <saintonge at telus.net>
To: <wikipedia-l at Wikimedia.org>
Sent: Saturday, June 19, 2004 11:45 PM
Subject: Re: [Wikipedia-l] Categories considered harmful


> Jakob wrote:
> 
> >Categories considered harmful
> >
> I generally appreciate your comments, but I would never go so far as to 
> consider this initiative harmful.
> 
> >Since Version 1.3 of MediaWiki we have the nice category function. In the 
> >German Wikipedia there is a lot of confusion and struggle on how to use 
> >categories in the right way. As a student of library science I could tell 
> >several methods how to classify, index and sort things but none of them 
> >seems to be applicable easily with the current implementation of categories.
> >
> The confusion and struggle that you are experiencing on the German 
> Wikipedia is being faced by all the projects.  Each is likely to find 
> its own way of dealing with the issue, and the solutions are likely to 
> show considerable variation.  That's fine and very wiki.  Some will 
> undoubtedly be discarded at a later stage, as a part of a normal 
> evolutionary process.  We need to avoid being overly critical of those 
> who experiment with other possibilities.
> 
> I also believe that effective categorization depends on having a 
> properly functioning internal search function.  Hopefully, the day will 
> come when our developers will be able to get past the constant stresses 
> on the system, and find something that does not depend on Google. :-)
> 
> >As far as I can tell there are three main reasons for Wikipedia's success:
> >
> >1. It's very easy to contribute (Wikitax, everybody can edit)
> >2. Every edit is monitored in watchlists and the list of latest edits
> >   so we can control each other
> >3. There is a clear common mission - to create an encyclopedia (+NPOV)
> >
> Agreed.
> 
> >As far as I also can see the category-function contradicts all of them:
> >
> To a significant extent, yes.
> 
> >1. It's not easy. 
> >
> >It's not easy to know how to do it in the right way because subject 
> >indexing is a complex issue and it's not easy because of gaps in the 
> >implementation (no rename, no redirects, no assignment of articles to 
> >categories without editing every single article page). Editing an 
> >article I have to guess which categories are existing, how they are 
> >spelled and the rules what to classify into them and what not.
> >
> Perhaps it's too easy.  Anybody can propose any new category, including 
> misspelled ones.  Doing it without creating chaos is a different thing.  
> It's especially difficult for people who specialize in a particular area 
> of knowledge.  To categorize effectively at its top level requires an 
> ability to grasp the "big picture" of the Wikipedia.
> 
> >2. It's not controllable.
> >
> >You cannot watch a category to get notified of new articles or when 
> >somebody removes an article from the category. 
> >
> Mostly yes.  Building categories is a top-down activity.  Categorizing 
> articles is a bottom-up activity.  The challenge lies in establishing an 
> interface between these two activities.
> 
> >3. There is no common mission
> >
> >Can anybody tell the purpose of categories? Finding articles (without 
> >a coordinated search function?!) Browsing in topics (without a clear
> >overview of all categories?!) Are we trying to index articles with 
> >subject heading, using a thesaurus, a classification or even a structure 
> >ontology? Library science has invented several kinds of schemes like that 
> >but at the moment everybody is muddling this and that trying to invent 
> >the already invented wheels of documentation (by the way there are also
> >methods of automatic indexing, clustering and classification).
> >
> Whatever the mission of categories it is a subordinate mission motivated 
> by a desire to make the information in the projects more accessible.  
> Categories have no meaning in an information vacuum.  I would answer 
> your questions by saying, "All of the above, and more."
> 
> Library science has indeed invented numerous schemes.  Any such scheme 
> designed for general application is as good as its competitors.  Each 
> developed independently to address the priorities of the originating 
> library.  Any of them may thus be validly criticized for its nationalist 
> tendencies.  Nevertheless, choosing one of them to serve as a starting 
> point need not be a nationalist act.  That choice is more likely to be 
> driven by the availability of detailed data, and the willingness of some 
> individual(s) to do the work of adapting that system to serve wiki purposes.
> 
> The muddling and the re-invention of the wheel implicit in most people's 
> approach to categorization was completely foreseeable.  I say this 
> without finding fault.  It was just one of those miseries that had to be 
> gone through; system convergence comes later.  Let's just keep away from 
> automated systems until we know what we want.  A premature application of 
> automation will only support the muddling.
> 
> >And: In classification there is no NPOV because there is no "right" way 
> >to classify the world but it depends on the special needs and questions 
> >I want to answer with a special system of subject indexing.
> >
> I agree with what you seem to be saying but I would not put it in terms 
> of NPOV.  "Wiki is not paper," is a far more useful principle.  A 
> traditional librarian may want to classify a single copy of a book on 
> German libraries, and must decide whether that book should be shelved 
> with books about Germany or books about libraries.  We do not have that 
> restriction.
> 
> >Given the reasons I strongly recommend to stop using the categories and
> >to focus on writing and improving good articles. Many categories can easily
> >be replaced with normal links between articles. Adding and removing 
> >categories do not change an article's content a bit. If you want to 
> >keep track of all articles in some area use (Wiki)Projects, article 
> >series, portals and learn how to use the "what links here"-function! 
> >A good article is an article that can be found easily without categories.
> >
> I don't arrive at the same conclusion about stopping the use of 
> categories.  The techniques that you mention are all good and effective, 
> and they should obviously continue to be used.  Categories are a way of 
> providing a comprehensive overview.  It is easy to see at an appropriate 
> place on an article just how that article has been categorized, or 
> indeed '''if''' it has been categorized.  A list system has its uses, 
> but needs to be manually maintained.  It is not evident on the face of 
> the article that it has been properly listed.  A non-contributing reader 
> may not be aware of the list's existence, or of the purpose of "What 
> links here."  Even a contributor is not going to be inclined to check 
> every article to see if it has been properly listed, but without 
> checking he has no way of knowing.  This brings me back to my earlier 
> point about categorization of articles being a bottom-up procedure.
> 
> >Indeed classifying wikipedia articles is very interesting and will 
> >become more important, but this should be an independent project - maybe 
> >in a "Classifipedia" or "Categorypedia" that links to wikipedia articles.
> >
> No.  The categories are meaningless in isolation.
> 
> >You know - librarians normally do not write the books they organize and 
> >search engine experts do not write the websites they crawl, so let's focus
> >on what we can do the best: creating the most detailed, most understandable
> >and freest encyclopedia in the history of mankind!
> >  
> >
> Your premise here is the most important thing that you say.  No 
> professional librarian would tolerate an author who goes around 
> insisting that his books be classified in a particular way.  The 
> authors, the editors and the classifiers all have their own roles on the 
> wiki.  All are working toward the goal that you specify, but not in 
> complete isolation.
> 
> Ec
> 