[WikiEN-l] Types of categories

Ray Saintonge saintonge at telus.net
Wed Jun 7 17:48:39 UTC 2006


Anthony DiPierro wrote:

>On 6/3/06, Roger Luethi <collector at hellgate.ch> wrote:
>  
>
>>On Sat, 03 Jun 2006 19:54:27 +0200, Steve Bennett wrote:
>>    
>>
>>>I'm probably not the only one who envisages all the wonderful things
>>>that could be done with this massive collection of information that is
>>>Wikipedia, *if only* we could do something clever with the categories.
>>>And then you realise that you can't really do anything clever because
>>>"category" has all sorts of different meanings to different people.
>>>      
>>>
>>Agreed. Still: can you give some specific examples of wonderful things that
>>could be done but are not possible now? That would tell us what problem you
>>are trying to solve.
>>    
>>
>I've personally run into this when trying to automatically create, for
>example, a list of all Wikipedia articles on people.  You can't just
>start at [[category:people]] and work your way down, because you wind
>up going to [[Category:Women]] (fine, all women are people) then
>[[Category:Feminine hygiene]] (bad).
>
A high-level category like people does not need to have direct 
elements.  To put it simplistically, it would do fine with only two 
sub-categories, men and women.  An element of a sub-category is also an 
element of the parent category.
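
A minimal sketch of that rule, assuming a toy in-memory category table 
(the names SUBCATS, ARTICLES and all_elements are made up for 
illustration and are not anything MediaWiki provides):

```python
# Hypothetical toy data: each category maps to its direct sub-categories
# and to the articles filed directly under it.
SUBCATS = {
    "People": ["Men", "Women"],
    "Men": [],
    "Women": [],
}
ARTICLES = {
    "People": [],  # the top-level category has no direct elements
    "Men": ["Pierre Curie"],
    "Women": ["Marie Curie"],
}


def all_elements(category, seen=None):
    """Return every article in `category` or in any of its sub-categories."""
    if seen is None:
        seen = set()
    if category in seen:  # guard against cycles in the category graph
        return set()
    seen.add(category)
    found = set(ARTICLES.get(category, []))
    for sub in SUBCATS.get(category, []):
        found |= all_elements(sub, seen)
    return found
```

Here all_elements("People") picks up both articles even though 
"People" itself lists none directly.  The walk only gives sensible 
results if every sub-category really is a subset of its parent.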

>>Categories based on such intersections of attributes are conceptually bad.
>>Look at the categories for an article like [[Marie Curie]]: She's French
>>three times, female four times, Polish four times (not counting "Natives of
>>Warsaw"), etc. Why not create [[Category:Polish women who were born in
>>1867 and died in 1934 and won a Nobel Prize in Chemistry and in Physics]]?
>>    
>>
>Because there would only be one person in that category.
>
Such a category would be theoretically acceptable but totally 
impractical. There is an element of art to the design of category 
hierarchies.  A category that's too narrow (like your example) is 
unfindable; you simply never know which ones exist.  At the other 
extreme, if the category is too broad it becomes more difficult to find 
things within it.  In Wiktionary people have established 
[[Category:English nouns]], which now has numerous elements, but what 
user would ever look there to find something?  The purpose of categories 
is to help the passive user to find things.  Designing them requires 
some idea of which Googling strategies work and which don't, and of how 
to modify a strategy which initially doesn't work.  Just think of what 
works when you are searching for something.

In my mind a category should not have more than 200 direct elements, 
since that is the number of items that appear on a single page by 
default when we ask for a category to be listed.  Anything longer should 
be subdivided.  Even so, a person listing the contents of a higher-level 
category should have an "include sub-categories" option that descends to 
a depth of their choosing.
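
As a rough sketch of what such an option could do, assuming a toy 
category table (SUBCATS, ARTICLES, list_category and the max_depth 
parameter are all hypothetical names, not existing software features):

```python
from collections import deque

# Hypothetical toy data, for illustration only.
SUBCATS = {
    "Physics": ["Physicists"],
    "Physicists": ["Nobel laureates in Physics"],
    "Nobel laureates in Physics": [],
}
ARTICLES = {
    "Physics": ["Energy"],
    "Physicists": ["Isaac Newton"],
    "Nobel laureates in Physics": ["Marie Curie"],
}


def list_category(category, max_depth=0):
    """List articles in `category`, descending at most `max_depth` levels.

    max_depth=0 lists direct elements only; 1 also includes the direct
    sub-categories, and so on.
    """
    result = set()
    seen = set()
    queue = deque([(category, 0)])
    while queue:
        cat, depth = queue.popleft()
        if cat in seen:  # a category reachable by two paths is listed once
            continue
        seen.add(cat)
        result.update(ARTICLES.get(cat, []))
        if depth < max_depth:
            for sub in SUBCATS.get(cat, []):
                queue.append((sub, depth + 1))
    return sorted(result)
```

With this data, list_category("Physics") gives only the direct 
elements, while list_category("Physics", max_depth=2) also reaches the 
articles two levels down.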

>>If we don't have a term for (or an article about) it, there probably
>>shouldn't be a category for it, either (I'm sure a determined mind could
>>come up with an exception).
>>    
>>
>If the category system could effectively build these intersection
>categories on the fly, I'd agree.  But the category system can't
>currently do that.  (And it's been around a reasonably long time, with
>that as an obvious flaw, and no one has fixed it.)
>
I suggested something of the sort before categories were implemented, 
but more from the searching end.  The real problem is with the search 
function, which is remarkably unsophisticated for a project the size of 
Wikipedia.
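
To illustrate what building intersections "on the fly" could mean, 
here is a small sketch that treats each attribute category as a plain 
set and intersects them at query time.  The data and the names 
ATTRIBUTES and intersect are assumptions for the example; this is not 
how the current category or search code works.

```python
# Hypothetical attribute categories, each just a set of article titles.
ATTRIBUTES = {
    "Polish people": {"Marie Curie", "Henryk Sienkiewicz"},
    "Women": {"Marie Curie", "Ada Lovelace"},
    "Nobel laureates in Chemistry": {"Marie Curie", "Linus Pauling"},
}


def intersect(*names):
    """Return the articles that carry every one of the named attributes."""
    sets = [ATTRIBUTES[name] for name in names]
    if not sets:
        return set()
    return set.intersection(*sets)
```

A query such as intersect("Polish people", "Women") then does the job 
of a hand-made [[Category:Polish women]], without anyone having to 
create or maintain that category.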

>>>Attributes: The category exists to denote some very specific small
>>>detail of a subject, such that it would be conceivable to have dozens
>>>or more such categories on an article. Examples: 1943 deaths, Living
>>>persons, Winners of Nobel Peace Prize, etc. These tend to hierarchies
>>>that start strict then end up fuzzy. Eg, 1943 deaths is only in 1943
>>>and "1940s deaths", and these have parent categories of
>>>"1940s","Years" and so forth, eventually ending up in "History",
>>>whereupon things become chaos.
>>>      
>>>
>>There is no way to make hierarchies not suck, especially if you have to
>>maintain them manually (as we do now). Don't try to impose hierarchies
>>unless they emerge quite naturally from the subject.
>>    
>>
>I made a proposal.  All subcategories of attributes must be a subset
>of the parent attribute.  Seems like a perfectly reasonable way to
>make hierarchies not suck.
>
It's an idea that I have tried to implement for some time at 
Wiktionary.  The difficulty with such hierarchies is that they require 
people to think logically, and to be able to trace a path back to a 
single top-level category.
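
For what it's worth, the subset rule is easy to state as a check.  The 
sketch below assumes we somehow know, for each category, the set of 
articles that genuinely have that attribute (MEMBERS), which is a big 
assumption; SUBCATS and subset_violations are likewise made-up names.

```python
# Hypothetical data: MEMBERS holds the articles that truly have each
# attribute; SUBCATS holds the parent -> sub-category links as filed.
MEMBERS = {
    "People": {"Marie Curie", "Pierre Curie"},
    "Women": {"Marie Curie"},
    "Feminine hygiene": {"Tampon"},
}
SUBCATS = {
    "People": ["Women"],
    "Women": ["Feminine hygiene"],
}


def subset_violations():
    """Return (parent, child) links where the child is not a subset of the parent."""
    return [
        (parent, child)
        for parent, children in SUBCATS.items()
        for child in children
        if not MEMBERS[child] <= MEMBERS[parent]
    ]
```

On this data the only link flagged is ("Women", "Feminine hygiene"), 
which is the link that breaks the walk down from 
[[Category:People]].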

Ec



