[WikiEN-l] What is a category?

Chris Wood standsongrace at hotmail.com
Wed Sep 1 06:11:22 UTC 2004


This is a mini-essay on a current problem in MediaWikiland, category policy.
It's also available at
http://en.wikipedia.org/wiki/User:Gracefool/What_is_a_category, but I'd
rather have it discussed here first.

'''N.B.''' I'm aware of previous discussions at [[Wikipedia
talk:Categorization]]. This essay is a more thorough and defensible
treatment of the issue, and it highlights the fallacies of many previous
arguments.

==Introduction==
What ''is'' a category? No-one knows. There isn't consensus on what a
category is (see [[Wikipedia talk:Categorization]]). Is it a hierarchical
tree, with all categorizations representing "[[is a]]" relationships? Or is
it just a set, a group of related articles?

This is an important question - just look at [[Wikipedia:Categories for
deletion]]. Changes to categories have more widespread effects than changes
to articles, and have a greater possibility of annoying editors.

I believe that categories are, and should be, sets, not hierarchies:

==Categories are sets==
===Original purpose of categories===
What was the original purpose of the categorization system? Development of a
taxonomy of worldly knowledge? I don't think the developers are really that
stupid (I'll expand on this below). AFAIK it was intended as a kind of
automatic list generator for related articles. Lists, including lists of
"related articles", are sets, not hierarchies.

===Current software===
The way that categories have been developed in software supports the idea
that categories are sets. There is implicit support for categories as sets
because there is nothing to stop anyone from using them that way. None of
the limits of a hierarchical system exist in the category software.
Enforcing such limits in software would be the best way to establish
hierarchical categories, and would be easy to implement (e.g. by disallowing
arbitrary parenting of categories).

Until policy is decided on (and, preferably, software upgraded to support
it), categories will continue to be used as sets. Since a hierarchy is just
a restricted kind of set system, while a set system need not be a hierarchy,
the current categorization system is one of sets.
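
To make the distinction concrete, here is a minimal Python sketch. It assumes a hypothetical model in which each category maps to the set of its parent categories; this is not MediaWiki's actual schema, and the function name and category names are made up for illustration. A strict hierarchy is just the restricted case where every category has at most one parent and there are no cycles.

```python
# Hypothetical model (not MediaWiki's real schema): each category name
# maps to the set of its parent categories.

def is_strict_hierarchy(parents):
    """Return True if every category has at most one parent and no cycles."""
    # A tree-like hierarchy forbids multiple parents.
    for category, ps in parents.items():
        if len(ps) > 1:
            return False
    # With at most one parent each, walk upward and watch for cycles.
    for start in parents:
        seen = {start}
        current = start
        while parents.get(current):
            (current,) = parents[current]  # the single parent
            if current in seen:
                return False
            seen.add(current)
    return True


# A category graph that happens to be a hierarchy.
tree = {
    "Science": set(),
    "Physics": {"Science"},
    "Optics": {"Physics"},
}

# A category graph that uses arbitrary parenting, which nothing in the
# model above forbids.
free_form = {
    "Science": set(),
    "History": set(),
    "Physics": {"Science"},
    "History of physics": {"Physics", "History"},
}

print(is_strict_hierarchy(tree))       # the restricted case
print(is_strict_hierarchy(free_form))  # the general case
```

Both graphs are valid in the set model; only the first also satisfies the hierarchy restriction, and a check like this is the kind of limit software would need in order to enforce hierarchies.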

==Categories should be sets==
===Categories are inherently POV===
A categorization system is a worldview. Therefore it is very hard for
categories to be [[WP:NPOV|NPOV]]. The following quote from
[http://www.shirky.com/writings/semantic_syllogism.html#worldviews_differ_for_good_reasons Clay Shirky] expands:

<quote>:Many networked projects, including things like business-to-business
markets and Web Services, have started with the unobjectionable hypothesis
that communication would be easier if everyone described things the same
way. From there, it is a short but fatal leap to conclude that a particular
brand of unifying description will therefore be broadly and swiftly adopted
(the "this will work because it would be good if it did" fallacy.)

:Any attempt at a global ontology is doomed to fail, because meta-data
describes a worldview. The designers of the Soviet library's cataloging
system were making an assertion about the world when they made the first
category of books "Works of the classical authors of Marxism-Leninism."
Melvyl Dewey was making an assertion about the world when he lumped all
books about non-Christian religions into a single category, listed last
among books about religion. It is not possible to neatly map these two
systems onto one another, or onto other classification schemes -- they
describe different kinds of worlds.

:Because meta-data describes a worldview, incompatibility is an inevitable
by-product of vigorous argument. It would be relatively easy, for example,
to encode a description of genes in XML, but it would be impossible to get a
universal standard for such a description, because biologists are still
arguing about what a gene actually is. There are several competing standards
for describing genetic information, and the semantic divergence is an
artifact of a real conversation among biologists. You can't get a standard
til you have an agreement, and you can't force an agreement to exist where
none actually does.

:Furthermore, when we see attempts to enforce semantics on human situations,
it ends up debasing the semantics, rather then making the connection more
informative. Social networking services like Friendster and LinkedIn assume
that people will treat links to one another as external signals of deep
association, so that the social mesh as represented by the software will be
an accurate model of the real world. In fact, the concept of friend, or even
the type and depth of connection required to say you know someone, is quite
slippery, and as a result, links between people on Friendster have been
drained of much of their intended meaning. Trying to express implicit and
fuzzy relationships in ways that are explicit and sharp doesn't clarify the
meaning, it destroys it.</quote>

The whole concept of an all-encompassing hierarchical category system is
against the spirit of Wikipedia. It is an all-encompassing worldview, or
attribution of value, to the marked-up (categorized) articles.

The "categories are hierarchies" idea presumes that it is even possible for
a large group of people to agree on an all-encompassing belief system, a
ridiculous notion totally bereft of realism, and one that experience has
shown to be wrong in many IT metadata projects.

Categories, especially hierarchical categories, are about the followers of
one particular worldview implicitly saying "our way is right, everyone
should follow it". Note that the proportion of people who follow one
particular worldview in every aspect is very small.

===Sets are much less POV===
Categorization by set is obviously less POV. An article can belong to as
many sets as the community thinks it should belong to, whether directly or
via multiple parenthood of the article's category (or ancestors).
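
As a rough illustration of membership "directly or via multiple parenthood", the following Python sketch collects every category an article ends up in. The data model (dicts of sets), the function name, and the article and category names are all hypothetical, chosen only for the example.

```python
# Hypothetical data model: article -> its direct categories, and
# category -> its parent categories (any number of parents allowed).

def all_categories(article, article_cats, parents):
    """Collect every category reachable from an article's direct categories."""
    found = set()
    stack = list(article_cats.get(article, ()))
    while stack:
        cat = stack.pop()
        if cat in found:
            continue  # already visited via another parent
        found.add(cat)
        stack.extend(parents.get(cat, ()))
    return found


parents = {
    "Physics": {"Science"},
    "History of physics": {"Physics", "History of science"},
    "History of science": {"History", "Science"},
}
article_cats = {"Isaac Newton": {"Physicists", "History of physics"}}

print(sorted(all_categories("Isaac Newton", article_cats, parents)))
```

Because a category may have several parents, the article is reachable from both "History" and "Science" without anyone having to decide which of the two is the one true ancestor.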

==Conclusion==
The benefits of hierarchical categorization:
#decreased redundancy
#easier navigation (for a minority who have the "right" worldview)
are outweighed by its costs:
#the community will never be in agreement over the system
#harder navigation (for the majority who don't find articles where they expect them to be)
#decreased accuracy (the real world is not in a big hierarchy, it merely has sets of metadata applied to it by different people)

--
Chris Wood





