[WikiEN-l] Types of categories

Anthony DiPierro wikilegal at inbox.org
Sun Jun 4 15:31:02 UTC 2006


On 6/4/06, Roger Luethi <collector at hellgate.ch> wrote:
> On Sat, 03 Jun 2006 17:27:59 -0400, Anthony DiPierro wrote:
> > > Categories based on such intersections of attributes are conceptually bad.
> > > Look at the categories for an article like [[Marie Curie]]: She's French
> > > three times, female four times, Polish four times (not counting "Natives of
> > > Warsaw"), etc. Why not create [[Category:Polish women who were born in
> > > 1867 and died in 1934 and won a Nobel Prize in Chemistry and in Physics]]?
> >
> > Because there would only be one person in that category.
>
> That's why nobody made it, but not why it shouldn't be done.
>
I'd say it's both.  There shouldn't be categories with only one
article in them.  IMO that's just common sense.

In the current system categories should have a fair number of articles
in them.  If there are too many, they should be broken up.  If there
are too few, they should be combined.  There isn't a crystal-clear
line for what constitutes too many and what constitutes too few, but a
category with only one article in it clearly has too few.

The problem of categories having too many articles in them wouldn't
really be a problem if the software allowed you to automatically
compute category intersections.  But the software doesn't do this, so
people make do with what they've got.

> It would be nigh impossible to do well because once we start combining
> attributes to create new categories, we are looking at maintaining links
> between articles and an exploding number of subcategories.
>
> But even if we maintained a complete and up-to-date system of subcats, we'd
> still make it hard for people to find articles using categories. For some
> fairly sensible reasons, the rule is to include articles only to the
> subcategory, but not to the parent. There is no way to list articles based
> on a subset of criteria (the articles in subcategories are effectively
> hidden on separate pages which is only helpful if you know which one to
> pick).
>
> > If the category system could effectively build these intersection
> > categories on the fly, I'd agree.  But the category system can't
> > currently do that.  (And it's been around a reasonably long time, with
> > that as an obvious flaw, and no one has fixed it.)
>
> You are right, we can't effectively build these intersection categories on
> the fly at the moment, but we _could_ automatically create or update such
> intersection categories if the categories weren't the mess that Steve and
> you describe. Kind of like the search index.
>
You're right.  And that's what my simple rule that "All subcategories
of attributes must be a subset of the parent attribute" is meant to
address.  If that were the case, it would be possible to automatically
recursively descend a parent category to find *all* the articles to
which it applies.  And then computing the intersection of any two
parent categories would be possible.  I actually had software which
did this, but it doesn't work right because the subcategory rule isn't
being followed.
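A minimal sketch of that recursive descent and intersection, assuming the
subset rule holds.  The category names, the SUBCATS/ARTICLES tables and
the helper names all_articles and intersect are hypothetical toy data
for illustration, not MediaWiki's actual schema or API:

```python
from collections import deque

# Hypothetical toy data: category -> direct subcategories.
SUBCATS = {
    "Women": ["Women scientists"],
    "Women scientists": ["Women chemists"],
    "Women chemists": [],
    "Nobel laureates": ["Nobel laureates in Chemistry", "Nobel laureates in Physics"],
    "Nobel laureates in Chemistry": [],
    "Nobel laureates in Physics": [],
}
# Hypothetical toy data: category -> articles placed directly in it.
ARTICLES = {
    "Women": [],
    "Women scientists": ["Lise Meitner"],
    "Women chemists": ["Marie Curie"],
    "Nobel laureates": [],
    "Nobel laureates in Chemistry": ["Marie Curie", "Linus Pauling"],
    "Nobel laureates in Physics": ["Marie Curie", "Albert Einstein"],
}

def all_articles(category):
    """Collect every article in `category` or in any descendant subcategory.

    This is only meaningful if every subcategory is a subset of its parent.
    """
    seen = {category}
    queue = deque([category])
    found = set()
    while queue:
        cat = queue.popleft()
        found.update(ARTICLES.get(cat, []))
        for sub in SUBCATS.get(cat, []):
            if sub not in seen:  # guard against category loops
                seen.add(sub)
                queue.append(sub)
    return found

def intersect(*categories):
    """Articles that belong, directly or via subcategories, to every category given."""
    sets = [all_articles(c) for c in categories]
    return set.intersection(*sets) if sets else set()

print(sorted(intersect("Women", "Nobel laureates")))  # prints ['Marie Curie']
```

If a theme category sneaks in as a subcategory of an attribute, the
descent pulls in articles that don't have the attribute, which is exactly
why the results go wrong when the rule isn't followed.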

Once the software is written to compute intersections of categories
within the MediaWiki software, it would be relatively simple to
recategorize the articles into their parent categories, so that no
information would be lost.  This would work as follows: all articles
in a subcategory that had multiple parent attribute categories would
be automatically moved into those parent categories, and this would be
repeated until no such situation remained.  The ad-hoc
structure could still be kept, but it could be calculated on the fly
(along with new types of intersections which could be easily added).
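Roughly, the repeated move could look like the sketch below.  It is my
reading of the procedure, not existing code: the PARENTS/ARTICLES tables,
the ATTRIBUTES set and the flatten helper are hypothetical, only
attribute parents count toward "multiple parents", and the category graph
is assumed to be acyclic (otherwise the loop might never end):

```python
# Hypothetical toy data: category -> parent categories.
PARENTS = {
    "Polish women chemists": ["Polish women", "Women chemists"],
    "Polish women": ["Polish people", "Women"],
    "Women chemists": ["Women", "Chemists"],
    "Polish people": [],
    "Women": [],
    "Chemists": [],
}
# Hypothetical toy data: category -> articles placed directly in it.
ARTICLES = {
    "Polish women chemists": {"Marie Curie"},
    "Polish women": set(),
    "Women chemists": set(),
    "Polish people": set(),
    "Women": set(),
    "Chemists": set(),
}
# Assumption: every category in this toy example is an attribute.
ATTRIBUTES = set(PARENTS)

def flatten(parents, articles, attributes):
    """Push articles out of multi-parent attribute subcategories until none remain.

    Assumes an acyclic category graph.
    """
    result = {cat: set(arts) for cat, arts in articles.items()}
    changed = True
    while changed:
        changed = False
        for cat, ps in parents.items():
            attr_parents = [p for p in ps if p in attributes]
            if len(attr_parents) >= 2 and result.get(cat):
                for p in attr_parents:
                    result.setdefault(p, set()).update(result[cat])
                result[cat] = set()  # membership is now implied by the parents
                changed = True
    return result

flat = flatten(PARENTS, ARTICLES, ATTRIBUTES)
print({c: sorted(a) for c, a in flat.items() if a})
# prints {'Polish people': ['Marie Curie'], 'Women': ['Marie Curie'], 'Chemists': ['Marie Curie']}
```

The intersection categories end up empty, but "Polish women chemists"
can be recomputed on the fly as the intersection of its ancestors, so
nothing is lost.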

(Now that I try this on an example, I see that this algorithm would
probably have to be tweaked to deal with subcategories of
[[Category:Categories by topic]], but that's not too bad.)

> > > > Attributes: The category exists to denote some very specific small
> > > > detail of a subject, such that it would be conceivable to have dozens
> > > > or more such categories on an article. Examples: 1943 deaths, Living
> > > > persons, Winners of Nobel Peace Prize, etc. These tend to form hierarchies
> > > > that start strict then end up fuzzy. E.g., 1943 deaths is only in 1943
> > > > and "1940s deaths", and these have parent categories of
> > > > "1940s", "Years" and so forth, eventually ending up in "History",
> > > > whereupon things become chaos.
> > >
> > > There is no way to make hierarchies not suck, especially if you have to
> > > maintain them manually (as we do now). Don't try to impose hierarchies
> > > unless they emerge quite naturally from the subject.
> > >
> > I made a proposal.  All subcategories of attributes must be a subset
> > of the parent attribute.  Seems like a perfectly reasonable way to
> > make hierarchies not suck.
>
> The devil is in the details.
>
> For instance, how do you connect the districts of Paris to the category
> Paris? What is a subset of the parent attribute "Paris": "Districts of
> Paris", or "Quartier Latin", or neither? Does it bother you if the article
> on a French district is now in a subcategory of "Capitals in Europe"?
>
[[Category:Paris]] is a theme, not an attribute, so [[Category:Paris]]
should not be a subcategory of [[Category:Capitals in Europe]].

> Or going back to [[Category:Women]]: You could declare that only articles
> on instances of women (i.e.  biographies) can ever be under that category,
> and that only sets of such articles can ever be subcategories of the
> category women. -- You could even create a separate [[Category:Woman]],
> subcategories like "female reproductive organs" containing articles like
> uterus. -- But how would you express the undisputed relationship between
> female human beings and your example [[Category:Feminine hygiene]]? How
> about [[Category:Women's rights]]? Add an umbrella cat "Somehow related to
> women" maybe?
>
> Roger

[[Category:Women]] could be a subcategory of [[Category:Woman]].
Making an attribute a subcategory of a theme is allowed; it is the
reverse that is not allowed.
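If each category were tagged as a theme or an attribute, the rule could
be checked mechanically.  A minimal sketch: the KIND tags, the LINKS list
and the violations helper are hypothetical, since the category system
has no such field today, and the Feminine hygiene link is just an
illustrative guess:

```python
# Hypothetical tags; MediaWiki categories carry no such field today.
KIND = {
    "Capitals in Europe": "attribute",
    "Paris": "theme",
    "Woman": "theme",
    "Women": "attribute",
    "Feminine hygiene": "theme",
}
# (parent, child) subcategory links.
LINKS = [
    ("Capitals in Europe", "Paris"),  # theme under attribute: not allowed
    ("Woman", "Women"),               # attribute under theme: allowed
    ("Woman", "Feminine hygiene"),    # theme under theme: allowed
]

def violations(links, kind):
    """Return the links that place a theme category under an attribute category."""
    return [(parent, child) for parent, child in links
            if kind.get(parent) == "attribute" and kind.get(child) == "theme"]

print(violations(LINKS, KIND))  # prints [('Capitals in Europe', 'Paris')]
```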

In any event, things wouldn't be perfect.  Ultimately the best
solution would involve fixing the category system itself, a process
which should be approached carefully so as to avoid making the same
mistakes all over again.  The advantage of my proposal to not allow
themes as subcategories of attributes is that it can be implemented
today, without much disruption, and without modifying any code.  Plus,
it allows for a relatively straightforward upgrade path when the
category system is fixed.  The proposal itself is not the fix, it's a
temporary workaround.

As an alternative, it would probably be possible to do all of this
even without enforcing the subcategory rule.  But all purely attribute
categories would have to be identified as such.  I'll have to think
about that.

Anthony


