In the early years, 2005-2007, lots of small articles were
created on the Swedish Wikipedia, on the assumption that they
would all grow with time (eventualism [1]) and that the
important thing was to have many articles. Some articles were
marked as stubs or substubs, but in an arbitrary and entirely
subjective way. This changed in 2008, when we started to
measure the length of articles, identify the shortest ones,
and systematically extend them or merge them into better
articles.
It's now much easier to identify a new article as being
far too short, and to treat this as an urgent problem.
Now I feel that we have the same problem with
categories. Some people create lots of subcategories
for ever more specialized subdivisions of a topic.
In particular, "companies established in 1974" gets
divided into subcategories for different lines of
business [2]. Many of the new categories have very few
members. But we have no principles for deciding whether
this really is a problem, and no tools to get an overview.
We have no way to assess the harm of subdividing
a category. If it were up to me alone, I would tell
them to stop doing this. But my feeling is not enough
for a consensus decision.
What tools are there to count the number of categories,
sort them by the number of members (subcategories and
articles), and determine which branches of the
category tree should be pruned?
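
As a minimal sketch of the kind of count I mean, assuming the
category memberships have already been exported into a plain
Python dict (the function name, the dict layout and the sample
titles are all made up for illustration, this is not an
existing tool):

```python
from typing import Dict, List, Tuple


def rank_categories(
    members: Dict[str, Dict[str, List[str]]]
) -> List[Tuple[str, int]]:
    """Return (category, member count) pairs, smallest categories first.

    A category's member count is its subcategories plus its articles.
    """
    sizes = {
        name: len(content.get("subcats", [])) + len(content.get("pages", []))
        for name, content in members.items()
    }
    # Sort by size, then by name, so that ties come out in a fixed order.
    return sorted(sizes.items(), key=lambda item: (item[1], item[0]))


# Hypothetical sample data, for illustration only.
sample = {
    "Companies established in 1974": {
        "subcats": ["Law firms established in 1974"],
        "pages": ["Company A", "Company B", "Company C"],
    },
    "Law firms established in 1974": {
        "subcats": [],
        "pages": ["Law firm X"],
    },
}

for name, size in rank_categories(sample):
    print(f"{size:4d}  {name}")
```

The shortest categories end up at the top of the list, the
same way we listed the shortest articles in 2008.
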
As for "law firms established in 1974", one can't
remove that single category without reconsidering the
entire "law firms by year of establishment" structure.
So a tool should consider at least two levels of the
category tree at a time.
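
A rough sketch of such a two-level check, again under stated
assumptions: the parent-to-subcategory mapping and the article
counts are given as plain dicts, the threshold of 5 articles
is arbitrary, and all names and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def sparse_structures(
    children: Dict[str, List[str]],
    article_counts: Dict[str, int],
    min_articles: int = 5,  # arbitrary threshold, an assumption
) -> List[Tuple[str, int, int]]:
    """For each parent, count its subcategories with too few articles.

    Returns (parent, small subcategories, all subcategories) rows,
    with the largest share of small subcategories first.
    """
    report = []
    for parent, subcats in children.items():
        if not subcats:
            continue  # nothing to judge at the second level
        small = sum(
            1 for sub in subcats if article_counts.get(sub, 0) < min_articles
        )
        report.append((parent, small, len(subcats)))
    # Highest share of small subcategories first, then by name.
    report.sort(key=lambda row: (-row[1] / row[2], row[0]))
    return report


# Hypothetical sample data, for illustration only.
tree = {
    "Law firms by year of establishment": [
        "Law firms established in 1973",
        "Law firms established in 1974",
        "Law firms established in 1975",
    ],
    "Companies by year of establishment": [
        "Companies established in 1974",
    ],
}
counts = {
    "Law firms established in 1973": 1,
    "Law firms established in 1974": 1,
    "Law firms established in 1975": 12,
    "Companies established in 1974": 40,
}

for parent, small, total in sparse_structures(tree, counts):
    print(f"{small}/{total} small subcategories: {parent}")
```

The point is only that the decision is made per parent
structure, not per single category.
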
Which language editions of Wikipedia have the best organized
category trees? I know the German Wikipedia is different: it
is often hard to make interwiki links to its categories. Most
of the others seem to build on quite similar ideas.
[1] http://meta.wikimedia.org/wiki/Eventualism
[2] http://sv.wikipedia.org/wiki/Kategori:F%C3%B6retag_bildade_1974
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se