On Fri, 19 Nov 2004 00:11:35 +0000, Rowan Collins
<rowan.collins(a)gmail.com> wrote:
> It's certainly an interesting idea. (And as a graduate in artificial
> intelligence, I'm personally intrigued as to what kind of classifier
> is under the hood, but that would be off-topic, so we'll leave that
> for another time.)
Apart from a simple textual analysis (an article with the word
"molecules" is quite likely to be about chemistry, an article with the
word "wolf" much less so), the existing wiki-links (both forward and
backward) provide plenty of hints: if a page links to or is linked from
several pages already in my set, that increases the chance that it
belongs there too.
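To make the two kinds of hint concrete, here is a minimal Python sketch of a hand-weighted score. It is not the classifier from the original proposal (which is unknown); the function name `score_candidate`, the word weights and the `link_weight` parameter are all hypothetical, and a real system would presumably learn the weights from pages already known to be in or out of the set.

```python
import re
from typing import Dict, Set


def score_candidate(
    text: str,
    links_out: Set[str],
    links_in: Set[str],
    in_set: Set[str],
    word_weights: Dict[str, float],
    link_weight: float = 1.0,
) -> float:
    """Hypothetical score: word evidence plus links shared with the current set."""
    # Textual hint: sum the (assumed, hand-picked) weight of every word on the page
    words = re.findall(r"[a-z]+", text.lower())
    text_score = sum(word_weights.get(w, 0.0) for w in words)
    # Link hint: count distinct set members this page links to or is linked from
    neighbours = (links_out | links_in) & in_set
    return text_score + link_weight * len(neighbours)


weights = {"molecules": 2.0, "wolf": -1.0}  # illustrative values only
chem_set = {"Chemistry", "Oxygen", "Acid"}
print(score_candidate("Water molecules bond.", {"Chemistry", "Atom"}, {"Oxygen"}, chem_set, weights))  # 4.0
print(score_candidate("The grey wolf hunts.", set(), set(), chem_set, weights))  # -1.0
```

A linear sum like this is only the crudest way to combine the two signals; it does show why a page with several links into the set can outweigh weak textual evidence.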
> I would therefore suggest that it would be better for you to run the
> software yourself (using a database dump from
> http://download.wikimedia.org/) and then to publish the results to the
> wiki(s) in question somehow. Perhaps you could write and run a bot (or
> collaborate with someone who wanted to) that could add the automated
> suggestions to relevant talk pages: "The following may belong here..."
> to Category_talk: pages, and/or "This may belong to the following
> categories..." to the Talk: pages of the candidate articles. Or even,
> if you felt particularly dedicated, a bot that helped you add the
> [[Category:...]] tag yourself where it was indeed appropriate.
This last one seems like the way to go: a bot that asks you, one page
after another, a yes/no question about whether to include it. This can
be done quite fast, much faster than populating a category by hand, and
the answers already given can of course feed a kind of learning
algorithm that gets better at sorting out later candidates (although
doing so would probably slow the bot down considerably, so perhaps it's
better to start the 'include' and 'exclude' sets by hand, then go
offline to compute the estimates, and after that do the real work in
the way described above).
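The yes/no loop could look roughly like the sketch below. As a stand-in for "a type of learning algorithm" it uses a deliberately simple rule of my own choosing, not anything proposed in this thread: after every answer, the remaining candidates are re-ranked by how many of their neighbours have been accepted minus how many have been rejected. The `links` mapping (page to the pages it links to or is linked from) and the `ask` callback are assumptions; passing the callback in keeps the loop testable.

```python
from typing import Callable, Dict, List, Set


def review_candidates(
    candidates: Set[str],
    links: Dict[str, Set[str]],
    included: Set[str],
    excluded: Set[str],
    ask: Callable[[str], bool],
) -> List[str]:
    """Ask one yes/no question per candidate, re-ranking after every answer.

    `links` maps a page to the pages it links to or is linked from (both
    directions, assumed precomputed). Returns the accepted pages in the
    order they were accepted.
    """
    included = set(included)  # copy so the caller's sets are not modified
    excluded = set(excluded)
    remaining = set(candidates) - included - excluded
    accepted: List[str] = []

    def score(page: str) -> int:
        # Simple "learning": every answer so far shifts the score of its neighbours
        nbrs = links.get(page, set())
        return len(nbrs & included) - len(nbrs & excluded)

    while remaining:
        # Ask about the best-scoring page next; sorting first breaks ties alphabetically
        page = max(sorted(remaining), key=score)
        remaining.discard(page)
        if ask(page):
            included.add(page)
            accepted.append(page)
        else:
            excluded.add(page)
    return accepted


links = {
    "Acid": {"Chemistry", "Base"},
    "Base": {"Acid", "Chemistry"},
    "Wolf": {"Dog"},
}
answers = {"Acid": True, "Base": True, "Wolf": False}  # simulated user
print(review_candidates({"Acid", "Base", "Wolf"}, links, {"Chemistry"}, set(), answers.__getitem__))  # ['Acid', 'Base']
```

For interactive use, `ask` could be something like `lambda p: input(f"Include {p}? [y/n] ").strip().lower() == "y"`. Re-scoring every remaining page after each answer costs time proportional to the number of candidates per question, which is the kind of slowdown mentioned above; computing the estimates offline first and then just asking avoids it.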
In fact, I have already made such a bot, using the simple algorithm of
asking about every page that is linked to or from a page already
included, in order to build a list. However, this was before Categories
were implemented, and I have not yet got around to reprogramming it for
that purpose.
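The simple candidate-list step described here (every page linked to or from a page already included) might be sketched as follows. This is a reconstruction, not the actual bot's code; `links_to` is a hypothetical mapping from each page title to the titles it links to, as one might extract from a database dump, and the reverse ("what links here") direction is derived from it.

```python
from typing import Dict, Set


def candidate_list(included: Set[str], links_to: Dict[str, Set[str]]) -> Set[str]:
    """Pages linked to or from any included page, minus the included pages."""
    # Build the reverse ("what links here") index from the forward links
    linked_from: Dict[str, Set[str]] = {}
    for source, targets in links_to.items():
        for target in targets:
            linked_from.setdefault(target, set()).add(source)

    candidates: Set[str] = set()
    for page in included:
        candidates |= links_to.get(page, set())     # forward links
        candidates |= linked_from.get(page, set())  # backlinks
    return candidates - included


links_to = {
    "Chemistry": {"Atom", "Acid"},
    "Oxygen": {"Chemistry"},
    "Wolf": {"Dog"},
}
print(sorted(candidate_list({"Chemistry"}, links_to)))  # ['Acid', 'Atom', 'Oxygen']
```

With Categories, `included` would presumably just start out as the current members of the category in question.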
Andre Engels