[WikiEN-l] Category analysis with perl

Anthony DiPierro wikilegal at inbox.org
Mon Jun 5 10:45:32 UTC 2006


On 6/5/06, Steve Bennett <stevagewp at gmail.com> wrote:
> On 6/5/06, Anthony DiPierro <wikilegal at inbox.org> wrote:
> > You could always put "See also: [[:Category:The Beatles]]" (I think
> > that's the syntax) in the  description for [[Category:British rock
> > bands]].
>
> That's probably not bad. The page would have a basic structure like:
>
> Description
> Related categories <-- new
> Subcategories
> Articles in this category
>
> People *want* to put "related categories", but they break the category
> system if they make them subcats. We need to channel that desire.
>
> Steve

While playing around with Perl and [[Category:Airports]] last night I
noticed an instance of what will probably be another common case of
this: [[Category:Lists of airports]] is a subcategory, even though
lists of airports are related to airports rather than a kind of
airport.  There are probably enough instances of that sort of thing
that we would have to make an exception (in the interest of
consensus), but the cleaner solution would be to go with something
like the related categories idea above.

I should also note that [[Category:Airports]] itself is treated like a
theme for articles, but the subcategories are treated like attributes.
(Actually, now that I look at it directly on Wikipedia, I see that
"Airport lounges" and "Airport operators" are also subcategories.  I
didn't notice that in my original, very fast skim of the tree, but
looking back, it *was* there.)

If anyone wants to take a look at my "tree" for airports, or my Perl
script (which is really simplistic), let me know where I can upload
it.  To run it you need to download and import two MySQL dump files,
enwiki-20060518-categorylinks.sql (290 MB) and
enwiki-20060518-page.sql (354 MB).  The "tree" looks like this:

Airports_by_country
*Airports_in_Afghanistan
*#Bagram_Air_Base
*#Kabul_International_Airport
*Airports_in_Albania
*#Rinas_Mother_Teresa_Airport
[...]
*Airports_in_Australia
**Royal_Australian_Air_Force_bases
***Former_RAAF_Bases
***#RAAF_Station_Archerfield
***#RAAF_Station_Bairnsdale
***#RAAF_Base_Rathmines
***#RAAF_Station_Tocumwal
**#RAAF_Base_Amberley
**#RAAF_Bare_Bases
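
For anyone who wants to see how a listing in that format can be
produced, here is a minimal sketch.  It is Python, not the perl script
mentioned above, and it is only an illustration.  It assumes the rows
have already been read out of the two tables into memory: the pages
dict stands in for (page_id, page_namespace, page_title) and the links
list for (cl_from, cl_to).  Namespace 14 is Category and 0 is the
article namespace.  The page ids in the toy data are made up, and the
function names are hypothetical.

```python
from collections import defaultdict

CATEGORY_NS = 14  # MediaWiki namespace number for Category pages
ARTICLE_NS = 0    # main (article) namespace


def build_children(pages, links):
    """Group category members by parent category.

    pages: {page_id: (namespace, title)}
    links: iterable of (cl_from, cl_to), i.e. member page id -> category title
    """
    subcats = defaultdict(list)
    articles = defaultdict(list)
    for cl_from, cl_to in links:
        if cl_from not in pages:
            continue  # link from a page that is not in the page table
        namespace, title = pages[cl_from]
        if namespace == CATEGORY_NS:
            subcats[cl_to].append(title)
        elif namespace == ARTICLE_NS:
            articles[cl_to].append(title)
        # members in other namespaces (images, templates, ...) are skipped
    return subcats, articles


def render(category, subcats, articles, depth=0):
    """Return the tree as lines: '*' per level, '#' marks an article.

    No cycle or depth protection here, so only safe on acyclic data.
    """
    lines = ["*" * depth + category]
    for sub in sorted(subcats.get(category, [])):
        lines.extend(render(sub, subcats, articles, depth + 1))
    for art in sorted(articles.get(category, [])):
        lines.append("*" * depth + "#" + art)
    return lines


# Toy data with made-up page ids, using a few titles from the tree above.
pages = {
    1: (14, "Airports_in_Afghanistan"),
    2: (0, "Bagram_Air_Base"),
    3: (0, "Kabul_International_Airport"),
    4: (14, "Airports_in_Australia"),
    5: (14, "Royal_Australian_Air_Force_bases"),
    6: (0, "RAAF_Base_Amberley"),
}
links = [
    (1, "Airports_by_country"),
    (2, "Airports_in_Afghanistan"),
    (3, "Airports_in_Afghanistan"),
    (4, "Airports_by_country"),
    (5, "Airports_in_Australia"),
    (6, "Royal_Australian_Air_Force_bases"),
]

subcats, articles = build_children(pages, links)
print("\n".join(render("Airports_by_country", subcats, articles)))
```

Subcategories are listed before articles at each level, which matches
the ordering in the excerpt above.  On the real dumps the recursion
would need a depth cap or a visited check, since nothing in this
sketch stops it from following a loop.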

Are air force bases airports?  I'd say so.

I also tried making a tree for [[Category:Buildings and structures]]
(a "parent" of airports).  That one grew quite messy, and my script
bailed at 10 levels of recursion, so I haven't really analysed it
much.  I'll try turning it up to 20 or 25 and see what happens.
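
Raising the limit only helps if the tree is deep rather than looping.
I have not checked which is the case here, but a walk that remembers
the current path can tell the two apart.  The following is a rough
Python sketch (again, not the perl script), with a hypothetical walk()
function and made-up category names; subcats maps a category title to
the titles of its subcategories.

```python
def walk(category, subcats, max_depth=10, path=()):
    """Yield (depth, title, note) for a category and its descendants.

    note is "" for a normal node, "cycle" when the title is already on
    the current path, and "depth limit" when max_depth stops the descent
    into a node that still has subcategories.
    """
    depth = len(path)
    if category in path:
        yield depth, category, "cycle"
        return
    children = subcats.get(category, [])
    if depth >= max_depth and children:
        yield depth, category, "depth limit"
        return
    yield depth, category, ""
    for child in children:
        yield from walk(child, subcats, max_depth, path + (category,))


# Made-up example: Top -> Middle -> Bottom -> Top is a loop.
toy_subcats = {
    "Top": ["Middle"],
    "Middle": ["Bottom"],
    "Bottom": ["Top"],
}

for depth, title, note in walk("Top", toy_subcats):
    print("*" * depth + title + (" [" + note + "]" if note else ""))
```

If the output of a run on the real data is full of "cycle" notes, a
higher limit will just produce more of the same; if it is mostly
"depth limit" notes, going to 20 or 25 is worth trying.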

Anthony


