[Mediawiki-l] Graph Data Structures for Wikis

Gregory Szorc gregory.szorc at gmail.com
Sat Sep 2 00:58:05 UTC 2006


Hey all,

I've created two new extensions that relate to describing the structure of a
wiki.

1) GraphDataStructure - This class allows you to obtain a graph data
structure (nodes and edges) for your wiki.  It looks in the database for all
articles, categories, images, templates, etc. and creates a node record for
each.  Next, it finds the relationships between the nodes (page links,
category associations, redirects, image and template uses, etc.).  Once you
have a set of nodes and edges, you can export the result set as XML.  It is
also possible to set an active node from a Title object and prune all nodes
and edges not connected to this node.  The class can also take the XML it
exports and construct a new instance of the object.
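
To make the node/edge model above a bit more concrete, here is a rough
Python sketch (not the extension's actual PHP code).  The class name
WikiGraph, the XML element and attribute names, and the reading of
"connected" as "reachable in either direction" are all assumptions made for
illustration; the real extension may use a different schema and may prune
differently.

```python
import xml.etree.ElementTree as ET
from collections import deque


class WikiGraph:
    """Hypothetical stand-in for a wiki graph: typed nodes and typed edges."""

    def __init__(self):
        self.nodes = {}     # node id (page title) -> node type, e.g. "article"
        self.edges = set()  # (source id, target id, relationship type)

    def add_node(self, node_id, node_type):
        self.nodes[node_id] = node_type

    def add_edge(self, source, target, relation):
        # Both endpoints must already have a node record.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("edge refers to an unknown node")
        self.edges.add((source, target, relation))

    def prune_to(self, active):
        """Return a new graph with only the nodes reachable from `active`.

        Assumption: edge direction is ignored when deciding connectivity.
        """
        neighbours = {node_id: set() for node_id in self.nodes}
        for source, target, _ in self.edges:
            neighbours[source].add(target)
            neighbours[target].add(source)
        seen, queue = {active}, deque([active])
        while queue:  # breadth-first walk outward from the active node
            for nxt in neighbours[queue.popleft()] - seen:
                seen.add(nxt)
                queue.append(nxt)
        pruned = WikiGraph()
        for node_id in seen:
            pruned.add_node(node_id, self.nodes[node_id])
        for source, target, relation in self.edges:
            if source in seen and target in seen:
                pruned.add_edge(source, target, relation)
        return pruned

    def to_xml(self):
        """Export nodes and edges as XML (element names are made up here)."""
        root = ET.Element("graph")
        for node_id, node_type in sorted(self.nodes.items()):
            ET.SubElement(root, "node", id=node_id, type=node_type)
        for source, target, relation in sorted(self.edges):
            ET.SubElement(root, "edge", source=source, target=target,
                          type=relation)
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_xml(cls, text):
        """Rebuild a graph from the XML produced by to_xml()."""
        graph = cls()
        root = ET.fromstring(text)
        for node in root.findall("node"):
            graph.add_node(node.get("id"), node.get("type"))
        for edge in root.findall("edge"):
            graph.add_edge(edge.get("source"), edge.get("target"),
                           edge.get("type"))
        return graph
```

With something like this, pruning to a page and round-tripping through XML
would be `WikiGraph.from_xml(graph.prune_to("Main Page").to_xml())`.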

2) SpecialGraphStructure - This is a front-end for the above extension at
[[Special:GraphStructure]].  This special page provides a web interface to
obtain XML for various sets of nodes, articles, etc.  It also has a caching
mechanism for GraphDataStructure.  The first time the page loads, it builds
the graph data structure for the entire wiki (a resource-intensive operation
for large wikis) and serializes it for future use.  On subsequent requests,
the serialized data is read back in, the appropriate nodes and edges are
extracted, and XML is returned.  The graph for each query is also cached to
improve performance, and the cache cleans itself.
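
The two cache layers described above (one serialized copy of the full
graph, plus cached per-query results) could be sketched roughly as follows
in Python.  This is only an illustration: the GraphCache name, the use of
pickle, and time-based expiry as the way the cache "cleans itself" are
assumptions, not a description of how the extension actually does it.

```python
import os
import pickle
import time


class GraphCache:
    """Hypothetical two-level cache: serialized full graph + query results."""

    def __init__(self, path, build_graph, max_age=3600.0):
        self.path = path                # file holding the serialized graph
        self.build_graph = build_graph  # expensive callable, run on first use
        self.max_age = max_age          # seconds a query result stays valid
        self.queries = {}               # query key -> (timestamp, result)

    def full_graph(self):
        # Later requests read the serialized graph back instead of rebuilding.
        if os.path.exists(self.path):
            with open(self.path, "rb") as handle:
                return pickle.load(handle)
        # First request: build the whole graph once and serialize it.
        graph = self.build_graph()
        with open(self.path, "wb") as handle:
            pickle.dump(graph, handle)
        return graph

    def query(self, key, extract, now=None):
        now = time.time() if now is None else now
        # Assumed self-cleaning rule: drop results older than max_age.
        self.queries = {k: v for k, v in self.queries.items()
                        if now - v[0] < self.max_age}
        if key not in self.queries:
            self.queries[key] = (now, extract(self.full_graph()))
        return self.queries[key][1]
```

Here `extract` stands for whatever pulls the relevant nodes and edges out of
the full graph for one request, so the expensive build runs at most once
while the serialized file exists.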

It is worth noting that the extension creates a graph structure for your
entire wiki, then scans this structure to extract necessary information.
For this reason, the extension will be significantly slower on large wikis.
I have it working on a wiki with ~4000 nodes, and the speed cost is minimal
(after initial population anyway).

By themselves, these extensions aren't that exciting (unless you are the
kind of person who enjoys a machine-readable format of your wiki structure).
This is why I am working on another extension, SpecialGraphviz, which takes
the XML output from Special:GraphStructure and converts it to Graphviz
markup.  In the prototype of this extension, I have a new tab/action on
articles called "Visualize" which displays a Graphviz rendering of the
article's relationship to other articles.  The rendering is also an image
map, so you can click on the nodes and be taken to the appropriate wiki
page.
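
As a rough idea of what the XML-to-Graphviz step involves, here is a small
Python sketch (the prototype itself is not shown in this post, so none of
this is its real code).  The XML element names, the xml_to_dot function, and
the example.org base URL are all made up for illustration.  The one real
Graphviz feature it leans on is the URL node attribute, which Graphviz turns
into clickable regions when asked for an image map (e.g. `dot -Tcmapx`).

```python
import xml.etree.ElementTree as ET


def quote(value):
    """Wrap a string in double quotes for DOT, escaping embedded quotes."""
    return '"' + value.replace('"', '\\"') + '"'


def xml_to_dot(xml_text, base_url):
    """Convert graph XML (hypothetical schema) into Graphviz DOT markup."""
    root = ET.fromstring(xml_text)
    lines = ["digraph wiki {"]
    for node in root.findall("node"):
        title = node.get("id")
        # Assumed URL scheme: base URL plus the title with underscores.
        url = base_url + title.replace(" ", "_")
        lines.append(f"  {quote(title)} [URL={quote(url)}];")
    for edge in root.findall("edge"):
        source, target = edge.get("source"), edge.get("target")
        lines.append(f"  {quote(source)} -> {quote(target)};")
    lines.append("}")
    return "\n".join(lines)


sample = (
    '<graph>'
    '<node id="Main Page" type="article"/>'
    '<node id="Help" type="article"/>'
    '<edge source="Main Page" target="Help" type="link"/>'
    '</graph>'
)
print(xml_to_dot(sample, "http://wiki.example.org/"))
```

The resulting text can be saved to a file and rendered with the `dot`
command to get both the image and the matching image map.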

These extensions are in my Subversion repository at
http://opensource.case.edu/svn/MediaWikiHacks/extensions/.

You may view a demo of SpecialGraphStructure at
http://wiki.case.edu/Special:GraphStructure

If you have any suggestions or feature requests, just send me an e-mail.

Gregory Szorc
gregory.szorc at gmail.com


