Hello,
As we know, wiki articles (mainly on Wikipedia) go into a lot of detail
about their subject. They often become verbose, and individual sections
are sometimes as long as full articles.
The information about a topic is split across various pages that are
linked from the article. We have to open several such links to get a good
understanding of the article.
Navigation popups/Hovercards make this a bit simpler, but the information
they provide is often out of context. They introduce the linked article
rather than explain its connection to the page being read, which makes
the result disconnected and muddled. They help a reader figure out the
importance of a linked page, but not its relevance.
As part of a GSoC project, I was thinking of making a summarization tool
that could automatically create a comprehensive summary of an article. The
links, categories, infoboxes and other wiki-specific features make this
quite different from, and more interesting than, plain text summarization.
They make it easier to gauge the context and relevance of articles, and
the link structure makes it possible to crawl to relevant pages (as
Hovercards do). Finally, by combining only the important and relevant
information (from all sections), we can form a coherent and lucid summary
for the reader. The intro paragraphs just provide an introduction to the
article, whereas the tool would provide the gist of the entire article
(and hence its output would be longer in most cases).
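To make the idea a bit more concrete, here is a very rough sketch of the
"combine the important sentences from all sections" step. It is only an
illustration under several assumptions: the section texts and the list of
linked titles are assumed to have been extracted already, the scoring is
plain word frequency plus a bonus for mentioning a linked title, and all
names (summarize, link_boost, STOPWORDS and so on) are hypothetical, not
part of any existing Wikimedia tool.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real tool would need more).
STOPWORDS = {"the", "a", "an", "of", "in", "it", "was", "is", "and", "to", "by"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def summarize(sections, links, per_section=1, link_boost=0.5):
    """Pick the top-scoring sentence(s) from every section.

    sections: dict mapping section title -> plain text (hypothetical input)
    links:    set of titles linked from the article (hypothetical input)
    """
    # Word frequencies over the whole article approximate "importance".
    freq = Counter(w for text in sections.values() for w in tokenize(text))
    picked = []
    for text in sections.values():
        scored = []
        for idx, sentence in enumerate(split_sentences(text)):
            words = tokenize(sentence)
            if not words:
                continue
            score = sum(freq[w] for w in words) / len(words)
            # Bonus for each linked title mentioned, as a crude relevance signal.
            lowered = sentence.lower()
            score += link_boost * sum(1 for t in links if t.lower() in lowered)
            scored.append((score, idx, sentence))
        # Highest score first; ties go to the earlier sentence.
        top = sorted(scored, key=lambda t: (-t[0], t[1]))[:per_section]
        # Emit the chosen sentences in their original order.
        picked.extend(s for _, _, s in sorted(top, key=lambda t: t[1]))
    return " ".join(picked)
```

Taking at least one sentence from every section is what would distinguish
the result from the intro paragraph. The link bonus is only a stand-in for
the richer signals (categories, infoboxes, linked pages) mentioned above.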
Though there has been some independent research
<http://lms.comp.nus.edu.sg/sites/default/files/publication.../acl09-yesr.pdf>
on this, the possibility of such a tool has never been discussed at length
on Wikimedia.
So, I would like to ask all members for their opinion of such a tool, in
the above or some other form. Also, does it seem like something that can be
done as a GSoC project (MVP)? Would any mentors be interested?