Felipe Ortega wrote:
Hi all.
I'm adding some tweaks to the WikiXRay parser of meta-history dumps. I now extract
internal and external links, and so on, but I'd also like to extract the plain text
(without HTML code and, possibly, also filtering out wiki tags).
Does anyone know a Python library to do that? I believe there should be something out
there, as there exist bots and crawlers automating the data extraction process from one
wiki to another.
Thanks in advance for your comments.
Felipe.
If you have the html, extracting the plain text is really easy. Just
skip everything between < and > and decode entities :P
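A minimal sketch of that approach in Python, using only the standard library (note this is naive: it will mangle pages where a literal "<" appears outside a tag, or where tags contain ">" inside attribute values; a real parser like HTMLParser is safer):

```python
import html
import re

def strip_html(text):
    # Naively drop everything between "<" and ">" (i.e. the tags themselves)
    no_tags = re.sub(r'<[^>]*>', '', text)
    # Decode entities such as &amp;, &eacute;, &#233;
    return html.unescape(no_tags)

print(strip_html('<p>caf&eacute; &amp; <b>tea</b></p>'))  # café & tea
```

For wiki markup (the second part of the question) a regex pass like this is not enough, since wikitext nests templates and links; that part really does call for a dedicated parser.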