Re: [Wikipedia-l] convert wiki markup to plain text

30 Mar 2010

Francis Tyers wrote:
...
 Actually it is surprisingly difficult. I have a script which does it
 here:

https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-lex-le…

 Which really needs to be redone for each Wikipedia. You could also ask

 http://en.wikipedia.org/wiki/User:Tresoldi#Wikipedia_as_a_corpus

 He has some scripts which do it too. But there is no generic "nice" way
 of getting Wikipedia as a plain-text corpus so far. If anyone has
 one I would love to hear about it.

Convert to HTML using MediaWiki, then filter out all HTML tags.
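
The tag-filtering half of that suggestion might look roughly like the sketch below. It assumes the page has already been rendered to HTML by MediaWiki (for instance through the API's `action=parse`) and is available as a string. The names `TextExtractor` and `html_to_text` are made up for illustration, and it uses only Python's standard `html.parser`. It drops every tag, skips the contents of `script` and `style`, and collapses leftover whitespace; it makes no attempt to remove navigation boxes, reference markers, or other Wikipedia-specific furniture.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, dropping tags and the contents of script/style."""

    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML fragment as one line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left behind by the removed markup.
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    sample = '<p>The <b>quick</b> brown <a href="/wiki/Fox">fox</a>.</p>'
    print(html_to_text(sample))
```

Block-level structure (paragraphs, headings, list items) is flattened into a single line here; a corpus that needs sentence or paragraph boundaries would have to emit a newline when those tags close.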
