Re: [Wikitech-l] Wikitext to HTML translator and Wikitext language specification

21 Oct 2003

On Tue, Oct 14, 2003 at 11:16:19AM +1300, Richard Grevers wrote:
...
  On Mon, 13 Oct 2003 17:21:15 -0400, David Friedland <david(a)nohat.net>
  gave utterance to the following:

 There seems to be a lot of disjoint discussion on
Meta about this. Viz:

* There is work that has been done by Taw on an OCaml lexer at
   <http://meta.wikipedia.org/wiki/Wikipedia_lexer>
 My suggestions would be "the broken wikitext language", or the "invalid 
 wikitext language".
 Because of its UseMod ancestry, the current parser produces some very bad 
 HTML code*, and in particular handles lists and nesting of blocks really 
 badly.
 * not so bad if HTML 3.2 or 4 is our target, but it would be nice to be 
 able to produce clean XHTML.
 A few months back I started work on a ValidWiki parser, which has a much 
 stronger concept of block and line elements, and uses both block and line 
 stacks to open and close all elements correctly.
 I think I'm about 2/3 of the way through the block parser, and hadn't yet
 written the line parser. I have no idea how the code would compare for
 efficiency.
 Unfortunately, the only language I know how to code in is MivaScript, so it
 would need porting. (Miva performs okay for your mid-level merchant
 application, but doesn't have the efficiency for something with the
 workload of Wikipedia.)
Uhm, my parser has a block stack + line stack architecture too.
But the sources at http://meta.wikipedia.org/wiki/Wikipedia_lexer aren't
the most recent.

Newer sources attached.

It's not complete but it wasn't really meant to be.
It was meant to be a proof of concept that a mix of wiki markup and HTML can
be parsed in an XHTML-correct and DWIM way extremely efficiently.
Concept proven, but integrating the parser with the rest of Wikipedia would
take much more time than I'm willing to spend right now.
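
To illustrate the block stack + line stack idea discussed in this thread, here
is a minimal Python sketch. It is not the OCaml lexer or the ValidWiki code,
and the names render and render_inline are hypothetical. It assumes only a tiny
subset of the markup: "*" and "#" list prefixes, plain paragraphs, and ''italic''
/ '''bold''' inline markup. The block stack tracks open lists so nested lists
end up inside their parent <li>; the line stack tracks open inline tags so that
overlapping markup is closed and reopened in the right order.

```python
import html
import re


def render_inline(text):
    """Render ''italic'' and '''bold''' with a line stack of open tags."""
    out = []
    stack = []  # open inline tags, innermost last
    # Split on ''' or '' and keep the delimiters as separate tokens.
    for token in re.split(r"('''|'')", text):
        if token not in ("'''", "''"):
            out.append(html.escape(token, quote=False))
            continue
        tag = "b" if token == "'''" else "i"
        if tag not in stack:
            stack.append(tag)
            out.append(f"<{tag}>")
            continue
        # Closing a tag that is not innermost: close the tags nested
        # inside it first, close it, then reopen the others.
        reopen = []
        while stack[-1] != tag:
            inner = stack.pop()
            out.append(f"</{inner}>")
            reopen.append(inner)
        stack.pop()
        out.append(f"</{tag}>")
        for inner in reversed(reopen):
            out.append(f"<{inner}>")
            stack.append(inner)
    # End of line: close whatever is still open.
    while stack:
        out.append(f"</{stack.pop()}>")
    return "".join(out)


def render(wikitext):
    """Render list lines and paragraphs with a block stack of open lists."""
    out = []
    blocks = []  # open list tags ("ul"/"ol"), outermost first
    for line in wikitext.splitlines():
        prefix, body = re.match(r"([*#]*)\s*(.*)", line).groups()
        wanted = ["ul" if c == "*" else "ol" for c in prefix]
        # How many of the currently open lists this line keeps.
        keep = 0
        while keep < min(len(blocks), len(wanted)) and blocks[keep] == wanted[keep]:
            keep += 1
        # Close lists that are deeper than, or differ from, what is wanted.
        while len(blocks) > keep:
            out.append(f"</li></{blocks.pop()}>")
        if wanted and keep == len(wanted):
            out.append("</li><li>")  # next item in an already open list
        for tag in wanted[keep:]:
            out.append(f"<{tag}><li>")  # nested list opens inside the parent <li>
            blocks.append(tag)
        if wanted:
            out.append(render_inline(body))
        elif body:
            out.append(f"<p>{render_inline(body)}</p>")
    # End of input: close every list still open.
    while blocks:
        out.append(f"</li></{blocks.pop()}>")
    return "".join(out)


print(render("* a\n** b\n* c\ntext"))
# prints <ul><li>a<ul><li>b</li></ul></li><li>c</li></ul><p>text</p>
```

Because every opening tag is pushed on a stack and only ever closed by popping
it, the output stays well-formed even for input like ''a '''b'' c, which comes
out as <i>a <b>b</b></i><b> c</b>.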
