On Friday 13 August 2004 20:59, Brion Vibber wrote:
> Magnus Manske wrote:
>> I therefore suggest a new structure:
>> 1. Preprocessor
>> 2. Wiki markup to XML
>> 3. XML to (X)HTML
> This doesn't actually solve any of the issues with the current parser,
> since it merely has it produce a different output format.
> The main problems are that we have a mess of regexps that stomp on each
> other all the time.
Are you kidding? That is exactly what it would solve! If you generated the
preprocessor with a lex/yacc type of tool, you would for the first time have
decent formal documentation of the wiki syntax in the form of a context-free
grammar. Not only would that give you a better idea of what the wiki syntax
exactly is and tell you precisely whether any new mark-up interferes with old
mark-up, but you could also more easily add context-sensitive rules (like
replacing two dashes with an em dash, but only in normal text). Moreover, it
would give you the power to make small changes to the mark-up language,
because you could easily generate a parser that translates all old texts to
the new mark-up. Finally, having an explicit grammar also makes it easier to
ensure that you actually generate well-formed and valid XHTML, or anything
else that you would like to generate from it and that needs to satisfy a
certain syntax.
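To make the idea concrete, here is a minimal sketch of a grammar-driven
parser for a tiny wiki subset (just ''italic'' and '''bold'''); the grammar,
token names and output tags are my own illustration, not MediaWiki's actual
syntax, but it shows how a single tokenizer plus a parser derived from an
explicit grammar replaces layered regexp substitutions:

```python
import re

# Hypothetical mini-grammar (illustration only, not the real wiki syntax):
#   text   ::= (bold | italic | plain)*
#   bold   ::= "'''" plain "'''"
#   italic ::= "''" plain "''"
# The tokenizer runs once; the parser then emits XML deterministically.

TOKEN = re.compile(r"'''|''|[^']+|'")  # longest delimiter matched first

def parse(src):
    toks = TOKEN.findall(src)
    out = []
    i = 0
    while i < len(toks):
        t = toks[i]
        if t == "'''" and "'''" in toks[i + 1:]:
            j = toks.index("'''", i + 1)       # find matching close
            out.append("<b>" + "".join(toks[i + 1:j]) + "</b>")
            i = j + 1
        elif t == "''" and "''" in toks[i + 1:]:
            j = toks.index("''", i + 1)
            out.append("<i>" + "".join(toks[i + 1:j]) + "</i>")
            i = j + 1
        else:
            out.append(t)                      # plain text passes through
            i += 1
    return "<text>" + "".join(out) + "</text>"
```

Because every construct is recognized in one pass from one token stream,
an unmatched delimiter simply falls through as plain text instead of being
half-rewritten by a later regexp.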
It's simply a brilliant idea, and frankly I think it is in the long run as
unavoidable as the step to a database backend. If there is a performance
problem, you could even consider storing the XML in the database, so you only
need to do the raw parse at write time and the XML parse at read time.
The hard part is of course to come up with the context-free grammar (it
should probably be LALR(1) at that). Since I used to teach compiler theory I
might be of some help there.
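For what such a grammar might look like, here is a purely illustrative
yacc-style fragment for the same tiny subset; all token and rule names are
hypothetical:

```yacc
/* Sketch only: a possible yacc/bison fragment, not MediaWiki's grammar. */
%token TEXT APOS2 APOS3   /* plain text, '' and ''' delimiters */
%%
page    : /* empty */
        | page inline
        ;
inline  : TEXT
        | APOS3 TEXT APOS3   /* bold   -> <b>...</b> */
        | APOS2 TEXT APOS2   /* italic -> <i>...</i> */
        ;
```

The payoff is that when a new construct is added and it clashes with an old
one, the generator reports a shift/reduce or reduce/reduce conflict at build
time, which is exactly the "new mark-up interferes with old mark-up" check
mentioned above.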
-- Jan Hidders
PS. You could even get rid of the OCaml code, since the LaTeX parsing could
be integrated into the general parser.