On Friday 13 August 2004 20:59, Brion Vibber wrote:
> Magnus Manske wrote:
>> I therefore suggest a new structure:
>> 1. Preprocessor
>> 2. Wiki markup to XML
>> 3. XML to (X)HTML
> This doesn't actually solve any of the issues with the current parser,
> since it merely has it produce a different output format.
> The main problems are that we have a mess of regexps that stomp on each
> other all the time.
Are you kidding? That is exactly what it would solve! If you generated the
preprocessor with a lex/yacc type of tool, you would for the first time have
decent formal documentation of the wiki syntax in the form of a context-free
grammar. Not only would that give you a better idea of what the wiki syntax
exactly is and tell you precisely whether any new mark-up interferes with old
mark-up, but you could also more easily add context-sensitive rules (like
replacing two dashes with an em dash, but only in normal text). Moreover, it
would give you the power to make small changes to the mark-up language,
because you could easily generate a parser that translates all old texts to
the new mark-up. Finally, having an explicit grammar also makes it easier to
ensure that you actually generate well-formed and valid XHTML, or anything
else that you would like to generate from it and that needs to satisfy a
certain syntax.
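To make the idea concrete, here is a minimal sketch of a grammar-driven
parser for a tiny wiki subset (just ''italic'' and '''bold'''); the grammar,
token names and output tags are my own illustration, not MediaWiki's actual
syntax, but it shows how a single tokenizer plus a parser derived from an
explicit grammar replaces layered regexp substitutions:

```python
import re

# Hypothetical mini-grammar (illustration only, not the real wiki syntax):
#   text   ::= (bold | italic | plain)*
#   bold   ::= "'''" plain "'''"
#   italic ::= "''" plain "''"
# The tokenizer runs once; the parser then emits XML deterministically.

TOKEN = re.compile(r"'''|''|[^']+|'")  # longest delimiter matched first

def parse(src):
    toks = TOKEN.findall(src)
    out = []
    i = 0
    while i < len(toks):
        t = toks[i]
        if t == "'''" and "'''" in toks[i + 1:]:
            j = toks.index("'''", i + 1)       # find matching close
            out.append("<b>" + "".join(toks[i + 1:j]) + "</b>")
            i = j + 1
        elif t == "''" and "''" in toks[i + 1:]:
            j = toks.index("''", i + 1)
            out.append("<i>" + "".join(toks[i + 1:j]) + "</i>")
            i = j + 1
        else:
            out.append(t)                      # plain text passes through
            i += 1
    return "<text>" + "".join(out) + "</text>"
```

Because every construct is recognized in one pass from one token stream,
an unmatched delimiter simply falls through as plain text instead of being
half-rewritten by a later regexp.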
It's simply a brilliant idea, and frankly I think it is in the long run as
unavoidable as the step to a database backend. If there is a performance
problem, you could even consider storing the XML in the database, so you only
need to do the raw parse at write time and the XML parse at read time.
The hard part is of course to come up with the context-free grammar (it
should probably be LALR(1) at that). Since I used to teach compiler theory I
might be of some help there.
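For what such a grammar might look like, here is a purely illustrative
yacc-style fragment for the same tiny subset; all token and rule names are
hypothetical:

```yacc
/* Sketch only: a possible yacc/bison fragment, not MediaWiki's grammar. */
%token TEXT APOS2 APOS3   /* plain text, '' and ''' delimiters */
%%
page    : /* empty */
        | page inline
        ;
inline  : TEXT
        | APOS3 TEXT APOS3   /* bold   -> <b>...</b> */
        | APOS2 TEXT APOS2   /* italic -> <i>...</i> */
        ;
```

The payoff is that when a new construct is added and it clashes with an old
one, the generator reports a shift/reduce or reduce/reduce conflict at build
time, which is exactly the "new mark-up interferes with old mark-up" check
mentioned above.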
-- Jan Hidders
PS. You could even get rid of the OCaml code, since the LaTeX parsing could
be integrated into the general parser.