> On Mon, Aug 11, 2003 at 04:12:08PM +0200, Tomasz Wegrzanowski wrote:
> > Here's one that uses lex + a single pass over tokens, and it can
> > generate valid XHTML. It's a proof-of-concept program, not a
> > drop-in replacement for the current parser.
> > === How it works ===
> > Wiki syntax is line-based and uses a state-transition model -
> > a token changes the state from anything to X. HTML is free-form
> > and its states can nest.
> > This parser maintains a stack of "inline" elements. Every time it
> > finds </X>, it checks whether <X> is on the stack; if it is, it
> > pops and closes every element until it gets to <X>, otherwise it
> > prints a raw </X>. If it finds <X>, it checks whether it conflicts
> > with something on the stack, and acts accordingly.
> > When the "paragraph state" has to change, it closes all open
> > inline tags.
> > It doesn't preserve whitespace unless necessary (<pre> and wiki
> > pre; for now also <nowiki>).
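
Roughly, the inline-tag handling described above amounts to the
following Python sketch. This is only an illustration, not the actual
lex code: the function names are mine, and escaping unmatched close
tags so the output stays well-formed XHTML is my assumption about what
"prints raw </X>" means.

    inline_stack = []   # currently open inline elements, innermost last
    output = []

    def open_inline(tag):
        # The real parser also checks for conflicts with tags already
        # on the stack; this sketch just opens the element.
        inline_stack.append(tag)
        output.append("<%s>" % tag)

    def close_inline(tag):
        if tag in inline_stack:
            # Pop and close every element until we reach <tag>, so the
            # emitted XHTML stays properly nested.
            while True:
                top = inline_stack.pop()
                output.append("</%s>" % top)
                if top == tag:
                    break
        else:
            # No matching <tag> is open: emit the close tag as text.
            output.append("&lt;/%s&gt;" % tag)

    def end_paragraph():
        # A paragraph-state change closes all open inline tags.
        while inline_stack:
            output.append("</%s>" % inline_stack.pop())

    # Example: <b>foo<i>bar</b>baz becomes <b>foo<i>bar</i></b>baz
    open_inline("b"); output.append("foo")
    open_inline("i"); output.append("bar")
    close_inline("b"); output.append("baz")
    end_paragraph()
    print("".join(output))

Note that in this sketch the implicitly closed <i> is not reopened
after </b>; whether the real parser reopens such tags isn't stated in
the description above.
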
> Isn't this the wrong approach? Shouldn't we try to get rid of HTML
> in articles in the first place? Doesn't the necessary fixing
> prove that HTML is not what the average contributor can be
> expected to use?

If you supplemented Wiki syntax with markup for everything we're doing
with HTML now, you'd get something that's not significantly easier.