On 11/13/07, Virgil Ierubino <virgil.ierubino(a)gmail.com> wrote:
It's clear that BNF can't formulate outside these constraints because if (1)
was false, you'd never get to the end of the specification, and (2) can't be
false simply because it restates (1) but also allows for languages with
nested syntax. If it were false that the nested/complex syntax could be
broken down into basic syntax, then that complex syntax would simply BE
basic syntax, and you are left with just rule (1).
Wikitext fails these constraints because the construct:
**bullet 1
**bullet 2
which are two successive level 2 bullet points, can't be broken down into
less complex syntax. Therefore the "**" construct for a level 2 bullet point
must be BASIC syntax, not complex syntax. But seeing as Wikitext, just like
XHTML etc., allows for bulleted lists with infinite nesting levels, there
would be an infinite number of basic constructs (for level 2 list items,
level 3, 4, 5, 17, 234, etc). We thus fail constraint number (1).
This was my initial reaction. However, I don't think it's actually
that important, because in fact this:
**foo
##blah
*is* valid syntax. As is this:
*foo
**blah
#*blah
Which means that each line can be parsed on its own merits, then a
subsequent pass can perform the code generation. This will likely be the
general story for the new parser: a traditional parser model with a couple
of hacks to cope with the nuances of wikitext, as opposed to a parser built
with hacks from the ground up.
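
To make the two-pass idea concrete, here is a rough sketch in Python. It is
only an illustration under my own assumptions: the names (LIST_TAGS,
parse_line, render) are hypothetical, the marker-to-tag mapping is a guess,
and the output is simplified (inner lists are not wrapped inside an item, so
it is not strictly valid HTML). It is not how any existing parser does it.

```python
# Hypothetical sketch: names and the marker-to-tag mapping are assumptions.
LIST_TAGS = {"*": ("ul", "li"), "#": ("ol", "li"), ":": ("dl", "dd")}


def parse_line(line):
    """Pass 1: split one line into (prefix, text) without looking at neighbours."""
    i = 0
    while i < len(line) and line[i] in LIST_TAGS:
        i += 1
    return line[:i], line[i:]


def render(lines):
    """Pass 2: compare each prefix with the previous one to open/close lists."""
    out = []
    open_prefix = ""  # markers of the lists currently open, outermost first
    for prefix, text in map(parse_line, lines):
        # Count how many leading markers are shared with the previous line.
        common = 0
        while (common < len(prefix) and common < len(open_prefix)
               and prefix[common] == open_prefix[common]):
            common += 1
        # Close lists that no longer apply, innermost first.
        for marker in reversed(open_prefix[common:]):
            out.append("</%s>" % LIST_TAGS[marker][0])
        # Open whatever new lists this line needs.
        for marker in prefix[common:]:
            out.append("<%s>" % LIST_TAGS[marker][0])
        if prefix:
            item = LIST_TAGS[prefix[-1]][1]
            out.append("<%s>%s</%s>" % (item, text, item))
        else:
            out.append(text)  # not a list line; passed through untouched
        open_prefix = prefix
    # Close anything still open at the end of the input.
    for marker in reversed(open_prefix):
        out.append("</%s>" % LIST_TAGS[marker][0])
    return out


if __name__ == "__main__":
    print("\n".join(render(["*foo", "**blah", "#*blah"])))
```

The point is that parse_line never needs to know the nesting depth in
advance; only render, which is ordinary code rather than grammar, cares how
one prefix relates to the next.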
wrong. Our problem is simply that the syntax for adding a bullet at the SAME
level has changed now that we're dealing with another level of bullets - at
level 1, another bullet at the same level is "*", but at level 2, another
bullet at the same level is "**" - but this means that each level of bullets
has its own syntax, meaning each bullet level construct is a basic
construct, and therefore that there are an infinite number of constructs.
I think honestly a list element will just be defined as an arbitrary
sequence of :, # and *, followed by text. EBNF is incapable of expressing
how that sequence should be rendered, but that's not a showstopper.
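
As a rough illustration of that rule, here is a small Python sketch. The
EBNF in the comment and the names LIST_ITEM and match_list_item are my own
assumptions for the sake of the example, not part of any agreed grammar.

```python
import re

# Assumed EBNF for the rule described above (illustrative only):
#   list_item   = list_prefix , text ;
#   list_prefix = marker , { marker } ;
#   marker      = ":" | "#" | "*" ;
LIST_ITEM = re.compile(r"^(?P<prefix>[:#*]+)(?P<text>.*)$")


def match_list_item(line):
    """Return (prefix, text) if the line is a list element, else None."""
    m = LIST_ITEM.match(line)
    return (m.group("prefix"), m.group("text")) if m else None


if __name__ == "__main__":
    for sample in ["***item", "*#*item", "no list here"]:
        print(repr(sample), "->", match_list_item(sample))
```

The rule accepts "*#*" just as readily as "***"; what each prefix means for
rendering is left entirely to later code.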
"*#*" rather than the expected "***". There is thus no hope of defining
Wikitext in BNF unless we exhaustively specify every combination of
There is no hope of *fully defining* Wikitext in BNF...
What this means is that we can't use a basic EBNF parser in all the usual
useful ways. We need new solutions.
What this means is that we can't use *just* a basic EBNF parser. We will
need an EBNF parser with some hacks/tweaks/special cases.
Steve