On 11/14/07, Virgil Ierubino <virgil.ierubino(a)gmail.com> wrote:
> I'm assuming our problem is this: currently we "parse" wikitext by
> immediately converting, via regex, into XHTML. This is not really
> "parsing", because parsing usually means the creation of an abstract
> Document Object Model which is then iterated through to generate XHTML,
> XML, FooBar or whatever (or so I have learnt). Because we're missing
> this DOM, Wikitext can't expand beyond being used by the current parser
> (so we can't do WYSIWYG, etc.). However, there appears to be no way of
> creating a DOM from Wikitext because this would be to standardise the
> way syntax converts to output, but any kind of standardisation will
> cause backwards incompatibility.

Your "DOM" is usually called an AST ("abstract syntax tree"). But yes.
However, "backwards incompatibility" is not so much the issue as
"sudden, drastic misrendering of existing wikitext". I do think it's
impossible to produce a meaningful traditional parser that could
replicate exactly the behaviour of the current parser. But I think it's
very possible to produce a good parser that will cover all the most
useful cases.
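
To make the AST point concrete, here is a minimal sketch in Python. It
is not MediaWiki's parser: it assumes a toy subset of wikitext (plain
text and '''bold''' only), and the names Text, Bold, parse, to_html and
to_plain are made up for illustration. The point is only that one tree
can feed more than one output format.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Text:
    value: str


@dataclass
class Bold:
    children: list


def parse(src):
    """Parse a toy wikitext subset (plain text and '''bold''') into a node list."""
    nodes = []
    pos = 0
    while pos < len(src):
        start = src.find("'''", pos)
        end = src.find("'''", start + 3) if start != -1 else -1
        if start == -1 or end == -1:
            # No (complete) bold span left: keep the rest as literal text.
            nodes.append(Text(src[pos:]))
            break
        if start > pos:
            nodes.append(Text(src[pos:start]))
        nodes.append(Bold([Text(src[start + 3:end])]))
        pos = end + 3
    return nodes


def to_html(nodes):
    """Render the tree as HTML."""
    parts = []
    for node in nodes:
        if isinstance(node, Bold):
            parts.append("<b>" + to_html(node.children) + "</b>")
        else:
            parts.append(escape(node.value))
    return "".join(parts)


def to_plain(nodes):
    """Render the same tree as plain text, dropping the markup."""
    parts = []
    for node in nodes:
        if isinstance(node, Bold):
            parts.append(to_plain(node.children))
        else:
            parts.append(node.value)
    return "".join(parts)


tree = parse("a '''b''' c")
print(to_html(tree))
print(to_plain(tree))
```

A regex-to-XHTML converter has no equivalent of `tree`, which is why a
second output format (or a WYSIWYG editor) has nothing to work from.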

> So our problem is the dilemma: either we standardise, and lose
> backwards compatibility, or we don't, and lose extensibility. And in
> the long run, I think the first option is better. However, in
> standardising the language we'd lose the feature of it that all syntax
> is valid (useful, as then people can't ever be presented with error
> messages on their pages) so ideally the

The "all syntax is valid" thing really arises because of the nature of
browsers rather than because of the parser itself. I don't think we're
in danger of losing that - the parser will just have to fail gracefully
when it comes up against malformed wikitext.
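
As a rough illustration of "fail gracefully", here is a small Python
sketch. It assumes a toy link syntax ([[Target]] only, no pipes or
namespaces), and the function name render_links and the /wiki/ URL
scheme are hypothetical. When the closing brackets are missing, the
text is emitted literally instead of raising an error or being dropped.

```python
from html import escape
from urllib.parse import quote


def render_links(src):
    """Render [[Target]] links as HTML; malformed links stay as literal text."""
    out = []
    pos = 0
    while pos < len(src):
        start = src.find("[[", pos)
        end = src.find("]]", start + 2) if start != -1 else -1
        if start == -1 or end == -1:
            # No link left, or an unclosed "[[": output the rest unchanged
            # so that nothing the user typed disappears from the page.
            out.append(escape(src[pos:]))
            break
        out.append(escape(src[pos:start]))
        target = src[start + 2:end]
        href = "/wiki/" + quote(target.replace(" ", "_"))  # hypothetical URL scheme
        out.append('<a href="{}">{}</a>'.format(href, escape(target)))
        pos = end + 2
    return "".join(out)


print(render_links("See [[Foo]] now"))
print(render_links("See [[Foo now"))
```

The second call has no closing brackets, so the user sees their own
"[[Foo now" on the page, which is an unintended result but a visible
one.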

> On the point of immutable validity, it is perhaps less useful for all
> text to be valid than for there to be "invalid markup" error messages.
> Whilst the former ensures users can never really "go wrong", it is
> still true that bad markup will lead to results they very much didn't
> intend - and it seems to me more useful to give them an error message
> than a wildly unintended result.

Wildly unintended is fine, at least they see that (or someone else does).
What's more dangerous is when stuff silently breaks, making a sentence or
two just disappear off the page.
Steve