On 11/9/07, Simetrical <Simetrical+wikilist(a)gmail.com> wrote:
Certainly, unless your definition of "sane" is very narrow. I believe
neither C++ nor Perl has an LALR(1) grammar. I saw at least one syntax
suggestion for Python one time that was rejected on the basis of
requiring multi-token lookahead.
Ok, it's been a while since I've studied this.
Certainly apostrophes require more than one character of lookahead, and
backtracking.
Yeah. Apostrophes again. Saying apostrophes "require" special
treatment is like saying Paris Hilton "requires" special treatment.
ISBN 123456789X is parsed as an ISBN. ISBN 123456789 is not, because
it doesn't have enough digits. That means you need quite a lot of
lookahead and backtracking for ISBNs, at least in the tokenizer.
Which was my point: the tokenizer will need to be able to backtrack.
Judging by the flex docs, I don't think that's a big issue. That was the
reason for my post: I was responding to Steve Sanbeg's remark about
how much lookahead is needed by the tokenizer.
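
To make the lookahead point concrete, here is a rough Python sketch of a
tokenizer that tries an ISBN rule first and falls back to plain text when
the rule fails. The rule is an assumption taken only from the example above
("ISBN", one space, nine digits, then a digit or X); it is not the actual
MediaWiki rule, and the names tokenize, ISBN_RE and WORD_RE are made up for
illustration.

```python
import re

# Assumed, simplified rule from the example in this thread: "ISBN", one
# space, nine digits, then a final digit or X. Not MediaWiki's real rule.
ISBN_RE = re.compile(r"ISBN (\d{9}[\dX])(?![\dX])")
# Fallback: a run of non-whitespace or a run of whitespace.
WORD_RE = re.compile(r"\S+|\s+")


def tokenize(text):
    """Yield (kind, value) pairs, trying the ISBN rule first at each position."""
    pos = 0
    while pos < len(text):
        m = ISBN_RE.match(text, pos)
        if m:
            yield ("ISBN", m.group(1))
        else:
            # The failed ISBN attempt may have read up to 15 characters.
            # We go back to `pos` and rescan them as ordinary text, which is
            # the backtracking being discussed.
            m = WORD_RE.match(text, pos)
            yield ("TEXT", m.group(0))
        pos = m.end()
```

With this sketch, "ISBN 123456789X" comes out as a single ISBN token, while
"ISBN 123456789" is read as far as the ninth digit, rejected, and then
rescanned as three ordinary text tokens.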
Ok, there you're assuming that if the sequence of digits doesn't match an
ISBN, then you want to reparse it as something else entirely. IMHO it's
better to just parse it as an invalid ISBN. And if someone is really unhappy
that the string "ISBN 23415[[link please]]" wasn't rendered as a link, then
they can wrap the relevant bit in <nowiki> tags.
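
A minimal Python sketch of what I mean, under my own assumptions: once the
scanner has seen "ISBN " followed by a digit, it commits to an ISBN token
and only decides afterwards whether it is valid. The validity check (nine
digits, then a digit or X) and the names scan_isbn and INVALID_ISBN are
hypothetical, not what MediaWiki does.

```python
DIGITS = "0123456789"
PREFIX = "ISBN "


def scan_isbn(text, pos=0):
    """Scan an ISBN-like token at text[pos:] without ever backtracking.

    Returns ((kind, body), new_pos), or None if the text at pos does not
    start with "ISBN " followed by a digit.
    """
    start = pos + len(PREFIX)
    # Fixed lookahead: the prefix plus one digit decides everything.
    if (not text.startswith(PREFIX, pos)
            or start >= len(text)
            or text[start] not in DIGITS):
        return None
    end = start
    # Committed: consume every digit or X, whatever the final length is.
    while end < len(text) and text[end] in DIGITS + "X":
        end += 1
    body = text[start:end]
    # Assumed validity rule: nine digits followed by a digit or X.
    valid = len(body) == 10 and "X" not in body[:9]
    kind = "ISBN" if valid else "INVALID_ISBN"
    return (kind, body), end
```

Here "ISBN 23415[[link please]]" gives an INVALID_ISBN token with body
"23415", and scanning carries on from the "[[" without revisiting anything.
What the renderer then does with an invalid ISBN is a separate question.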
*grumble* I still think recognising "ISBN xxx" is a bad idea. *grumble*
My point in reply to yours is that although it may be feasible to backtrack,
it might be a good idea to avoid it anyway, for the sake of simplicity in
coding and user experience.
Steve