Re: [Wikitech-l] EBNF grammar project status?

9 Nov 2007

On Thu, 08 Nov 2007 20:26:33 -0500, Simetrical wrote:

...
  On 11/8/07, Steve Sanbeg <ssanbeg(a)ask.com> wrote:
  I think that's true, if you tokenize correctly, that would go a long
 way. Unfortunately, there are a few constructs that make tokenization
 tricky. Apostrophe is the most obvious case; but {'s, and to a lesser
 extent ['s could have similar problems, since they would require
 substantial lookahead to tokenize.  
 According to flex documentation, it's perfectly happy to accept any regex
 for tokens, and will use unlimited lookahead and backtracking if
 necessary.  It provides debug info allowing you to check for and eliminate
 backtracking, if you want to speed it up, but that's optional.  Clearly
 it's not possible to tokenize MW markup with one-character lookahead: you
 sure can't tell the difference between a second- and sixth-level heading,
 and of course that's even ignoring stuff like ISBN handling that's less
 basic and more disposable. 
But some constructs in MW require an FSM to tokenize, not
a regex.  Clearly, properly tokenizing bold/italics requires complex
processing on an entire paragraph of text.  Even templates and links are a
little complex, but should be doable by maintaining states with a stack.
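
To make the "states with a stack" idea concrete, here is a rough sketch in
Python rather than flex. It is not MediaWiki's actual parser, and the names
TOKEN_RE, CLOSER_FOR and tokenize are made up for illustration. It assumes
only {{ }} and [[ ]] matter and ignores triple braces, apostrophes and
everything else:

```python
import re

# Hypothetical token pattern: double openers/closers, a run of plain text,
# or a single stray brace/bracket that is treated as text.
TOKEN_RE = re.compile(r"\{\{|\}\}|\[\[|\]\]|[^{}\[\]]+|[{}\[\]]")

# Which closer matches which opener (assumption: only templates and links).
CLOSER_FOR = {"{{": "}}", "[[": "]]"}


def tokenize(text):
    """Return (kind, value, depth) tuples, tracking nesting with a stack."""
    stack = []   # currently open constructs, innermost last
    tokens = []
    for match in TOKEN_RE.finditer(text):
        value = match.group()
        if value in CLOSER_FOR:
            # An opener always pushes a new state.
            stack.append(value)
            tokens.append(("OPEN", value, len(stack)))
        elif stack and value == CLOSER_FOR[stack[-1]]:
            # A closer only counts if it matches the innermost opener.
            tokens.append(("CLOSE", value, len(stack)))
            stack.pop()
        else:
            # Plain text, stray single characters, and unmatched closers.
            tokens.append(("TEXT", value, len(stack)))
    return tokens


if __name__ == "__main__":
    for token in tokenize("a {{b|[[c]]}} d"):
        print(token)
```

The point is just that the same "]]" comes out as CLOSE or as TEXT depending
on what is on top of the stack, which a single context-free regex per token
cannot express. Bold/italics would need more than this, since the apostrophe
runs have to be resolved across the whole paragraph.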
