Re: [Wikitech-l] EBNF grammar project status?

9 Nov 2007

On Fri, 09 Nov 2007 15:24:10 +1100, Steve Bennett wrote:

...
  On 11/9/07, Simetrical <Simetrical+wikilist(a)gmail.com> wrote:

 According to flex documentation, it's perfectly happy to accept any
 regex for tokens, and will use unlimited lookahead and backtracking if
 necessary.  It provides debug info allowing you to check for and
 eliminate backtracking, if you want to speed it up, but that's optional.
  Clearly it's not possible to tokenize MW markup with one-character
 lookahead: you sure can't tell the difference between a second- and
 sixth-level heading, and of course that's even ignoring  

 Yes you can, if ====== is a token. Which at first glance, it should be.
 The fact that == looks like === looks like ==== is neither here nor there
 to the grammar - it's a handy mnemonic for humans, that's all.

Well, that's exactly the point. At first glance, === is obviously a token,
which will perfectly handle 99% of the headings out there.  But if we want
a complete grammar, we really need sane handling for the last 1%.

Getting those into one token would require the tokenizer to do a bit of
parsing to match things up. If the tokenizer instead just recognizes a run
of = characters as a token and passes its length to the parser as a value,
the parser can deal with the values, which would probably be a cleaner
implementation.

I'm not sure if there's a notation for values in EBNF, so to invent one
for this example, treating

===head==

as "==" TEXT("=head") "==" would be nice, but tricky.
Treating it as "="(3) TEXT("head") "="(2) would make for a cleaner lexer,
and the parser should be able to handle that without too much trouble.
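
To make the split concrete, here is a minimal sketch in Python rather than
flex. The lexer only emits a run of = with its length as the value, and
the parser decides what the counts mean. The names tokenize and
parse_heading, the EQ/TEXT token kinds, and the rule that the heading
level is the shorter of the two runs (with surplus = signs kept as text)
are my assumptions for illustration, not anything taken from the MediaWiki
parser.

```python
import re
from typing import List, Optional, Tuple, Union

# A token is (kind, value): ("EQ", run_length) or ("TEXT", string).
Token = Tuple[str, Union[int, str]]

# The lexer does no matching of opening and closing runs; it only
# recognizes "a run of =" versus "anything else".
_TOKEN_RE = re.compile(r"(?P<EQ>=+)|(?P<TEXT>[^=]+)")


def tokenize(line: str) -> List[Token]:
    """Split one line into EQ(count) and TEXT(string) tokens."""
    tokens: List[Token] = []
    for match in _TOKEN_RE.finditer(line):
        if match.lastgroup == "EQ":
            tokens.append(("EQ", len(match.group())))
        else:
            tokens.append(("TEXT", match.group()))
    return tokens


def parse_heading(line: str) -> Optional[Tuple[int, str]]:
    """Return (level, text) if the line looks like a heading, else None.

    Assumed rule: level = min(opening run, closing run); any surplus
    = signs on the longer side become part of the heading text.
    """
    tokens = tokenize(line)
    if len(tokens) < 3 or tokens[0][0] != "EQ" or tokens[-1][0] != "EQ":
        return None
    opening = int(tokens[0][1])
    closing = int(tokens[-1][1])
    level = min(opening, closing)
    # Rebuild the inner text, turning any inner EQ runs back into "=".
    inner = "".join(
        "=" * int(value) if kind == "EQ" else str(value)
        for kind, value in tokens[1:-1]
    )
    text = "=" * (opening - level) + inner + "=" * (closing - level)
    return level, text
```

With this, tokenize("===head==") gives the "="(3) TEXT("head") "="(2)
stream from above, and parse_heading turns it into a level-2 heading whose
text is "=head", so the awkward 1% is handled in the parser and the lexer
stays trivial.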
