[WikiEN-l] Templates/taxoboxes, or: why a converter isn't a parser

30 Aug 2004

Ævar Arnfjörð Bjarmason wrote:

...
 Why would it ever break? I can see it getting slow because it cannot
 be optimized but not breaking, all it's doing is just including one
 thing after the other

 {{a}} gets Template:A which contains "foo" and {{b}} gets Template:B
 which contains "bar" hence

 {{a}}{{b}} = foobar

Of course, this simple example would still work. But picture this:

Template:A contains:         I ''li
Template:B contains:         ke'' hamburgers

Currently, {{a}}{{b}} would yield "I <em>like</em> hamburgers", but only
because it sticks the pieces together and then tries to make sense of it.
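
To make the order of operations concrete, here is a minimal sketch in Python of "stick the pieces together, then make sense of it". It is not the actual MediaWiki code: the TEMPLATES dictionary and the functions expand and convert_italics are hypothetical names made up for this illustration, and the regexes are deliberately naive.

```python
import re

# Hypothetical template store; the contents are the ones from the example above.
TEMPLATES = {"a": "I ''li", "b": "ke'' hamburgers"}

def expand(text):
    """Replace each {{name}} with the raw template text (pure string splicing)."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: TEMPLATES[m.group(1).lower()], text)

def convert_italics(text):
    """Turn each ''...'' pair into <em>...</em>; a lone '' is left untouched."""
    return re.sub(r"''(.+?)''", r"<em>\1</em>", text)

# Splice first, convert afterwards: the two halves of the italics meet.
combined = convert_italics(expand("{{a}}{{b}}"))

# Convert each template on its own, then join: neither half has a matching ''.
separate = "".join(convert_italics(expand(t)) for t in ("{{a}}", "{{b}}"))

print(combined)  # I <em>like</em> hamburgers
print(separate)  # I ''like'' hamburgers
```

The italics only appear when the conversion runs over the spliced text; neither template means anything on its own.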

Why is this bad? Picture this:

Template:A contains:
	{|
	| nowrap
Template:B contains:
	| Text
	|}

Is the "nowrap" a table cell attribute or text in a separate cell? Does 
this change depending on whether there is a newline after "nowrap"? ... 
And this is just a simple example.
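
A rough sketch of why the newline matters, in Python. The rule used here is a simplification assumed for illustration (a second "|" on the same line separates cell attributes from cell text; otherwise the whole line is cell text), and parse_cells is a hypothetical helper, not the real table code.

```python
def parse_cells(table_src):
    """Return one dict per cell line, using a simplified 'attrs | text' rule."""
    cells = []
    for line in table_src.splitlines():
        # Skip the table open/close markers and anything that is not a cell line.
        if line.startswith(("{|", "|}")) or not line.startswith("|"):
            continue
        body = line[1:]
        if "|" in body:
            # Assumed rule: "| attrs | text" on a single line.
            attrs, text = body.split("|", 1)
            cells.append({"attrs": attrs.strip(), "text": text.strip()})
        else:
            cells.append({"attrs": "", "text": body.strip()})
    return cells

template_a = "{|\n| nowrap"
template_b = "| Text\n|}"

# No newline between the templates: "nowrap" reads as an attribute of one cell.
joined = parse_cells(template_a + template_b)

# A newline between them: "nowrap" becomes the text of a separate cell.
with_newline = parse_cells(template_a + "\n" + template_b)

print(joined)
print(with_newline)
```

Under this assumed rule the same two templates give one cell or two, depending on a single newline that neither template can be said to own.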

...
 Why would this break in whatever parser you plan to implement?

Because a parser is not a converter. The current not-really-a-parser is
actually a converter: It looks out for particular syntax elements like
''these'' and turns them into <em>HTML tags</em>. This is bad because it
means that several of these conversions can interfere with each other:

	I ''like [[hamburger|hamburgers'']]

produces invalid HTML. It gets even worse when it tries to locate 
{{template inclusions}} and replaces them with some other text, not 
knowing what it is or how it fits into the document structure.
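
The interference is easy to reproduce with two independent substitutions. The sketch below is in Python and only approximates the idea: the function name convert, the two regexes, and the "/wiki/" URL prefix are assumptions for illustration, not the real conversion code.

```python
import re

def convert(text):
    """Apply two independent text conversions, one after the other."""
    # Pass 1: ''...'' becomes <em>...</em>.
    text = re.sub(r"''(.+?)''", r"<em>\1</em>", text)
    # Pass 2: [[target|label]] becomes a link (the URL prefix is made up).
    text = re.sub(r"\[\[([^|\]]+)\|(.+?)\]\]", r'<a href="/wiki/\1">\2</a>', text)
    return text

html = convert("I ''like [[hamburger|hamburgers'']]")
print(html)  # I <em>like <a href="/wiki/hamburger">hamburgers</em></a>
```

Each pass is correct in isolation, but <em> is closed while <a> is still open, so the elements are improperly nested. Neither pass can notice, because neither knows the other exists.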

A real parser analyses the document's structure. It turns the wiki text 
into a data structure in memory that actually bears resemblance to the 
structure of the document. It creates a "heading" element where there is 
a heading, instead of turning some strategically-placed equals signs 
into <h#> tags.
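
As a minimal sketch of the difference, assuming a toy grammar with only headings and paragraphs, the Python below first builds a list of nodes and only then renders them. Heading, Paragraph, parse and render are hypothetical names for this illustration, and HTML escaping is omitted for brevity.

```python
import re
from dataclasses import dataclass

@dataclass
class Heading:
    level: int
    title: str

@dataclass
class Paragraph:
    text: str

# A heading line: the same run of 1-6 equals signs on both sides of the title.
HEADING = re.compile(r"^(={1,6})\s*(.+?)\s*\1\s*$")

def parse(wikitext):
    """Build a list of document nodes; no HTML is produced at this stage."""
    nodes = []
    for line in wikitext.splitlines():
        if not line.strip():
            continue  # blank lines only separate blocks in this toy grammar
        m = HEADING.match(line)
        if m:
            nodes.append(Heading(level=len(m.group(1)), title=m.group(2)))
        else:
            nodes.append(Paragraph(text=line))
    return nodes

def render(nodes):
    """Rendering is a separate step that walks the finished structure."""
    out = []
    for node in nodes:
        if isinstance(node, Heading):
            out.append(f"<h{node.level}>{node.title}</h{node.level}>")
        else:
            out.append(f"<p>{node.text}</p>")
    return "\n".join(out)

tree = parse("== Diet ==\nI like hamburgers")
print(tree)
print(render(tree))
```

Because the heading exists as an element before any HTML is written, every tag the renderer opens is closed by the same branch of code.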

...
 The only reason i can see why that would happen is if you were to
 implement some auto-completion of the table syntax. Sort of like
 tidy(html) for wikisyntax and do it before things get fetched from
 Template: rather than after everything has been included.

Your terminology "auto-completion" reveals that you are thinking in 
terms of conversion. Don't think of it as auto-completion; for example, 
if a '' has no matching '', I can tell the parser what to do 
independently of what it does when there *is* a matching ''. There are 
several possibilities: make an italics element (what you would probably 
call auto-completion); make a text element (i.e. pretend the "''" was 
actually text); or bail out saying "syntax error". Of course, we don't 
want the latter. My parser currently does the second: It turns the '' 
into text. I did that because this is also how the current 
not-really-a-parser functions. However, I can easily change that.
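
The three possibilities can be sketched as an explicit policy switch. This Python is an illustration only, not my parser: parse_inline, Text, Italic and the unmatched parameter are hypothetical names, and the toy grammar knows nothing about ''' (bold) or nesting.

```python
from dataclasses import dataclass

@dataclass
class Text:
    value: str

@dataclass
class Italic:
    value: str

def parse_inline(src, unmatched="text"):
    """Parse ''italics''; `unmatched` decides what a lone opening '' becomes."""
    parts = src.split("''")
    nodes = []
    for i, part in enumerate(parts):
        opened = i % 2 == 1          # odd-numbered parts follow an opening ''
        closed = i + 1 < len(parts)  # a later part means a closing '' was seen
        if not opened:
            if part:
                nodes.append(Text(part))
        elif closed or unmatched == "italic":
            nodes.append(Italic(part))        # option 1: make an italics element
        elif unmatched == "text":
            nodes.append(Text("''" + part))   # option 2: treat the '' as text
        else:
            raise SyntaxError("unmatched ''")  # option 3: bail out
    return nodes

print(parse_inline("I ''like'' hamburgers"))
print(parse_inline("I ''li", unmatched="text"))
print(parse_inline("I ''li", unmatched="italic"))
```

The matched case goes through the same branch whichever policy is chosen, so the choice for the unmatched case can be changed without touching it.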

In our specific case, there would be a document (a template) that has a 
{| with no matching |}. What should it do? Unfortunately, none of the 
three options make it work the way you have come to expect from the 
current not-really-a-parser.

Timwi
