Re: [WikiEN-l] Templates/taxoboxes, or: why a converter isn't a parser

30 Aug 2004

I already got it in your reply here:

Maybe you are underestimating the vast differences in implementation
between the current not-really-a-parser and what I am working on.

There is nothing wrong with using a group of templates together, but
there *is* something majorly wrong with patching together one object (a
table, in this case) using pieces from different places. It works with
the current not-really-a-parser because it takes the wiki source texts
from the templates, sticks them together somehow, and then converts them
to HTML. This kind of practice is exactly what leads to all the problems
with our current not-really-a-parser. A proper parser should parse each
template individually, and then use its parse tree in the processing of
the page that uses it.

It's great that you're working on a different way to do it that's not
just dumb-text-includes.

On Mon, 30 Aug 2004 16:55:01 +0100, Timwi <timwi(a)gmx.net> wrote:
...
  Ævar Arnfjörð Bjarmason wrote:

  Why would it ever break? I can see it getting slow because it cannot
 be optimized, but not breaking; all it's doing is including one
 thing after the other.

 {{a}} gets Template:A which contains "foo" and {{b}} gets Template:B
 which contains "bar" hence

 {{a}}{{b}} = foobar

 Of course, this simple example would still work. But picture this:

 Template:A contains:         I ''li
 Template:B contains:         ke'' hamburgers

 Currently, {{a}}{{b}} would yield "I <em>like</em> hamburgers", but only
 because it sticks the pieces together and then tries to make sense of it.
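
The "stick together, then convert" behaviour described above can be sketched in a few lines. This is only an illustration, not the actual MediaWiki code: the TEMPLATES dictionary and the functions expand() and convert_italics() are hypothetical names made up for this example.

```python
import re

# Hypothetical template store; the contents mirror the example above.
TEMPLATES = {"a": "I ''li", "b": "ke'' hamburgers"}

def expand(text):
    """Replace each {{name}} with the raw source text of that template."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: TEMPLATES[m.group(1)], text)

def convert_italics(text):
    """Turn each matched pair of '' into <em>...</em>; a lone '' is left alone."""
    return re.sub(r"''(.+?)''", r"<em>\1</em>", text)

# Text inclusion first, conversion second: the two halves happen to line up.
spliced = convert_italics(expand("{{a}}{{b}}"))

# Each template handled on its own: neither half contains a complete '' pair.
separate = "".join(convert_italics(TEMPLATES[name]) for name in ("a", "b"))

print(spliced)   # I <em>like</em> hamburgers
print(separate)  # I ''like'' hamburgers
```

The italics only appear when the raw texts are glued together first, which is exactly the dependency on text splicing that the example is about.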

 Why is this bad? Picture this:

 Template:A contains:
         {|
         | nowrap
 Template:B contains:
         | Text
         |}

 Is the "nowrap" a table cell attribute or text in a separate cell? Does
 this change depending on whether there is a newline after "nowrap"? ...
 And this is just a simple example.

  Why would this break in whatever parser you plan to implement?

 Because a parser is not a converter. The current not-really-a-parser is
 actually a converter: It looks out for particular syntax elements like
 ''these'' and turns them into <em>HTML tags</em>. This is bad because it
 means that several of these conversions can interfere with each other:

         I ''like [[hamburger|hamburgers'']]

 produces invalid HTML. It gets even worse when it tries to locate
 {{template inclusions}} and replaces them with some other text, not
 knowing what it is or how it fits into the document structure.
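
To see how two independent conversions can interfere, here is a minimal sketch of a converter with one pass for italics and one for links. The regular expressions and the href format are assumptions chosen for illustration; they are not taken from the real software.

```python
import re

def convert(text):
    """Naive converter: two independent search-and-replace passes."""
    # Pass 1: ''...'' becomes <em>...</em>
    text = re.sub(r"''(.+?)''", r"<em>\1</em>", text)
    # Pass 2: [[target|label]] becomes a link
    text = re.sub(r"\[\[([^|\]]+)\|(.+?)\]\]", r'<a href="\1">\2</a>', text)
    return text

html = convert("I ''like [[hamburger|hamburgers'']]")
print(html)  # I <em>like <a href="hamburger">hamburgers</em></a>
```

The </em> ends up inside the <a> element although the <em> was opened outside it. Neither pass knows about the other, so neither can notice that the result is mis-nested.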

 A real parser analyses the document's structure. It turns the wiki text
 into a data structure in memory that actually bears resemblance to the
 structure of the document. It creates a "heading" element where there is
 a heading, instead of turning some strategically-placed equals signs
 into <h#> tags.
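
As a rough sketch of what "a data structure that resembles the document" could look like, the following builds a list of nodes and only afterwards renders them. The Heading and Paragraph classes and the parse() and to_html() functions are hypothetical; a real parser would handle far more syntax than this.

```python
import re
from dataclasses import dataclass

@dataclass
class Heading:
    level: int
    text: str

@dataclass
class Paragraph:
    text: str

def parse(source):
    """Build a list of nodes describing the document's structure."""
    nodes = []
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        # A heading is a line wrapped in the same number of = on both sides.
        m = re.fullmatch(r"(=+)\s*(.*?)\s*\1", line)
        if m:
            nodes.append(Heading(len(m.group(1)), m.group(2)))
        else:
            nodes.append(Paragraph(line))
    return nodes

def to_html(nodes):
    """Rendering is a separate step that works from the tree, not the text."""
    out = []
    for node in nodes:
        if isinstance(node, Heading):
            out.append(f"<h{node.level}>{node.text}</h{node.level}>")
        else:
            out.append(f"<p>{node.text}</p>")
    return "\n".join(out)

tree = parse("== Food ==\nI like hamburgers")
print(tree)
print(to_html(tree))
```

Because the heading exists as an object before any HTML is produced, later steps can ask "is this a heading?" instead of searching the text for equals signs again.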

  The only reason I can see why that would happen is if you were to
 implement some auto-completion of the table syntax, sort of like
 tidy(html) for wikisyntax, and do it before things get fetched from
 Template: rather than after everything has been included.

 Your terminology "auto-completion" reveals that you are thinking in
 terms of conversion. Don't think of it as auto-completion; for example,
 if a '' has no matching '', I can tell the parser what to do
 independently of what it does when there *is* a matching ''. There are
 several possibilities: make an italics element (what you would probably
 call auto-completion); make a text element (i.e. pretend the "''" was
 actually text); or bail out saying "syntax error". Of course, we don't
 want the latter. My parser currently does the second: It turns the ''
 into text. I did that because this is also how the current
 not-really-a-parser functions. However, I can easily change that.
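
The three possibilities for an unmatched '' can be sketched as a policy switch. This is not Timwi's parser; parse_inline(), its "unmatched" parameter, and the Text and Italic node classes are assumptions made up to illustrate the point that the unmatched case is decided separately from the matched one.

```python
from dataclasses import dataclass

@dataclass
class Text:
    value: str

@dataclass
class Italic:
    value: str

def parse_inline(source, unmatched="text"):
    """Parse '' markup; `unmatched` picks what a lone '' turns into."""
    if unmatched not in ("italic", "text", "error"):
        raise ValueError("unknown policy: " + unmatched)
    parts = source.split("''")
    # An even number of parts means an odd number of '' markers,
    # so the last '' has no closing partner.
    has_unmatched = len(parts) % 2 == 0
    nodes = []
    for i, part in enumerate(parts):
        inside = i % 2 == 1
        is_last = i == len(parts) - 1
        if inside and is_last and has_unmatched:
            if unmatched == "italic":
                nodes.append(Italic(part))       # make an italics element
            elif unmatched == "text":
                nodes.append(Text("''" + part))  # pretend the '' was text
            else:
                raise SyntaxError("unmatched ''")
        elif inside:
            nodes.append(Italic(part))           # the normal, matched case
        elif part:
            nodes.append(Text(part))
    return nodes

print(parse_inline("I ''like'' it"))
print(parse_inline("I ''li", unmatched="text"))
print(parse_inline("I ''li", unmatched="italic"))
```

Switching the policy changes only the branch for the unmatched case; the handling of properly matched '' is untouched.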

 In our specific case, there would be a document (a template) that has a
 {| with no matching |}. What should it do? Unfortunately, none of the
 three options make it work the way you have come to expect from the
 current not-really-a-parser.

 Timwi

 _______________________________________________
 WikiEN-l mailing list
 WikiEN-l(a)Wikipedia.org
 http://mail.wikipedia.org/mailman/listinfo/wikien-l
