Hoi,
The English Wikipedia provides an excellent environment to test the
English language. It does not do the same for other languages.
Remember that MediaWiki supports over 250 languages.
We do know that the current parser makes it impossible to write Neapolitan
words like l'' properly; this is because '' gives you italic. In my opinion
the parser should not get in the way of people writing the language as is
usual for that language.
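A minimal sketch of the problem (toy Python, not MediaWiki's actual parser
code; the sample words are made up): a naive italic rule consumes any pair of
apostrophes as markup, so a doubled apostrophe that belongs inside a word
gets eaten.

```python
import re

# Toy italic rule: any ''...'' span becomes <i>...</i>.
# A doubled apostrophe that is part of the word itself is
# swallowed as markup instead of being rendered literally.
def naive_italics(text):
    return re.sub(r"''(.*?)''", r"<i>\1</i>", text)

print(naive_italics("word''one and word''two"))
# -> word<i>one and word</i>two  (both apostrophe pairs lost)
```

A writer has no way to get the literal apostrophes back without the parser
offering an escape, which is exactly the kind of thing a cleaner grammar
could address.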
Thanks,
GerardM
On Nov 12, 2007 6:49 AM, Nick Jenkins <nickpj(a)gmail.com> wrote:
On 11/11/07, Steve Bennett <stevagewp(a)gmail.com> wrote:
> I'm hoping we might be able to sell it off the plan:
>
> "If we implement a parser that renders 99% of the current corpus of wikitext
> correctly, and we come up with a reasonable process for rolling it out
> without too much disruption, would you let us do it?"
If you want a list of hypothetical acceptance requirements, then I would add:

* Should render 99% of the articles in the English Wikipedia(*) identically
  to the current parser.
* For the 1% that doesn't render the same, provide a list of which constructs
  don't render the same, and an explanation of whether support for each
  construct is planned to be added, or whether you think it should not be
  supported because it's a corner case or badly-thought-out construct, or
  something else.
* Should have a total runtime for rendering the entire English Wikipedia
  equal to or better than the total render time with the current parser.
* Should be implemented in the same language (i.e. PHP) so that any
  comparisons are comparing apples with apples, and so that it can run on the
  current installed base of servers as-is. Having other implementations in
  other languages is fine (e.g. you could have a super-fast version in C
  too); just provide one in PHP that can be directly compared with the
  current parser for performance and backwards compatibility.
* Should have a worst-case render time no worse than 2x slower on any
  given input.
* Should use as much run-time memory as the current parser or less on
  average, and no more than 2x more in the worst case.
* Any source code should be documented. The grammar used should be
  documented (since this relates to the core driving reason for implementing
  a new parser).
* When running parserTests, should introduce a net total of no more than
  (say) 2 regressions (e.g. if you break 5 parser tests, then you have to
  fix 3 or more parser tests that are currently broken).
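The compatibility requirement above could be checked mechanically; a rough
sketch (the article corpus and the two render functions are hypothetical
stand-ins, not real MediaWiki APIs):

```python
# Hypothetical harness: render every article with both parsers and
# tally the titles whose output differs, plus the matching ratio.
def compatibility_report(articles, render_old, render_new):
    mismatches = [title for title, wikitext in articles
                  if render_old(wikitext) != render_new(wikitext)]
    ratio = 1 - len(mismatches) / len(articles)
    return ratio, mismatches

# Toy run with two fake "parsers" that disagree on one input:
articles = [("A", "x"), ("B", "y"), ("C", "z"), ("D", "w")]
old = lambda s: s.upper()
new = lambda s: "Y!" if s == "y" else s.upper()
ratio, bad = compatibility_report(articles, old, new)
print(ratio, bad)  # -> 0.75 ['B']
```

In the real exercise the interesting part is the mismatch list: that is what
would feed the "explain each unsupported construct" requirement.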
(*) = I'm using the English Wikipedia here as a test corpus as it's a large
enough body of work, written by enough people, that it's statistically useful
when comparing average and worst-case performance and compatibility of
wikitext as used by people in the real world. Any other large body of
human-generated wikitext of equivalent size, with an equivalent number of
authors, would do equally well for comparison purposes.
> I guess the answer would be yes.
I'm guessing it would be "Sure, maybe, let's see the code first." One
way to find out. :)
If you can provide an implementation that has the above characteristics, and
which has a documented grammar, then I think it's reasonable to assume that
people would be willing to take a good look at that implementation.
I'm not sure who all the angry comments in Parser.php belong to:

  svn praise includes/Parser.php | less
--
All the best,
Nick.
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wikitech-l