On 11/11/07, Steve Bennett <stevagewp(a)gmail.com> wrote:
> I'm hoping we might be able to sell it off the plan:
>
> "If we implement a parser that renders 99% of the current corpus of wikitext
> correctly, and we come up with
> a reasonable process for rolling it out without too much disruption,
> would you let us do it?"
If you want a list of hypothetical acceptance requirements, then I would add:
* Should render 99% of the articles in the English Wikipedia(*) identically to the
current parser.
* For the 1% that doesn't render the same, provide a list of which constructs
don't render the same, and an explanation of whether support for each construct
is planned to be added, or whether you think it should not be supported because
it's a corner case or a badly-thought-out construct, or something else. (A rough
comparison-harness sketch follows the footnote below.)
* Should have a total runtime for rendering the entire English Wikipedia equal
to or better than the total render time with the current parser.
* Should be implemented in the same language (i.e. PHP) so that any comparisons
are comparing apples with apples, and so that it can run on the current installed
base of servers as-is. Having other implementations in other languages is fine
(e.g. you could have a super-fast version in C too); just provide one in PHP that
can be directly compared with the current parser for performance and
backwards-compatibility.
* Should have a worst-case render time no worse than 2x slower on any given input.
* Should use as much run-time memory as the current parser or less on average, and
no more than 2x more in the worst case.
* Any source code should be documented. The grammar used should be documented
(since this relates to the core driving reason for implementing a new parser).
* When running parserTests, the new parser should introduce a net total of no
more than (say) 2 regressions (e.g. if you break 5 parser tests, then you have
to fix 3 or more parser tests that are currently broken).
(*) = I'm using the English Wikipedia here as a test corpus as it's a large
enough body of work, written by enough people, that it's statistically useful
when comparing average and worst-case performance and compatibility of wikitext
as used by people in the real world. Any other large body of human-generated
wikitext of equivalent size, with an equivalent number of authors, would do
equally well for comparison purposes.
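
To make the first few of those concrete: the comparison harness could be
something as simple as the sketch below. It's not real MediaWiki code -- the two
render_* functions are made-up placeholders for whatever ends up wrapping the
current parser and the candidate replacement, and corpus/*.wiki just stands in
for however the article dump gets unpacked to disk:

<?php
# Placeholder wrappers -- these names are invented; in practice they would
# call the existing Parser and the candidate replacement respectively.
function render_with_current_parser( $wikitext ) { return $wikitext; }
function render_with_new_parser( $wikitext ) { return $wikitext; }

$pages = glob( 'corpus/*.wiki' );  # assumed layout: one wikitext file per article
$identical = 0;
$different = array();
$timeCurrent = 0.0;
$timeNew = 0.0;

foreach ( $pages as $file ) {
    $text = file_get_contents( $file );

    $start = microtime( true );
    $oldHtml = render_with_current_parser( $text );
    $timeCurrent += microtime( true ) - $start;

    $start = microtime( true );
    $newHtml = render_with_new_parser( $text );
    $timeNew += microtime( true ) - $start;

    if ( $oldHtml === $newHtml ) {
        $identical++;
    } else {
        $different[] = $file;  # bucket these by construct for the "why" list
    }
}

$total = count( $pages );
printf( "%d of %d pages rendered identically (%.2f%%)\n",
    $identical, $total, $total ? 100 * $identical / $total : 0 );
printf( "total render time: current %.1fs, new %.1fs\n", $timeCurrent, $timeNew );

The worst-case render time falls out of the same loop; memory would probably
need per-page runs in separate processes (memory_get_peak_usage() only ever goes
up within one script), and the parser test regression count is just a
before-and-after run of php maintenance/parserTests.php.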
> I guess the answer would be yes.

I'm guessing it would be "Sure, maybe, let's see the code first." One way to
find out. :)
If you can provide an implementation that has the above characteristics, and
which has a documented grammar, then I think it's reasonable to assume that
people would be willing to take a good look at that implementation.
> I'm not sure who all the angry comments in parser.php belong to

svn praise includes/Parser.php | less
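
(svn praise is just an alias for svn blame / svn annotate, so it prints each
line of Parser.php with the revision and the author who last touched it.)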
--
All the best,
Nick.