--- On Wed, 8/12/09, Roan Kattouw <roan.kattouw(a)gmail.com> wrote:
> I read this paragraph first, then read the paragraph above and couldn't
> help saying "WHAT?!?". Using a huge set of pages is a poor replacement
> for decent tests.
I am not proposing that the CPRT be a substitute for "decent tests." We still
need a good set of tests for the whole MW product (not just the parser). Nor would I
recommend making a change to the parser and then immediately running the CPRT. Any
developer who isn't masochistic would first run the existing parserTests and ensure
they pass. Then you would probably want to run the modified DumpHTML against a small
random selection of pages in the WP DB. Only if the change passes those tests would you
then run the CPRT for final assurance.
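A rough sketch of that ordering, in case it helps: cheap checks run first, and the expensive CPRT runs only if everything before it passed. The stage callables here are hypothetical stand-ins (plain lambdas), not real invocations of parserTests, DumpHTML, or the CPRT.

```python
from typing import Callable, List, Optional, Tuple

# A stage is a label plus a zero-argument check that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_stages(stages: List[Stage]) -> Optional[str]:
    """Run stages in order and stop at the first failure.

    Returns the name of the failing stage, or None if every stage passed.
    Later (more expensive) stages are never started after a failure.
    """
    for name, check in stages:
        if not check():
            return name
    return None


# Hypothetical placeholders: each lambda stands in for launching the real tool.
stages: List[Stage] = [
    ("parserTests", lambda: True),
    ("DumpHTML on a small random sample", lambda: True),
    ("CPRT on the full database", lambda: True),
]

failed = run_stages(stages)
print("all stages passed" if failed is None else f"stopped at: {failed}")
```

The point of the ordering is only cost: a failure in the first stage saves the time of a full-database run.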
The CPRT I am proposing is about as good a test of the parser as I can think of. If a
change to the parser passes it using the Wikipedia database (currently 5 GB), then I would
say that, for all practical purposes, the change does not regress the parser.
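To make "passes" concrete, here is a minimal sketch of one way such a comparison could work. It assumes (this message does not spell it out) that the test renders each page with the unmodified and the modified parser and flags any page whose output differs. The names `digest` and `find_regressions` are made up for illustration, and digests are stored instead of full HTML only to keep the baseline small.

```python
import hashlib
from typing import Dict, List


def digest(html: str) -> str:
    """Fingerprint rendered HTML so a baseline need not keep the full text."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def find_regressions(baseline: Dict[str, str], candidate: Dict[str, str]) -> List[str]:
    """Return the sorted page titles whose output changed or went missing.

    Both arguments map page title -> digest of the rendered HTML:
    `baseline` from the unmodified parser, `candidate` from the modified one.
    """
    changed = []
    for title, expected in baseline.items():
        if candidate.get(title) != expected:
            changed.append(title)
    return sorted(changed)


# Tiny in-memory example; a real run would iterate over the page database.
before = {"Foo": digest("<p>foo</p>"), "Bar": digest("<p>bar</p>")}
after = {"Foo": digest("<p>foo</p>"), "Bar": digest("<p><b>bar</b></p>")}
print(find_regressions(before, after))
```

An empty list would count as a pass under this assumption; any listed title is a page to inspect by hand.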
> Also, how would you handle intentional changes to the parser output,
> especially when they're non-trivial?
I don't understand this point. Would you elaborate?
Dan