Jakob Voss wrote:
Hi!
Can anybody tell my what differencing algorithm MediaWiki uses?
DifferenceEngine.php starts with "see diff.doc" - a file I cannot find.
As far as I know it has been taken from phpwiki but is it still the same
algorithm and what algorithm does phpwiki exactely use? I have not found
any better references on differencing algorithms but the papers listed
at
http://en.wikipedia.org/wiki/Diff#References. There seems to be some
new research in differencing XML-files. Is MedaWiki's differencing
algorithm based on E. Myers's algorithm (1986)?
MediaWiki's diff code was mostly written by Geoffrey Dairiki, for
PhpWiki. It was imported into MediaWiki by Magnus and has since been
modified by a few MediaWiki developers. It's a lowest-common subsequence
(LCS) algorithm, and is "mostly lifted" from a Perl module by Ned Konz,
now called Algorithm::DiffOld. It's called DiffOld because the new
Algorithm::Diff is faster. Maybe if someone's feeling ambitious, they
could port the new perl module. I was wondering if those string
comparisons were the slow part, and it seems the author of the new
Algorithm::Diff package found they were.
There is indeed a C++ version, but we don't use it. Brion had a bit of
trouble installing it, and based on the performance statistics, it's not
urgent. We'll get it working eventually.
-- Tim Starling