[Wikitech-l] Wikidiff2

20 Feb 2006

An enhanced version of the C++ diff extension, wikidiff2, is now running on
both clusters. It now does character-level diffs on Chinese, Japanese and
Thai, so it produces much better results than the PHP diff algorithm, and in
much less time. Chinese previously had an ad-hoc segmentation scheme based on
inserting a space between every character before the diff and removing the
spaces afterwards, but unfortunately that left spaces all over the place
where there shouldn't have been any. Anyway, it's fixed now.
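
As an illustration of what character-level diffing means for these scripts,
here is a minimal sketch in Python. It is not wikidiff2's code: the Unicode
ranges, the names tokenize() and changed_tokens(), and the use of the
standard-library difflib matcher are all assumptions made for the example.
The idea is to treat each Chinese, Japanese or Thai character as its own
token, keep other text as whole words, and diff the token lists, so no
spaces ever have to be inserted into the text or stripped out again.

```python
import difflib
import re

# Assumption: these ranges are only a rough stand-in for "Chinese, Japanese
# and Thai"; the ranges wikidiff2 really uses are not given in the post.
_PER_CHAR = (
    "\u0e00-\u0e7f"   # Thai
    "\u3040-\u30ff"   # Hiragana and Katakana
    "\u4e00-\u9fff"   # CJK Unified Ideographs
)

# One token is: a single per-character-script character, or a run of other
# non-space characters (a "word"), or a run of whitespace.
_TOKEN = re.compile(rf"[{_PER_CHAR}]|[^\s{_PER_CHAR}]+|\s+")


def tokenize(text):
    """Split text into tokens; joining the tokens gives back the input."""
    return _TOKEN.findall(text)


def changed_tokens(old, new):
    """Return (tag, old_text, new_text) for each non-equal region."""
    a, b = tokenize(old), tokenize(new)
    # Hypothetical stand-in for the real diff algorithm.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, "".join(a[i1:i2]), "".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Because the original string is never modified, the output contains exactly
the characters of the input. A word-level diff of unspaced Chinese text
would instead see each line as a single token and mark all of it as changed.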

We're still calling dl() every time a diff is needed, and I'm still waiting
for profiling results on the effect of that. The performance of the algorithm
is quite good, though: on our Opterons it can diff 2MB (each side) of the
most pathological input text I've yet been able to devise in 5.2 seconds,
and it does so with only about 15MB of memory.

-- Tim Starling
