Tim Starling wrote:
The method of history compression I'm leaning
towards is to concatenate
a number of adjacent revisions together (say 10) and then compress them
using an LZW-like algorithm. The compression algorithm will take
advantage of the similarities between adjacent revisions to produce a
very high rate of compression. Unlike diffs, this kind of compression
works very well with edit wars. In an edit war repeatedly reverting
between two revisions, a diff must store the changes from every edit,
whereas loosely speaking an LZW algorithm must only store each revision
once.
Hm. The problem with the "diff" you are describing is easily solved by `trying
diffs not only with the previous version, but, if that diff proves to be large, also
trying to make diffs with a few older versions. The smallest diff could then be stored in
combination with a pointer.
We should consider how often the compression is performed, and how often we need to
decompress. e.g. for a disk backup operation commonly compression algorithms are used that
are very light in compression, but heavy on decompression: one does not normally use the
backup anyway, only in disaster recovery.
For us, however, I think that the average difference is viewed more than once, so the
decompression is called a little but not a lot more often than the compression. Like in
RCS, it is much more likely that recent versions are requested than ancient versions, so a
cascading difference would work. I would hesitate to compress multiple versions into one
blob, because it would make the database tables a lot more complicated (how to refer to a
single old version?) and both compression and decompression quite expensive (every edit
will mean a decompress of some recent versions, appending of new data, and
recompression).
If the database structure has one entry per old version, it would even be possible to have
a daemon perform the compression of the old tables asynchronously. New entries in the old
table would be stored uncompressed, but, whenever the server is quiet, a daemon will hop
around in the tables replacing entries in the old table with differences or compressed
versions, whichever is more appropriate for the particular case.
Rob
--
Rob W.W. Hooft || rob(a)hooft.net ||
http://www.hooft.net/people/rob/