So, I've been thinking about the compressed storage of old versions of
MediaWiki articles, and I've been thinking about some other
ideas. I'll probably scribble them on meta soon; I wanted to float
them here first.
Some (most?) version control systems use a reverse diff storage
system to keep storage requirements down to a minimum, at the expense
of retrieval time for old versions.
The idea is pretty simple: you always keep a full copy of the current
version of a file -- let's call it version N. But instead of keeping a
full copy of version N - 1, you just keep the difference between that
copy and version N. Similarly, for version N - 2, you keep the
difference between N - 2 and N - 1.
If differences between versions are significantly smaller than the
versions themselves, there's a great reduction in storage requirements
for the entire system.
To generate a full copy of version N - 2, you have to first retrieve
version N, then the diff between version N - 1 and version N, and
apply that patch to make version N - 1, then retrieve the diff between
version N - 2 and version N - 1, and apply that patch to make version
N - 2. The idea, though, is that you retrieve old versions much, much
less frequently than the current version.
You could conceivably store all old versions as diffs against the
current version -- making retrieval time faster -- but it makes
storing slower, since you have to recompute the diffs each time you
save.
Another optimization is keeping an LRU cache of old versions that have
been retrieved.
Anyways, I think this would probably make storage much smaller for
MediaWiki databases. Even if compression reduces the size of old
versions to, say, 10% of their original size, storing a diff that's
just 1% of the old version is even more of an improvement. If storage
is at a premium, you could even compress the diffs... although this
probably wouldn't pay off for very small diffs.
~ESP
--
Evan Prodromou <evan(a)wikitravel.org>
Wikitravel -
http://www.wikitravel.org/
The free, complete, up-to-date and reliable world-wide travel guide