Timwi-
> Do we really have to do that? I always thought storage space is
> incredibly cheap (at least that's what LiveJournal always says ;-) ).
> I'm finding it somewhat irritating that you want to slow down
> Wikipedia even more, just to save some disk space.
If we store full versions at reasonable intervals, the performance
impact should not be noticeable; it might even be smaller than the
overhead of gzip decompression. We've operated under the "disk space
is infinite" assumption for a long time, but aside from the fact that
it isn't true, an inefficient storage system makes things a lot more
cumbersome for downstream users, who have to download gigabytes of
tarballs to get the revision histories.
A simple diff algorithm could still waste plenty of space on edit
wars. A more advanced system with pointers to prior versions would be
even more efficient. One way to implement this would be to check on
save whether the user has edited an old revision and, if so, whether
they have changed its content; if they haven't, a pointer would be
stored instead of a diff. For existing revisions, a script could weed
out the duplicates.
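And a sketch of the pointer idea, again with made-up names: hash each
text on save, and when the hash (plus the actual content, as a guard
against collisions) matches an earlier revision, store just a pointer
to it. Run over the existing revision table, the same logic would
serve as the weeding script:

import hashlib

class DedupStore:
    def __init__(self):
        self.entries = []  # ("text", str) or ("pointer", earlier index)
        self.by_hash = {}  # content hash -> index of first revision with it

    def save(self, text):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        earlier = self.by_hash.get(digest)
        # compare the real content too, in case of a hash collision
        if earlier is not None and self.get(earlier) == text:
            self.entries.append(("pointer", earlier))
        else:
            self.by_hash[digest] = len(self.entries)
            self.entries.append(("text", text))

    def get(self, index):
        kind, value = self.entries[index]
        return self.get(value) if kind == "pointer" else value

store = DedupStore()
store.save("good version")
store.save("PAGE BLANKED")
store.save("good version")  # a revert: stored as a pointer, not a copy
assert store.entries[2] == ("pointer", 0)

A revert then costs a few bytes regardless of article size, which is
exactly the edit-war case the plain diff scheme handles badly.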
Of course, none of this is likely to happen anytime soon, as it's
quite tricky to implement and gives no immediately visible
(bragworthy) benefits -- and lots of flames if it doesn't work exactly
right. This is the kind of code that gets written on long winter
afternoons to see if it can be done. Winter is almost over, so ...
Regards,
Erik