Nick Jenkins wrote:
Brion Vibber wrote:
One possibility is to embed the timestamp into the URL. So the goatse
version might be:
http://upload.wikimedia.org/wikipedia/en/2005/10/23/074223/Puppy.jpg
and the reverted image would get a different URL, a few minutes later:
http://upload.wikimedia.org/wikipedia/en/2005/10/23/074506/Puppy.jpg
An alternative but very similar idea would be to embed the revision
number in the URL, instead of the upload timestamp:
Example original:
http://upload.wikimedia.org/wikipedia/en/P/1/Puppy.jpg
Example revised:
http://upload.wikimedia.org/wikipedia/en/P/2/Puppy.jpg
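For comparison, here is a rough sketch of how the two URL layouts above could be generated. The function names are made up for illustration, and the first-letter directory in the revision variant is only inferred from the "P" in the example URLs:

```python
from datetime import datetime

# Base path taken from the example URLs above.
BASE = "http://upload.wikimedia.org/wikipedia/en"


def timestamp_url(name: str, uploaded: datetime) -> str:
    """Timestamp layout: /YYYY/MM/DD/HHMMSS/<name> (hypothetical helper)."""
    return f"{BASE}/{uploaded.strftime('%Y/%m/%d/%H%M%S')}/{name}"


def revision_url(name: str, revision: int) -> str:
    """Revision layout: /<first letter>/<revision>/<name> (hypothetical helper).

    Assumes the directory is the upper-cased first letter of the file name.
    """
    return f"{BASE}/{name[0].upper()}/{revision}/{name}"


print(timestamp_url("Puppy.jpg", datetime(2005, 10, 23, 7, 42, 23)))
print(revision_url("Puppy.jpg", 2))
```

Either way, a re-upload or revert produces a new URL, so a stale cached copy of the old version is never served under the new address.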
We had a lively discussion in #wikimedia-tech on this subject; as well
as revision ID numbers, another possibility discussed was using a
content hash.
A content hash has the additional advantage that duplicate file versions
only need to be stored once; for instance, reverting a file currently
makes a new copy of the file on the filesystem, which wastes space.
(However, you then need to be careful about deleting, since several
versions may point at the same stored file.)
So you might have something like:
http://upload.wikimedia.org/584/590/5845907fdfc6eb1125129c4ce0da0704c496a7e…
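To make the dedup and deletion point concrete, here is a minimal in-memory sketch. It assumes SHA-1 as the hash (the thread does not settle on one), a two-level directory built from the first six hex digits as in the example URL, and a simple reference count per stored file. ContentStore, hash_path and the method names are all invented for illustration:

```python
import hashlib


def hash_path(digest: str) -> str:
    """Two-level layout as in the example URL: /584/590/5845907f..."""
    return f"/{digest[:3]}/{digest[3:6]}/{digest}"


class ContentStore:
    """Toy store: each distinct file is kept once, with a reference count."""

    def __init__(self) -> None:
        self.blobs = {}  # digest -> file contents
        self.refs = {}   # digest -> number of versions pointing at it

    def add_version(self, data: bytes) -> str:
        # SHA-1 is an assumption; any stable content hash would do here.
        digest = hashlib.sha1(data).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = data  # first time we see this content
        self.refs[digest] = self.refs.get(digest, 0) + 1
        return digest

    def delete_version(self, digest: str) -> None:
        # Only remove the stored file once no version refers to it.
        self.refs[digest] -= 1
        if self.refs[digest] == 0:
            del self.refs[digest]
            del self.blobs[digest]


store = ContentStore()
original = store.add_version(b"puppy image bytes")
vandal = store.add_version(b"other image bytes")
reverted = store.add_version(b"puppy image bytes")  # revert: same content
print(original == reverted, len(store.blobs))
```

Deleting the reverted version here leaves the stored file in place, because the original version still refers to it; it only disappears when the last reference goes.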
Obviously a disadvantage is that the filenames are ugly. One might tack
a 'pretty' but ignored filename on the end, using rewrites or whatever
tool to drop it on the backend:
http://upload.wikimedia.org/584/590/5845907fdfc6eb1125129c4ce0da0704c496a7e…
This does, though, complicate the server configuration; I think a goal
should be making it very easy to set up a file mirror that we can
actually send requests to. Arbitrary filename additions may also have
security implications for broken browsers like Internet Explorer, which
like to interpret filetype information out of the "extension" on the URL.
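As a rough idea of what "dropping it on the backend" involves, here is a sketch of the path mapping a mirror would have to do. It assumes the pretty name is appended as one extra path segment after a 40-digit hex hash; the truncated example above does not show the exact form, so treat the pattern and the storage_path name as hypothetical:

```python
import re
from typing import Optional

# Assumed request shape: /abc/def/<40 hex digits>[/<ignored pretty name>]
_REQUEST = re.compile(r"/([0-9a-f]{3})/([0-9a-f]{3})/([0-9a-f]{40})(?:/[^/]*)?")


def storage_path(request_path: str) -> Optional[str]:
    """Return the on-disk path for a request, ignoring any pretty name.

    Returns None when the path does not fit the assumed layout.
    """
    m = _REQUEST.fullmatch(request_path)
    if m is None:
        return None
    first, second, digest = m.groups()
    # The directory names must agree with the hash itself.
    if digest[:3] != first or digest[3:6] != second:
        return None
    return f"/{first}/{second}/{digest}"
```

Note that the pretty name never reaches the filesystem, so whatever "extension" it carries is entirely under the control of whoever wrote the link, which is where the browser sniffing worry comes from.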
Lastly, it's easy for a human with the URL to see what revisions come
before/after by incrementing/decrementing the digit in the URL,
whereas the date and time of the upload of a previous revision cannot
be predicted just from the image name.
That might be kind of neat, but requires maintaining a consistent
revision sequence _within_ each image. If using revision numbers, it's
easier to work with the global row id numbers as the database can
guarantee their uniqueness.
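A small sketch of the difference, using an in-memory SQLite table purely as a stand-in (the table and function names are invented, not the real schema). The database hands out the global id for free; a per-image position has to be worked out separately, and the query below only gives a stable answer if rows are never deleted:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE image_version ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"  # global, unique row id
    " name TEXT NOT NULL)"
)


def add_version(conn: sqlite3.Connection, name: str) -> int:
    """Insert a new file version and return its global row id."""
    cur = conn.execute("INSERT INTO image_version (name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid


def position_within_image(conn: sqlite3.Connection, name: str, row_id: int) -> int:
    """Derive the per-image sequence number from the global ids.

    Assumes versions are never removed; otherwise positions would shift.
    """
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM image_version WHERE name = ? AND id <= ?",
        (name, row_id),
    ).fetchone()
    return count


ids = [add_version(conn, n) for n in ("Puppy.jpg", "Kitten.jpg", "Puppy.jpg")]
print(ids, position_within_image(conn, "Puppy.jpg", ids[2]))
```

With global ids the second Puppy.jpg version gets id 3 rather than 2, so the "increment the digit" trick no longer walks through one image's history.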
-- brion vibber (brion @ pobox.com)