On 11/12/06, Gregory Maxwell <gmaxwell(a)gmail.com> wrote:
On 11/12/06, Anthony <wikilegal(a)inbox.org> wrote:
At the worst, backups should require twice as much space as the images
themselves. If Wikimedia's backups require much more, Brion is doing
something really, really wrong.
Note that I didn't even say hard drive space was cheap. I just said
it's cheaper than education.
*sigh*.
If only things were ever that simple.
Simply having 2x disks in the same chassis isn't a backup, and the
total costs for all complex things are non-linear.
Rotating tapes offsite is a backup. Transferring everything to two
other data centers is a backup (put those data centers on different
continents and it's a local cache too). Backup isn't a complex thing.
And unless you're doing something dumb the cost of backup is most
certainly linear (compared to the cost of the initial storage).
Put the image dumps in gigabyte chunks on a superseeding bittorrent
server, and you could probably get backups nearly free (just the cost
of transferring the images once if you can convince enough others to
act as seeds, which you probably could). Of course, now I'm talking
about introducing a bit of design into things. Really stupid easy backups
like the ones in my first message are still linear.
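A minimal sketch of the "gigabyte chunks" step, assuming the image dump is a single large file on disk. The function name, the `.partNNNNN` naming scheme, and the SHA-256 manifest are illustrative assumptions, not anything Wikimedia actually uses; publishing the chunks over BitTorrent would be a separate step that is not shown here.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 1024 ** 3  # one gibibyte per chunk (assumed size)


def split_into_chunks(dump_path, out_dir, chunk_size=CHUNK_SIZE,
                      buf_size=1024 * 1024):
    """Split dump_path into numbered chunk files in out_dir.

    Returns a manifest: a list of (chunk file name, SHA-256 hex digest),
    so anyone holding a copy can verify the chunks they have.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    base = Path(dump_path).name
    manifest = []
    index = 0
    with open(dump_path, "rb") as src:
        while True:
            # Read in small buffers so a whole chunk never sits in memory.
            block = src.read(min(buf_size, chunk_size))
            if not block:
                break  # end of the dump
            name = f"{base}.part{index:05d}"  # hypothetical naming scheme
            digest = hashlib.sha256()
            written = 0
            with open(out_dir / name, "wb") as dst:
                while block:
                    dst.write(block)
                    digest.update(block)
                    written += len(block)
                    remaining = chunk_size - written
                    if remaining <= 0:
                        break  # this chunk is full
                    block = src.read(min(buf_size, remaining))
            manifest.append((name, digest.hexdigest()))
            index += 1
    return manifest
```

Each chunk is written once and read once, so the work grows in proportion to the size of the dump, which is the linear cost I was describing.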
More importantly: categorization, verification, search, etc. are not
cheap. Nor is the time of the users we serve. We'd do a great
disservice by allowing commons to become a disordered dumping ground.
You contradict yourself. Being a disordered dumping ground doesn't
require categorization, verification, or search.
No I don't.
I suspect you've been confused by my befuddled English.
The avoidance of being a disordered dumping ground requires
non-trivial *per image* work for categorization, verification, etc.
"Upload all your trash" doesn't scale and will ensure that we are
never able to become well ordered... which is an outcome which would
diminish our value to the public.
I'd say that "explain[ing to] people that not every ''shitty image''
they produce is worth publishing on Wikimedia projects" doesn't scale
either, and
that an image repository with some parts which are organized and some
parts which aren't has an equal or even higher value than an image
repository without those disorganized parts.
Of course, space *is* a consideration, and it wouldn't make sense to
outright advertise "dump all your trash here". By all means rules
should be in place that say that useless crap will be deleted. And
sure, if space gets tight that rule might need to be enforced to a
greater extent than when it isn't. But trying to explain to people
what should be obvious is, I'd say, a waste of resources.
Anthony