On Thu, Oct 13, 2011 at 9:34 PM, Carcharoth <carcharothwp(a)googlemail.com> wrote:
And the mention of data preservation made me wonder as well. I would
hate to see things like Google Groups eventually vanish or become less
useful. You mentioned Flickr briefly as well, saying it is (still)
neglected by Yahoo. And you said the Geocities shutdown was
"shockingly abrupt".
Yes, it was pretty abrupt. See Jason Scott on this issue and how it
wasn't even announced but buried in some obscure Yahoo documentation
entry.
One of the most depressing things about the internet is dead links
(despite some archival services). And then I got to thinking how long
www.gwern.net will survive for? :-)
Depends; I hope to maintain it indefinitely and it's pretty cheap. I
am interested in archiving and have my own strategies implemented (see
http://www.gwern.net/Archiving%20URLs ), since I gave up trying to
work with WMF & IA on archiving* and looked into what I could do as an
individual.
* I was very pleased to see that one of this year's Summer of Code
projects was an archive plugin, but I will believe it when I see it
running on the English Wikipedia.
They already work to some extent; for example, you can see most of
gwern.net is already in the Internet Archive (
http://wayback.archive.org/web/*/http://www.gwern.net/* ) and many
pages are in WebCite (e.g. http://webcitation.org/627QAJ3KL ). I
provide the source repository, so if anyone ever cloned it, it is
available that way too. (HTML dumps of gwern.net periodically go into
my personal backup hard drive.)
[Moving from the corporate level of the web to the personal level]
Do you think the collections of blogs hosted by the WMF will survive
longer than various other blogs and blogging sites?
Or even the archives of the WMF mailing lists?
Yes. WMF is corporatizing, and that inherently means stuff will
survive longer than personal fly-by-night blogs. If nothing else,
those mailing lists and blogs are relatively small - Google Groups and
Usenet are *huge*. If the Archive Team had to rescue those, I doubt
they would be able to retrieve any but a small fraction even with a
long lead time.
--
gwern
http://www.gwern.net