Michael Bimmler wrote:
To put it bluntly, I dare suggest from a non-technical POV that the
"htdig" (that's the name, isn't it?) experiment has failed. If we can
only update our search index every 6 months or so, it is pointless to
have it.
Yeah, it doesn't work as well as advertised.
Instead, I suggest that http://lists.wikimedia.org/robots.txt be
modified so as to allow Google (and other search engines) to crawl
/pipermail/ again. I do not really see the privacy issues of this;
Nabble, Gmane etc. are Google-searchable as well, and I really don't see
the point in barring Google from our own archive.
For the meantime, I'm going to have to recommend not doing this (see my
notes below for why).
As you note, it's already possible to search via third-party archives.
It would probably not be difficult to replace the broken htdig search
form with a link to a nice offsite archive, though.
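
For anyone following along, the robots.txt change being discussed can be illustrated with a rough sketch using Python's standard urllib.robotparser. The two rule sets below are assumptions made up for illustration, not the actual contents of http://lists.wikimedia.org/robots.txt, and the can_crawl helper is a hypothetical name:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents (assumed, not copied from the live site).
# This variant keeps all crawlers out of the pipermail archives.
BLOCKING = """\
User-agent: *
Disallow: /pipermail/
"""

# This variant has an empty Disallow line, which permits everything.
ALLOWING = """\
User-agent: *
Disallow:
"""

ARCHIVE_URL = "http://lists.wikimedia.org/pipermail/"


def can_crawl(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print("blocking rules:", can_crawl(BLOCKING, ARCHIVE_URL))
    print("allowing rules:", can_crawl(ALLOWING, ARCHIVE_URL))
```

Note that robots.txt only asks well-behaved crawlers to stay away; it does not remove anything from the archives themselves.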
If I am very honest, I do not even remember anymore why we decided to
bar Google from http://lists.wikimedia.org/pipermail.
Because:
a) The current mailman/pipermail system makes it a *huge* pain in the
butt to remove mails from archives on request
b) I got tired of the volume of requests to remove mails from archives,
with the consequent time required in handling them
c) With the wildly popular wikimedia.org domain out of the running,
third-party list archives aren't as visible in general search engine
results
d) Therefore, the volume of requests goes down
e) and I don't feel bad turning down most of the remaining requests.
If and when mailman's archiving system is fixed up to make it quick &
easy to take a mail out of archives (eg, *not* involving shutting down
all mail processing, rebuilding an entire list's archives since 2001,
and discovering that all the links are now broken because mailman's
internal behavior has changed in the intervening years and it splits up
messages differently), then I'll be happy to pop us back into general
search engine indexes.
Was it due to privacy concerns? If so, which, and why is
lists.wikimedia.org as an archive different from Nabble/Gmane?
That'd be c) above.
-- brion vibber (brion @ wikimedia.org)