On 12/30/05, Brion Vibber <brion(a)pobox.com> wrote:
(snip)
> To begin with, old versions are specifically marked for spiders not to
> index them. Deleted pages aren't accessible to an outside spider at
> all. If your robots.txt is not set up to keep robots from visiting
> those pages, you should set that up as well, though that's just to
> keep useless load from hitting the server.
So does this mean that I /must/ tweak my own robots.txt to ensure that
robots don't crawl history, or that I /don't need to/?
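
From what I've read, something like the following in robots.txt would
keep spiders off the script URLs entirely. This is just a sketch -- it
assumes your index.php lives under /w/, as on the Wikimedia sites, so
adjust the path for your own install:

    # Block crawlers from all dynamic script URLs (history, diffs,
    # old revisions). Article pages served under /wiki/ stay crawlable.
    User-agent: *
    Disallow: /w/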
I had heard that there is a proper meta tag (or something similar) to
tell spiders not to delve into revisions. Where can I learn more about
this issue?
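
For what it's worth, viewing the HTML source of a history or
old-revision page suggests MediaWiki already emits something like this
in the <head> (my reading of the page source, not documentation):

    <meta name="robots" content="noindex,nofollow" />

If that's right, well-behaved spiders won't index those pages even
without a robots.txt rule; the robots.txt entry just spares the server
the wasted requests.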
I visited the meta page, but it doesn't go into detail:
http://meta.wikimedia.org/wiki/Robots.txt