At 08:42 PM 5/18/02 +0200, Axel Boldt wrote:
> > if a spider
> > goes to Recent Changes and then to "Last 5000 changes"
> > (and last 90 days, and last 30 days, and last 2500 changes, and last
> > 1000 changes, and every such combination), it seems to me the server
> > load could get pretty high. Perhaps talk pages should be spidered,
> > but not Recent Changes or the history (diff/changes).
> I agree. Every RecentChanges page contains links to 13 other
> RecentChanges views, and one of those links changes its URL each time
> the page is loaded. The other Special: pages (statistics, all pages,
> most wanted, etc.) seem to be good candidates for robot exclusion as
> well: they stress the database but don't provide much useful
> information for search indexes.
Actually, wouldn't "All pages" be a very _good_ page to allow spiders to
read? It would let them cut straight to the heart of the matter and get a
list of all the pages they need from Wikipedia. At the very least, they
should be allowed to read the orphans list, since the pages listed there
won't be found by spidering through the conventional pages.
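Something like the following robots.txt would capture both points: keep spiders off the expensive, ever-changing views while leaving the article pages and "All pages" crawlable. This is only a sketch; the exact paths here are illustrative and would have to match the wiki software's actual URL scheme.

```
# Sketch only -- the paths below are examples, not the wiki's real URLs.
User-agent: *
# Expensive views whose URLs change on every load:
Disallow: /wiki/Special:Recentchanges
# Other database-heavy Special: pages:
Disallow: /wiki/Special:Statistics
Disallow: /wiki/Special:Wantedpages
# History and diff views:
Disallow: /wiki/Special:History
# Special:Allpages and the orphans list are deliberately NOT listed,
# so spiders can use them to discover every article.
```

Note that the original Robots Exclusion Protocol has no "Allow" directive and matches Disallow entries by URL prefix, so permitting "All pages" and the orphans list simply means not listing them.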
--
"Let there be light." - Last words of Bomb #20, "Dark Star"