[Wikipedia-l] robots and spiders

Axel Boldt axel at uni-paderborn.de
Sat May 18 18:42:14 UTC 2002


>if a spider goes to Recent Changes and then to "Last 5000 changes"
>(and last 90 days, and last 30 days, and last 2500 changes, and last
>1000 changes, and every such combination) it seems to me the server
>load could get pretty high. Perhaps talk pages should be spidered,
>but not recent changes or the history (diff/changes).

I agree. Every RecentChanges page contains links to 13 other
RecentChanges pages, and one of them changes its URL each time the
page is loaded. The other special: pages, like statistics, all pages,
most wanted etc., seem to be good candidates for robot exclusion as
well: they stress the database but don't provide much useful
information for search indexes.
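
For the robots.txt side, something like the sketch below would cover
the special: pages; the path prefixes are made up for illustration and
would have to match whatever URLs our scripts actually produce. Note
that plain robots.txt only matches URL prefixes, so diffs, histories
and old versions can only be excluded cleanly if they are served under
a prefix (or script path) that normal article views don't share.

    # illustrative only -- adjust the prefixes to the real URL scheme
    User-agent: *
    # special pages: RecentChanges, Statistics, All pages, Most wanted, ...
    Disallow: /wiki/Special:
    # histories, diffs and old revisions, assuming they go through a
    # script path rather than under /wiki/
    Disallow: /w/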

Regarding talk:, wikipedia: and user: pages, I don't see any reason not
to have them indexed.

Diff pages seem to be useless to spiders, since the same information
is already contained in the two article versions being compared.

The remaining question is: what about article histories and old
versions of articles? Do we want Google to have a copy of every
version of every article, or only the current one?

Axel
