Thanks much.
----- Original Message -----
From: "Gabriel Wicke" <groups-0dArdoQz2ssGSlwUNQtawg(a)public.gmane.org>
Newsgroups: gmane.science.linguistics.wikipedia.technical
Sent: Friday, March 12, 2004 5:31 PM
Subject: Re: bandwidth thieves blocked
On Fri, 12 Mar 2004 19:31:05 +0000, David Rodeback wrote:
>> Download and install the texts. Spider your installation and extract
>> image references. Convert the filenames to those matching the pictures
>> at the WP site. Download the files on this list using 'wget'.
>>
>> Or something like that could work.
>>
>
> Since our current process includes all these steps except the last, at
> which point we link to the file rather than fetching it, this is easily
> done.
>
> Am I to gather that a reasonably well-behaved spider is preferred to
> linking back to Wikipedia's site as we have been doing?
>
> Can someone define for me what would be the off-peak hours in which
> such a spider should run?
See
http://wikimedia.org/stats/live/org.wikimedia.all.squid.requests-hits.html
> Finally, is there a place at Wikipedia (I know of several elsewhere) for
> registering such spiders with descriptions and contact information, in
> case someone observes the spider working and wonders, or in case there
> is some sort of problem?
Set the user agent to something descriptive, like 'worldhistory'. Be sure
not to include typical spider UA strings. And throttle the requests; wget
offers a rate-limiting option for that.
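A throttled fetch along those lines might look like this (a sketch only; the
user-agent string, contact address, and URL-list filename are illustrative
assumptions, not anything agreed on this list):

```shell
# --user-agent : descriptive string so admins can identify the spider
# --wait=1     : pause one second between requests
# --limit-rate : cap download bandwidth at 50 KB/s
# image-urls.txt holds the image URLs extracted from the installed texts
wget --user-agent="worldhistory (contact: admin@example.org)" \
     --wait=1 \
     --limit-rate=50k \
     --input-file=image-urls.txt
```

Run it during the off-peak window shown on the squid stats page above.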
--
Gabriel Wicke