Thanks much.
----- Original Message -----
From: "Gabriel Wicke" <groups-0dArdoQz2ssGSlwUNQtawg(a)public.gmane.org>
Newsgroups: gmane.science.linguistics.wikipedia.technical
Sent: Friday, March 12, 2004 5:31 PM
Subject: Re: bandwidth thieves blocked
On Fri, 12 Mar 2004 19:31:05 +0000, David Rodeback wrote:
>> Download and install the texts. Spider your installation and extract
>> image references. Convert the filenames to those matching the pictures
>> at the WP site. Download the files on this list using 'wget'.
>>
>> Or something like that could work.
>>
>
> Since our current process includes all these steps except the last, at
> which point we link to the file rather than fetching it, this is easily
> done.
>
> Am I to gather that a reasonably well-behaved spider is preferred to
> linking back to Wikipedia's site as we have been doing?
>
> Can someone define for me what would be the off-peak hours in which
> such a spider should run?
See
http://wikimedia.org/stats/live/org.wikimedia.all.squid.requests-hits.html
> Finally, is there a place at Wikipedia (I know of several elsewhere) for
> registering such spiders with descriptions and contact information, in
> case someone observes the spider working and wonders, or in case there
> is some sort of problem?
Set the user agent to something descriptive, like 'worldhistory'. Be sure
not to include typical spider UA strings. And throttle the requests; wget
offers a rate-limiting option for that.
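A throttled fetch along those lines might look like this (a sketch only; the
user-agent string, contact address, and URL-list filename are illustrative
assumptions, not anything agreed on this list):

```shell
# --user-agent : descriptive string so admins can identify the spider
# --wait=1     : pause one second between requests
# --limit-rate : cap download bandwidth at 50 KB/s
# image-urls.txt holds the image URLs extracted from the installed texts
wget --user-agent="worldhistory (contact: admin@example.org)" \
     --wait=1 \
     --limit-rate=50k \
     --input-file=image-urls.txt
```

Run it during the off-peak window shown on the squid stats page above.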
--
Gabriel Wicke