Re: [Wikitech-l] [tfoote@bomis.com: A MUST see]

23 May 2003

On Friday 23 May 2003 05:09, John R. Owens wrote:
...
  On Fri, 23 May 2003, Jimmy Wales wrote:
 > A friend forwarded me this humor page, and I tried to use
 > it with http://www.wikipedia.org, with the very surprising
 > result that it returned the "Index of /" from wikipedia, rather
 > than the Snoop-Dogg lingo translation of the page.
 >
 > Probably this is evidence of something we've done wrong? 
It's a bug on their part, but we should be treating it slightly
differently, i.e. *not* by returning a directory index. :)

If I put in "http://www.wikipedia.org" it makes these requests:
GET /robots.txt HTTP/1.0
GET // HTTP/1.0
(note the double slash)

"http://www.wikipedia.org/":
GET /robots.txt HTTP/1.0
GET /// HTTP/1.0
(triple slash!!)

"http://www.wikipedia.org/wiki/":
GET /robots.txt HTTP/1.0
GET /wiki/// HTTP/1.0
GET /robots.txt HTTP/1.0
GET /style// HTTP/1.0
(seems it just puts double slashes on the end of everything. not sure 
why it's asking for /style as a directory...)

"http://www.wikipedia.org/wiki/Main_Page":
GET /robots.txt HTTP/1.0
GET /wiki/Main_Page//
GET /robots.txt HTTP/1.0
GET /style// HTTP/1.0
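
One way to avoid serving a directory index for these requests would be to collapse repeated slashes in the request path before deciding what to serve. The following is only a minimal sketch of that idea in Python, not what the Wikipedia server actually does; the function names collapse_slashes and redirect_target are made up for illustration.

```python
import re
from typing import Optional


def collapse_slashes(path: str) -> str:
    """Collapse every run of two or more '/' in a request path into one '/'."""
    return re.sub(r"/{2,}", "/", path)


def redirect_target(path: str) -> Optional[str]:
    """Return the cleaned-up path if the request needs a redirect, else None.

    Hypothetical helper: a server could answer with a redirect to the
    returned path instead of falling through to a directory index.
    """
    normalized = collapse_slashes(path)
    return normalized if normalized != path else None


if __name__ == "__main__":
    # Paths taken from the requests logged above.
    for requested in ["//", "///", "/wiki///", "/wiki/Main_Page//", "/style//"]:
        print(requested, "->", collapse_slashes(requested))
```

Note that this only removes the duplicated slashes; a single trailing slash (as in /wiki/Main_Page/) is left in place and would still need its own handling.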

...
  I've noticed that kind of behaviour a few weeks ago, when I was
 trying to wget something, so it's not just something overlooked in
 setting up the new server, most likely. At the time, I assumed it was
 either because I didn't bother to do the cookie setup with wget, or
 it was set to reject some User-Agent:s to keep the bots and such out. 
It should give you a 403 rejected response for wget. (If you really need 
to use wget to fetch _single_ files, use the --user-agent option. This 
is to discourage recursive fetches of the entire site.)
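
As a rough illustration of the User-Agent check described above, here is a small Python sketch. It is an assumption about how such a rule could look, not the actual server configuration; BLOCKED_AGENT_PREFIXES and status_for_user_agent are hypothetical names, and the only rule shown is the wget one mentioned in this message.

```python
# Hypothetical block list: lowercase prefixes of User-Agent strings to reject.
# wget identifies itself as "Wget/<version>" unless --user-agent overrides it.
BLOCKED_AGENT_PREFIXES = ("wget",)


def status_for_user_agent(user_agent: str) -> int:
    """Return 403 for blocked User-Agent strings and 200 for everything else."""
    agent = user_agent.strip().lower()
    if agent.startswith(BLOCKED_AGENT_PREFIXES):
        return 403
    return 200


if __name__ == "__main__":
    print(status_for_user_agent("Wget/1.8.2"))      # default wget agent
    print(status_for_user_agent("MyFetcher/0.1"))   # overridden agent string
```

A check like this is trivially bypassed by changing the agent string, which is the point: it discourages careless recursive fetches without stopping someone who deliberately fetches a single file.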

-- brion vibber (brion @ pobox.com)
