On Friday, 23 May 2003 05:09, John R. Owens wrote:
On Fri, 23 May 2003, Jimmy Wales wrote:
> A friend forwarded me this humor page, and I tried to use it with
> http://www.wikipedia.org, with the very surprising result that it
> returned the "Index of /" from wikipedia, rather than the Snoop-Dogg
> lingo translation of the page.
>
> Probably this is evidence of something we've done wrong?
It's a bug on their part, but we should be treating it slightly
differently, i.e. *not* by returning a directory index. :)
If I put in "http://www.wikipedia.org" it makes these requests:
GET /robots.txt HTTP/1.0
GET // HTTP/1.0
(note the double slash)
"http://www.wikipedia.org/":
GET /robots.txt HTTP/1.0
GET /// HTTP/1.0
(triple slash!!)
"http://www.wikipedia.org/wiki/":
GET /robots.txt HTTP/1.0
GET /wiki/// HTTP/1.0
GET /robots.txt HTTP/1.0
GET /style// HTTP/1.0
(seems it just puts double slashes on the end of everything. not sure
why it's asking for /style as a directory...)
"http://www.wikipedia.org/wiki/Main_Page":
GET /robots.txt HTTP/1.0
GET /wiki/Main_Page//
GET /robots.txt HTTP/1.0
GET /style// HTTP/1.0
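
As a rough illustration of one way such requests could be handled differently, here is a minimal Python sketch that collapses runs of slashes in a request path before the path is mapped to anything on disk. The function name `collapse_slashes` is made up for this example, and nothing here reflects how the Wikipedia server is actually configured.

```python
import re


def collapse_slashes(path: str) -> str:
    """Collapse every run of two or more '/' in a request path into one '/'."""
    return re.sub(r"/{2,}", "/", path)


if __name__ == "__main__":
    # The request paths seen in the log above.
    for raw in ["//", "///", "/wiki///", "/style//", "/wiki/Main_Page//"]:
        print(f"{raw} -> {collapse_slashes(raw)}")
```

Note that "/wiki/Main_Page//" still ends up with one trailing slash; whether to strip that as well, redirect, or return an error is a separate policy decision.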
I've noticed that kind of behaviour a few weeks ago, when I was trying
to wget something, so it's most likely not just something overlooked in
setting up the new server. At the time, I assumed it was either because
I didn't bother to do the cookie setup with wget, or because it was set
to reject some User-Agents to keep the bots and such out.
wget should get a 403 (Forbidden) response. (If you really need to use
wget to fetch _single_ files, use the --user-agent option. The block is
there to discourage recursive fetches of the entire site.)
-- brion vibber (brion @ pobox.com)