On 1/23/09 2:36 AM, Andre Engels wrote:
Two questions:
1. Why is this User Agent getting this response? If I remember
correctly, this was installed in the early days of the pywikipediabot,
when Brion wanted to block it because it had a programming error
causing it to fetch each page twice (sometimes even more?). If that is
the actual reason, I see no reason why it should still be active years
afterward...
This has nothing to do with pywikipediabot.
Too often we encountered poorly written bots and site scrapers
that slammed the servers too hard and caused problems. Blocking the
default UAs of common libraries cut these incidents down dramatically,
and it encourages thoughtful bot writers to put specific information into
their user-agent string, making it easier to track them down if
they are problematic.
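
For bot writers, a minimal sketch of what "specific information" in the user-agent string can look like, using Python's standard urllib. The bot name, version, URL, and contact address are placeholders invented for illustration, not a required format:

```python
import urllib.request

# Hypothetical bot name, version, and contact details; replace with your own.
USER_AGENT = "ExampleWikiBot/0.1 (https://example.org/bot; bot-owner@example.org)"


def build_request(url: str) -> urllib.request.Request:
    """Build a request that identifies the bot instead of sending the library default UA."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


if __name__ == "__main__":
    req = build_request("http://en.wikipedia.org/wiki/Foo")
    # urllib stores header names capitalized as "User-agent".
    print(req.get_header("User-agent"))
    # Passing req to urllib.request.urlopen() would send this header with the request.
```

The point is only that the string names the tool and gives the operators a way to reach its owner.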
2. If this User Agent is really to be blocked, why do we still provide
the content of the page that is forbidden?
We don't; you get a big fat Wikimedia-customized error page with a
generic multilingual message, and this bit somewhere in the middle:
<!-- Technical details of the error; shows all the time, with any
language -->
<div class="TechnicalStuff">
<bdo dir="ltr">
Request: GET http://en.wikipedia.org/wiki/Foo, from 69.17.48.227 via sq24.wikimedia.org (squid/2.6.STABLE21) to ()<br/>
Error: ERR_ACCESS_DENIED, errno [No Error] at Fri, 23 Jan 2009 17:59:46 GMT
</bdo>
<div id="AdditionalTechnicalStuff"></div>
</div>
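
A client that wants to tell this error page apart from real page content could look for the technical details block. The following is a rough sketch only: the regular expressions are guesses based on the single snippet above, and the function name is made up for illustration; the actual error page layout may differ.

```python
import re

# Patterns inferred from the one error snippet shown above (an assumption,
# not a documented format).
ERROR_RE = re.compile(r"Error:\s*(ERR_[A-Z_]+)")
VIA_RE = re.compile(r"\bvia\s+(\S+)")

SAMPLE = """<div class="TechnicalStuff">
<bdo dir="ltr">
Request: GET http://en.wikipedia.org/wiki/Foo, from 69.17.48.227 via
sq24.wikimedia.org (squid/2.6.STABLE21) to ()<br/>
Error: ERR_ACCESS_DENIED, errno [No Error] at Fri, 23 Jan 2009 17:59:46 GMT
</bdo>
</div>"""


def parse_technical_details(html: str) -> dict:
    """Extract the error code and proxy host from the error page, if present."""
    error = ERROR_RE.search(html)
    via = VIA_RE.search(html)
    return {
        "error": error.group(1) if error else None,
        "proxy": via.group(1) if via else None,
    }


if __name__ == "__main__":
    print(parse_technical_details(SAMPLE))
```

A result with "error" set to ERR_ACCESS_DENIED means the request was refused at the proxy and the body is not the article.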
-- brion