Re: [WikiEN-l] Transparent proxy blocked - again

27 Jun 2005

On Monday 27 June 2005 11:14, Andrew Gray wrote:
...
  On 25/06/05, Jake Waskett <jake(a)waskett.org> wrote:
   Of those, only 20 have proxy or cache in the name.

 Thoughts on how useful this sort of data would be, given the
 reasonably sized sample above? 
 Ok, so of 126 addresses, we have about 20 proxies. So about 16% of
 anonymous Wikipedia users are recognised as being behind a proxy, using
 this scheme. I don't know the answer to this question, but does anybody
 know roughly what proportion of web users go through a proxy server? Is
 it close to 16%? If so, we've got a pretty good scheme here. 
 I've spoken to a friend working at one of the larger ISPs; the answer
 is that it varies quite a bit. Only a minority of ISPs use them, but
 those tend to be large ISPs (the canonical example is all the AOL
 proxies you see around).

 The upside is that most people are pragmatic, and call their proxy
 servers things like "proxy-43765". So it looks like this is a fairly
 effective way of identifying *most* proxies.

 [He notes that there's also a "forwarded" header through most ISP
 caches, which contains the "original" originating IP; I don't know if
 this is accessible in this context or not, but it's useful to know it
 exists]
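 A minimal sketch of reading such a header, assuming it is the de facto
 "X-Forwarded-For" header (a comma-separated list of addresses, with the
 original client first). The function name and the plain-dict headers
 argument are hypothetical, and whether the wiki software exposes this
 header at block time is not known here. Note that the header is supplied
 by the client side and can be forged, so it is a hint rather than proof.

```python
def original_client_ip(headers, remote_addr):
    """Return the first address in X-Forwarded-For, else the socket address.

    `headers` is assumed to be a plain dict keyed by exact header name.
    """
    forwarded = headers.get("X-Forwarded-For", "")
    # The header may list several hops: "client, proxy1, proxy2".
    hops = [hop.strip() for hop in forwarded.split(",") if hop.strip()]
    return hops[0] if hops else remote_addr


# Example: a request that arrived via a cache at 198.51.100.7
print(original_client_ip({"X-Forwarded-For": "203.0.113.9, 198.51.100.7"},
                         "198.51.100.7"))
```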

 [I also did another test on a larger sample - this brought it down to
 ~10% having "proxy" or "cache" in them. I may do further tests as
 resources and tuits permit.]
Seems a shame that Wikipedia (rightly) doesn't allow original research. This 
is very interesting reading. :-)

I seem to remember that Wikipedia had its millionth edit (or something like 
that) not long ago. 10-20% might not seem much, but it helps put it in 
perspective.

...

  Of course, a determined user could create a sub-domain with 'proxy' or
 'cache' in the title, which would fool a simple software implementation,
 but perhaps not a human.

 In reply to geni's comment, we're talking about a minor change to the
 software anyway, so all that's needed is to present the admin with this
 information at the time that he or she chooses to block a user.

 Ideally, the software could give the admin a "no IP block" option, to
 exercise at his or her discretion (the software may already do this; I
 don't know). 
 I'm not an admin, so can't really comment how the process actually
 works. Can I just check I have the mechanism right here? User:XYZ goes
 and vandalises an article; an admin bans them; the system then
 automatically slaps a short ban on the associated IP address, to
 prevent them logging out and trying again? 
I'm not an admin either, so at the risk of this becoming the "uninformed users 
speculate about admins" thread, let me offer my 2c.

My *understanding* is that the IP blocks (aka autoblocks) are added by the 
system at a later time, for exactly the reason you suggest. However, their 
implementation is very odd indeed. Instead of expiring when the original 
block did, they add the duration of the block to the time that the IP 
concerned last accessed Wikipedia (as opposed to the last attempt to edit). 
As I once discovered when legitimately blocked for a 3RR violation, this has 
the consequence that merely refreshing the list of currently blocked users to 
check whether the block has expired will keep you blocked indefinitely.
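
To illustrate the behaviour as I understand it (this is a sketch of the 
rule described above, not the actual software; all names are made up), 
compare an expiry computed from the last access with one fixed at the time 
of the original block:

```python
from datetime import datetime, timedelta


def simulate(start, duration, check_interval, checks, reset_on_access=True):
    """Return a list of booleans: was the user blocked at each page view?

    With reset_on_access=True the expiry is (last access + duration), as
    described above; with False it is fixed at (start + duration).
    """
    last_access = start
    results = []
    for i in range(1, checks + 1):
        now = start + i * check_interval
        results.append(now < last_access + duration)
        if reset_on_access:
            last_access = now  # every page view pushes the expiry forward
    return results


start = datetime(2005, 6, 27, 12, 0)
day, hour = timedelta(hours=24), timedelta(hours=1)
# Refreshing once an hour for two days under each rule:
print(all(simulate(start, day, hour, 48)))                         # still blocked at every check
print(sum(simulate(start, day, hour, 48, reset_on_access=False)))  # blocked at the first 23 checks only
```

Under the reset-on-access rule the block only lapses if the user stays away 
for the full duration.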

This shouldn't be a problem, however. The system must be storing the last IP 
used by a user, since this autoblock-on-access mechanism cannot operate 
without that data, so it can easily be checked at the time of an admin 
setting a block.

...

 It looks like in 80%+ of cases, telling people what the IP resolves to
 won't make any difference; it'll just be extra noise (with some
 occasional amusement, as when you notice a .gov domain). How does this
 sound - 
Logical.

...

 a) Admin goes to block a user. System does a check on IP address,
 resolves it to 473a.residence.some.edu, doesn't flag it as a proxy,
 keeps quiet, IP blocked.

 b) Admin goes to block a user. System does a check on IP address,
 resolves it to usercache.admin.some.edu, and flags it because it
 contains *cache*. Puts up a signal to the user - "The associated IP
 address identifies as USERCACHE.admin.some.edu, and blocking it may
 affect multiple people. Do you wish to block it anyway?". Admin makes
 the call. 
Again, logical. We'd need to have a list of words to scan for, but this is 
easy enough and the load on the server minimal.
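
As a rough sketch of cases (a) and (b), assuming the keyword list is just 
"proxy" and "cache" as discussed, and using made-up function names (the 
real software would hook this in wherever blocks are set):

```python
import socket

# Assumed keyword list; only these two words are mentioned in the thread.
PROXY_KEYWORDS = ("proxy", "cache")


def looks_like_proxy(hostname, keywords=PROXY_KEYWORDS):
    """Return the first keyword found in the hostname, or None."""
    lowered = hostname.lower()
    for keyword in keywords:
        if keyword in lowered:
            return keyword
    return None


def reverse_lookup(ip_address):
    """Resolve an IP to a hostname; return None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip_address)[0]
    except OSError:
        return None


def block_warning(hostname):
    """Return the warning text for case (b), or None for case (a)."""
    if hostname is None or looks_like_proxy(hostname) is None:
        return None
    return (
        f"The associated IP address identifies as {hostname}, "
        "and blocking it may affect multiple people. "
        "Do you wish to block it anyway?"
    )


print(block_warning("473a.residence.some.edu"))   # case (a): nothing to show
print(block_warning("usercache.admin.some.edu"))  # case (b): warning text
```

A plain substring match like this will also flag any host that merely 
contains one of the words, which is the trade-off already noted above.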

In this case, I think it would be useful for an admin to have the facility to 
set a user block but prevent autoblocks from being applied. This just means 
setting a flag in the block table. As I explained before, there are other 
ways of achieving proxy-friendly autoblock-equivalents, but that might be too 
complicated.
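
A sketch of what such a flag might look like, purely for illustration (the 
record layout and names below are hypothetical, not the real block table):

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical block record with a per-block autoblock switch."""
    username: str
    duration_hours: int
    autoblock: bool = True  # current behaviour stays the default


def ips_to_autoblock(block, last_ip):
    """Return the IPs to autoblock: the user's last IP, or none if disabled."""
    if block.autoblock and last_ip is not None:
        return [last_ip]
    return []


# An admin blocks a user seen behind a shared cache, leaving the IP alone.
print(ips_to_autoblock(Block("XYZ", 24, autoblock=False), "198.51.100.7"))
```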

...

 This would leave us with the functionality we have now, but give an
 option for a simple override when it's likely the IP address isn't
 "personal". The fact that the display only comes up when it contains
 one of the keywords means that the privacy implications are low - and
 if you want it trimmed further, you can have it say that
 "...identifies as USERCACHE.admin.*.*" or the like. It also limits the
 amount of time wasted by admins, since it seems to be the case that
 without one of the keywords, in most cases, a cache/proxy server won't
 be apparent from the address alone.
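
 The trimming could be sketched like this, assuming we keep the first two
 labels of the hostname and star out the rest (the function name and the
 choice of two labels are illustrative only):

```python
def mask_hostname(hostname, keep_labels=2):
    """Keep the leading labels of a hostname and replace the rest with '*'."""
    labels = hostname.split(".")
    kept = labels[:keep_labels]
    masked = ["*"] * max(len(labels) - keep_labels, 0)
    return ".".join(kept + masked)


print(mask_hostname("usercache.admin.some.edu"))  # usercache.admin.*.*
```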

 Thoughts? 
Seems entirely logical to me. It would be nice to hear from somebody who *is* 
an admin, and can comment on that basis. How would such a facility affect you 
people?
