On 25/06/05, Jake Waskett <jake(a)waskett.org> wrote:
Of those, only 20 have proxy or cache in the name.
Thoughts on how useful this sort of data would be, given the
reasonably sized sample above?
Ok, so of 126 addresses, we have about 20 proxies. So about 16% of anonymous
Wikipedia users are recognised as being behind a proxy, using this scheme. I
don't know the answer to this question, but does anybody know roughly what
proportion of web users go through a proxy server? Is it close to 16%? If so,
we've got a pretty good scheme here.
I've spoken to a friend working at one of the larger ISPs; the answer
is that it varies quite a bit. Only a minority of ISPs use them, but
those tend to be large ISPs (the canonical example is all the AOL
proxies you see around).
The upside is that most people are pragmatic, and call their proxy
servers things like "proxy-43765". So it looks like this is a fairly
effective way of identifying *most* proxies.
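
A minimal sketch of the keyword check described above, assuming we already have the reverse-DNS name for each address. The function names are made up for illustration, and the keyword list is just the two words tested in this thread.

```python
# Hypothetical helper names; the keywords are the two tested in this thread.
KEYWORDS = ("proxy", "cache")


def looks_like_proxy(hostname: str) -> bool:
    """Return True if the hostname contains one of the keywords (case-insensitive)."""
    name = hostname.lower()
    return any(keyword in name for keyword in KEYWORDS)


def flagged_share(hostnames: list[str]) -> float:
    """Return the fraction of hostnames that the keyword check flags."""
    if not hostnames:
        return 0.0
    return sum(looks_like_proxy(h) for h in hostnames) / len(hostnames)
```

For the sample above, 20 flagged names out of 126 works out to 20/126, or roughly 16%. A plain substring match like this will also catch names that only happen to contain the letters "cache", so it is a first filter rather than proof.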
[He notes that most ISP caches also pass along a "forwarded" header,
which contains the "original" originating IP; I don't know if this is
accessible in this context or not, but it's useful to know it exists]
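
If that header does turn out to be accessible, reading it might look something like the sketch below. This assumes the header he means is the de facto X-Forwarded-For one, with the original client listed first in a comma-separated list; that is my guess, not something he confirmed. The function name is hypothetical.

```python
import ipaddress
from typing import Mapping, Optional


def original_client_ip(headers: Mapping[str, str]) -> Optional[str]:
    """Return the first address in X-Forwarded-For, or None if absent or invalid.

    Assumption: the "forwarded" header is X-Forwarded-For, with the
    originating client first in a comma-separated list.
    """
    for name, value in headers.items():
        if name.lower() == "x-forwarded-for":
            first = value.split(",")[0].strip()
            try:
                # Normalise and validate; rejects values such as "unknown".
                return str(ipaddress.ip_address(first))
            except ValueError:
                return None
    return None
```

Bear in mind that this header is supplied by whatever sits upstream, so a determined user can forge it; it would only be a hint.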
[I also did another test on a larger sample - this brought it down to
~10% having "proxy" or "cache" in them. I may do further tests as
resources and tuits permit.]
Of course, a determined user could create a sub-domain with 'proxy' or 'cache'
in the title, which would fool a simple software implementation, but perhaps
not a human.
In reply to geni's comment, we're talking about a minor change to the software
anyway, so all that's needed is to present the admin with this information at
the time that he or she chooses to block a user.
Ideally, the software could give the admin a "no IP block" option, to exercise
at his or her discretion (the software may already do this; I don't know).
I'm not an admin, so can't really comment how the process actually
works. Can I just check I have the mechanism right here? User:XYZ goes
and vandalises an article; an admin bans them; the system then
automatically slaps a short ban on the associated IP address, to
prevent them logging out and trying again?
It looks like in 80%+ of cases, telling people what the IP resolves to
won't make any difference; it'll just be extra noise (with some
occasional amusement, as when you notice a .gov domain). How does this
sound -
a) Admin goes to block a user. System does a check on IP address,
resolves it to 473a.residence.some.edu, doesn't flag it as a proxy,
keeps quiet, IP blocked.
b) Admin goes to block a user. System does a check on IP address,
resolves it to usercache.admin.some.edu, and flags it because it
contains *cache*. Puts up a signal to the admin - "The associated IP
address identifies as USERCACHE.admin.some.edu, and blocking it may
affect multiple people. Do you wish to block it anyway?". Admin makes
the call.
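
Cases (a) and (b) could be sketched roughly as follows. This is only an illustration of the flow, not how the existing software works; the function names are hypothetical, and the resolver is passed in as a parameter so the logic can be tried without a live DNS lookup.

```python
import socket
from typing import Callable, Optional

KEYWORDS = ("proxy", "cache")  # the two keywords discussed in this thread


def reverse_dns(ip: str) -> Optional[str]:
    """Resolve an IP address to a hostname, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def block_warning(
    ip: str, resolve: Callable[[str], Optional[str]] = reverse_dns
) -> Optional[str]:
    """Return a warning for the admin in case (b), or None in case (a)."""
    hostname = resolve(ip)
    if hostname is None:
        return None  # nothing resolved, so nothing to show
    if not any(keyword in hostname.lower() for keyword in KEYWORDS):
        return None  # case (a): keep quiet and block the IP as now
    # Case (b): flag it and leave the decision to the admin.
    return (
        f"The associated IP address identifies as {hostname}, and blocking "
        "it may affect multiple people. Do you wish to block it anyway?"
    )
```

A caller would block straight away when this returns None, and otherwise show the message and wait for the admin's answer.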
This would leave us with the functionality we have now, but give an
option for a simple override when it's likely the IP address isn't
"personal". The fact that the display only comes up when it contains
one of the keywords means that the privacy implications are low - and
if you want it trimmed further, you can have it say that
"...identifies as USERCACHE.admin.*.*" or the like. It also limits the
amount of time wasted by admins, since it seems to be the case that
without one of the keywords, in most cases, a cache/proxy server won't
be apparent from the address alone.
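
The trimmed display could be done along these lines. I'm assuming here that "trimmed" means replacing the last two labels of the name with "*", as in the example above; the function name and the default of two labels are just for illustration.

```python
def trim_hostname(hostname: str, hidden_labels: int = 2) -> str:
    """Replace the trailing labels of a hostname with '*', keeping at least one label."""
    labels = hostname.split(".")
    # Never hide every label, so the keyword-bearing part stays visible.
    hidden = min(hidden_labels, len(labels) - 1)
    if hidden <= 0:
        return hostname
    return ".".join(labels[:-hidden] + ["*"] * hidden)
```

With the example above, usercache.admin.some.edu would be shown as usercache.admin.*.*, which still tells the admin it's a cache without naming the institution.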
Thoughts?
--
- Andrew Gray
andrew.gray(a)dunelm.org.uk