On Monday 27 June 2005 11:14, Andrew Gray wrote:
On 25/06/05, Jake Waskett <jake(a)waskett.org> wrote:
Of those, only 20 have proxy or cache in the name.
Thoughts on how useful this sort of data would be, given the
reasonably sized sample above?
Ok, so of 126 addresses, we have about 20 proxies. So about 16% of
anonymous Wikipedia users are recognised as being behind a proxy, using
this scheme. I don't know the answer to this question, but does anybody
know roughly what proportion of web users go through a proxy server? Is
it close to 16%? If so, we've got a pretty good scheme here.
I've spoken to a friend working at one of the larger ISPs; the answer
is that it varies quite a bit. Only a minority of ISPs use them, but
those tend to be large ISPs (the canonical example is all the AOL
proxies you see around).
The upside is that most people are pragmatic, and call their proxy
servers things like "proxy-43765". So it looks like this is a fairly
effective way of identifying *most* proxies.
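
As a rough illustration of the naming heuristic, here is a minimal Python sketch. The keyword list, the function names `looks_like_proxy` and `proxy_share`, and the sample hostnames are all invented for illustration; none of it is taken from the actual wiki software.

```python
# Hypothetical keyword list; the thread only mentions "proxy" and "cache".
PROXY_KEYWORDS = ("proxy", "cache")


def looks_like_proxy(hostname):
    """Return True if the hostname contains a keyword (case-insensitive)."""
    name = hostname.lower()
    return any(keyword in name for keyword in PROXY_KEYWORDS)


def proxy_share(hostnames):
    """Return the fraction of hostnames flagged as probable proxies."""
    if not hostnames:
        return 0.0
    flagged = sum(1 for h in hostnames if looks_like_proxy(h))
    return flagged / len(hostnames)


# Invented sample hostnames, not real data.
sample = [
    "proxy-43765.example.net",
    "473a.residence.some.edu",
    "usercache.admin.some.edu",
    "dsl-12-34.example.org",
]
print(proxy_share(sample))  # two of the four invented names are flagged
```

Applied to the figures quoted earlier, 20 flagged names out of 126 gives 20/126, or roughly 16%.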
[He notes that there's also a "forwarded" header through most ISP
caches, which contains the "original" originating IP; I don't know if
this is accessible in this context or not, but it's useful to know it
exists]
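
For what it's worth, the header usually meant here is `X-Forwarded-For`, a comma-separated list whose first entry is normally the original client. The sketch below assumes the request headers are available as a plain dict with that exact key, which may not match how the wiki software exposes them. The function name `original_client_ip` is made up. The header is supplied by the proxy or the client, so it can be forged and should be treated as a hint only.

```python
import ipaddress


def original_client_ip(headers, remote_addr):
    """Return the first valid address in X-Forwarded-For, else remote_addr."""
    forwarded = headers.get("X-Forwarded-For", "")
    for part in forwarded.split(","):
        candidate = part.strip()
        if not candidate:
            continue
        try:
            ipaddress.ip_address(candidate)
        except ValueError:
            continue  # skip "unknown" or malformed entries
        return candidate
    # No usable entry: fall back to the address the connection came from.
    return remote_addr


# Documentation-range addresses, used purely as an example.
headers = {"X-Forwarded-For": "203.0.113.7, 198.51.100.1"}
print(original_client_ip(headers, "198.51.100.1"))
```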
[I also did another test on a larger sample - this brought it down to
~10% having "proxy" or "cache" in them. I may do further tests as resources
and tuits permit.]
Seems a shame that Wikipedia (rightly) doesn't allow original research. This
is very interesting reading. :-)
I seem to remember that Wikipedia had its millionth edit (or something like
that) not long ago. 10-20% might not seem much, but it helps put it in
perspective.
Of course, a determined user could create a sub-domain with 'proxy' or
'cache' in the title, which would fool a simple software implementation,
but perhaps not a human.
In reply to geni's comment, we're talking about a minor change to the
software anyway, so all that's needed is to present the admin with this
information at the time that he or she chooses to block a user.
Ideally, the software could give the admin a "no IP block" option, to
exercise at his or her discretion (the software may already do this; I
don't know).
I'm not an admin, so can't really comment how the process actually
works. Can I just check I have the mechanism right here? User:XYZ goes
and vandalises an article; an admin bans them; the system then
automatically slaps a short ban on the associated IP address, to
prevent them logging out and trying again?
I'm not an admin either, so at the risk of this becoming the "uninformed users
speculate about admins" thread, let me offer my 2c.
My *understanding* is that the IP blocks (aka autoblocks) are added by the
system at a later time, for exactly the reason you suggest. However, their
implementation is very odd indeed. Instead of expiring when the original
block did, they add the duration of the block to the time that the IP
concerned last accessed Wikipedia (as opposed to the last attempt to edit).
As I once discovered when legitimately blocked for a 3RR violation, this has
the consequence that merely refreshing the list of currently blocked users to
check whether the block has expired will keep you blocked indefinitely.
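
To make the arithmetic concrete, here is a small Python sketch of the expiry rule as I understand it (last access plus block duration). It models the behaviour described in this message, not the actual software, and the function name, the dates and the 24-hour duration are all invented.

```python
from datetime import datetime, timedelta


def autoblock_expiry(last_access, block_duration):
    """Expiry as described above: last access time plus the block duration."""
    return last_access + block_duration


duration = timedelta(hours=24)
blocked_at = datetime(2005, 6, 27, 12, 0)

# If the user stays away, the autoblock ends 24 hours after the block:
# 28 June, 12:00.
print(autoblock_expiry(blocked_at, duration))

# If the user reloads a page 23 hours in, the expiry is recomputed from
# that access and moves to 29 June, 11:00.
reload_time = blocked_at + timedelta(hours=23)
print(autoblock_expiry(reload_time, duration))
```

Under this rule every page load during the block pushes the expiry further out, which is why repeatedly checking the block list never lets it lapse.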
This shouldn't be a problem, however. The system must be storing the last IP
used by a user, since this autoblock-on-access mechanism cannot operate
without that data, so it can easily be checked at the time of an admin
setting a block.
It looks like in 80%+ of cases, telling people what the IP resolves to
won't make any difference; it'll just be extra noise (with some
occasional amusement, as when you notice a .gov domain). How does this
sound -
Logical.
a) Admin goes to block a user. System does a check on IP address,
resolves it to 473a.residence.some.edu, doesn't flag it as a proxy,
keeps quiet, IP blocked.
b) Admin goes to block a user. System does a check on IP address,
resolves it to usercache.admin.some.edu, and flags it because it
contains *cache*. Puts up a signal to the admin - "The associated IP
address identifies as USERCACHE.admin.some.edu, and blocking it may
affect multiple people. Do you wish to block it anyway?". Admin makes
the call.
Again, logical. We'd need to have a list of words to scan for, but this is
easy enough and the load on the server minimal.
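
A minimal Python sketch of cases (a) and (b), under some assumptions: the word list, the names `reverse_lookup` and `block_warning`, and the fake DNS table are hypothetical, and the real software would plug this in wherever the block form is handled. The reverse lookup uses the standard `socket.gethostbyaddr` call. The resolver is passed in as a parameter so the logic can be tried without touching the network.

```python
import socket

KEYWORDS = ("proxy", "cache")  # hypothetical list of words to scan for


def reverse_lookup(ip):
    """Resolve an IP to a hostname, or return None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def block_warning(ip, resolve=reverse_lookup):
    """Return a warning for the admin in case (b), or None in case (a)."""
    hostname = resolve(ip)
    if hostname is None:
        return None
    if any(word in hostname.lower() for word in KEYWORDS):
        return (
            f"The associated IP address identifies as {hostname}, "
            "and blocking it may affect multiple people. "
            "Do you wish to block it anyway?"
        )
    return None


# Stand-in for DNS, using documentation-range addresses.
fake_dns = {
    "192.0.2.10": "473a.residence.some.edu",
    "192.0.2.20": "usercache.admin.some.edu",
}
print(block_warning("192.0.2.10", resolve=fake_dns.get))  # case (a): None
print(block_warning("192.0.2.20", resolve=fake_dns.get))  # case (b): warning
```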
In this case, I think it would be useful for an admin to have the facility to
set a user block but prevent autoblocks from being applied. This just means
setting a flag in the block table. As I explained before, there are other
ways of achieving proxy-friendly autoblock-equivalents, but that might be too
complicated.
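
The flag idea could look something like the following Python sketch. The `Block` record and its field names are invented for illustration and are not the real block table schema.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical row in a block table; field names are invented."""
    username: str
    duration_hours: int
    allow_autoblock: bool = True  # the admin clears this for shared IPs


def should_autoblock(block):
    """Apply an IP autoblock only if the admin has not opted out."""
    return block.allow_autoblock


normal = Block("XYZ", 24)
shared_ip = Block("XYZ", 24, allow_autoblock=False)
print(should_autoblock(normal), should_autoblock(shared_ip))
```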
This would leave us with the functionality we have now, but give an
option for a simple override when it's likely the IP address isn't
"personal". The fact that the display only comes up when it contains
one of the keywords means that the privacy implications are low - and
if you want it trimmed further, you can have it say that
"...identifies as USERCACHE.admin.*.*" or the like. It also limits the
amount of time wasted by admins, since it seems to be the case that
without one of the keywords, in most cases, a cache/proxy server won't
be apparent from the address alone.
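
The trimmed display could be produced along these lines. This is only a sketch: the function name `masked_hostname`, the `keep` parameter and the keyword list are assumptions, and how many labels to show is a policy choice.

```python
KEYWORDS = ("proxy", "cache")  # hypothetical list of words to scan for


def masked_hostname(hostname, keep=2):
    """Show the first `keep` labels, upper-casing any that contain a
    keyword, and replace the remaining labels with "*"."""
    labels = hostname.split(".")
    shown = []
    for index, label in enumerate(labels):
        if index >= keep:
            shown.append("*")
        elif any(word in label.lower() for word in KEYWORDS):
            shown.append(label.upper())
        else:
            shown.append(label)
    return ".".join(shown)


print(masked_hostname("usercache.admin.some.edu"))  # USERCACHE.admin.*.*
```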
Thoughts?
Seems entirely logical to me. It would be nice to hear from somebody who *is*
an admin, and can comment on that basis. How would such a facility affect you
people?