[Foundation-l] Wikipedia Web Logs for scientific research

Mirco Nanni mirco.nanni at isti.cnr.it
Tue May 10 15:56:58 UTC 2005


Dori wrote:
>>[Addendum: the sensitive information in web logs is
>>essentially located in the "client IP" field ("who visited
>>that page"). However, for our research purposes that field
>>is not strictly needed, as an encrypted version of it would
>>be enough, thus avoiding most of the privacy issues.]
> 
> The problem is if you substitute the IP with a unique number, and you still 
> show accesses to user pages, you can probably identify the logged-in users. 
> I'd be OK if the IPs were masked AND accesses to non-article namespace pages 
> were not given out.

   Well, our objective is not to make web accesses public, 
but to apply analysis techniques to them and possibly make 
some selected results public (something like the Webalizer 
system now used to build the Wikipedia usage statistics, 
but a bit more sophisticated and specific).
   However, you are right: masking IPs does not solve 
privacy problems once and for all. I agree with restricting 
the data to web traffic related to articles and discarding 
personal pages and the like. Besides, those pages are not 
very interesting for our research purposes.
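
   To make the two measures concrete, here is a minimal 
sketch of how the logs could be pre-processed before they 
reach us. It is only an illustration under stated assumptions, 
not a description of any existing Wikipedia tool: the 
(ip, path) record layout, the /wiki/ URL prefix, the list of 
namespace prefixes and all function names are hypothetical, 
and a keyed hash (HMAC-SHA256) stands in for the "encrypted 
version" of the IP mentioned above.

```python
import hashlib
import hmac
from urllib.parse import unquote

# Hypothetical secret key: it would stay with the log owner and never be released.
SECRET_KEY = b"replace-with-a-private-random-key"

# Assumed, incomplete list of non-article namespace prefixes.
NON_ARTICLE_PREFIXES = (
    "User:", "User_talk:", "Talk:", "Special:", "Wikipedia:",
    "Wikipedia_talk:", "Image:", "Template:", "Help:", "Category:",
)


def mask_ip(ip: str) -> str:
    """Map an IP to a stable token: same IP gives the same token,
    and the IP cannot be recomputed by someone who lacks the key."""
    digest = hmac.new(SECRET_KEY, ip.encode("ascii"), hashlib.sha256)
    return digest.hexdigest()[:16]


def is_article_path(path: str) -> bool:
    """True if the path looks like /wiki/<title> with no listed namespace prefix."""
    if not path.startswith("/wiki/"):
        return False
    title = unquote(path[len("/wiki/"):])
    return not title.startswith(NON_ARTICLE_PREFIXES)


def sanitize(records):
    """Yield (masked_ip, path) pairs for article requests only."""
    for ip, path in records:
        if is_article_path(path):
            yield mask_ip(ip), path
```

   Called on an iterable of (ip, path) pairs, sanitize() keeps 
the sessions of a single visitor linkable, which is what the 
analysis needs, while dropping both the real address and any 
request to user or discussion pages. As noted above, this 
reduces the privacy risk but does not remove it entirely.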

                      - Mirco

  ====================================
   http://ercolino.isti.cnr.it/mirco
  ====================================



