Re: [Wikitech-l] request; access to (anonymized) apache log data for a recommendation system

27 Jul 2005

tpryor(a)media.mit.edu wrote:
...

  Using publicly-available data you can find out the set of pages edited
 by each username.  Then it is possible, with some degree of uncertainty,
 to link some usernames to one or more "unique signature"s (from your
 quoted text above), by matching sets of user page requests to sets of
 pages edited.  Thus some of the data we would release to you is
 bordering on, if not definitely, personally identifiable data which is
 not already publicly available.

 I would like to work around this issue such that no editor's identity is
 compromised. If the IP addresses of named Wikipedia editors are
 discernible by matching their dated Wikipedia edit saves to the
 corresponding Apache log entries, then when such log entries are found by
 the script, they can be skipped entirely rather than SHA1 hashed. That way
 no personally identifiable data will exist.
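
For concreteness, here is a minimal sketch of the filtering step as I read it. Everything named here is an assumption for illustration: the `(ip, timestamp, page)` tuple layout, the function name `anonymize_log`, and the two-second matching window are mine, not a description of the real logs or of your script. I have also assumed that timestamps are dropped from the released records.

```python
import hashlib
from datetime import datetime, timedelta


def anonymize_log(log_entries, edit_saves, window_seconds=2):
    """Skip log entries that coincide with a public edit save; hash the rest.

    log_entries: iterable of (ip, timestamp, page) tuples (hypothetical layout)
    edit_saves:  iterable of (page, timestamp) taken from public page histories
    Returns a list of (sha1_of_ip, page) tuples with timestamps removed.
    """
    window = timedelta(seconds=window_seconds)
    edit_saves = list(edit_saves)
    released = []
    for ip, ts, page in log_entries:
        # An entry "matches" if the same page was saved within the window.
        matches_edit = any(
            page == edit_page and abs(ts - edit_ts) <= window
            for edit_page, edit_ts in edit_saves
        )
        if matches_edit:
            continue  # skipped entirely, not hashed
        digest = hashlib.sha1(ip.encode("utf-8")).hexdigest()
        released.append((digest, page))
    return released
```

Note that this only removes the entries that line up with an edit save. Every other request from the same address is still released under the same hash.
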

Even records of just page views (no edits) with no time stamps and a
hashed IP will give you information about personal interests which can
be fuzzily matched to named editors, because the set of pages a person
views is likely a superset of the set of pages they edit.  People may
have interests in certain topics and read Wikipedia pages related to
those topics, but avoid editing those pages in order to keep those
interests private.  This personally identifiable information, or at
least an approximation of it, would be leaked, and the privacy policy
would be violated, no?
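
To illustrate the kind of fuzzy matching I mean, here is a rough sketch. The names `containment` and `best_matches`, the dictionary layout, and the 0.9 threshold are all made up for the example; a real attempt would need to be more careful about very popular pages and about users with only a handful of edits.

```python
def containment(edited, viewed):
    """Fraction of a user's edited pages that also appear in a set of viewed pages."""
    if not edited:
        return 0.0
    return len(edited & viewed) / len(edited)


def best_matches(edits_by_user, views_by_hash, threshold=0.9):
    """Link each username to the hashed IP whose views best cover their edits.

    edits_by_user: {username: set of page titles edited}   (public data)
    views_by_hash: {hashed_ip: set of page titles viewed}  (released data)
    Returns {username: (hashed_ip, score)} for scores at or above the threshold.
    """
    matches = {}
    for user, edited in edits_by_user.items():
        scored = [
            (containment(edited, viewed), hashed_ip)
            for hashed_ip, viewed in views_by_hash.items()
        ]
        score, hashed_ip = max(scored, default=(0.0, None))
        if hashed_ip is not None and score >= threshold:
            matches[user] = (hashed_ip, score)
    return matches


# Made-up data: the first view set covers everything "Alice" has edited,
# plus a page she has only read.
edits_by_user = {"Alice": {"Knitting", "Sheep", "Wool"}}
views_by_hash = {
    "a1": {"Knitting", "Sheep", "Wool", "Some private topic"},
    "b2": {"Sheep", "Football"},
}
linked = best_matches(edits_by_user, views_by_hash)
```

In this toy case `linked` maps "Alice" to the hash "a1", and the extra page in that view set is exactly the sort of thing she may not have wanted attached to her username.
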

...
 The privacy policy [1] says that
 "personally identifiable data collected in the server logs will not be
 released by the developers who have access to it," except under certain
 circumstances, none of which cover this case.  

> [1] http://wikimediafoundation.org/wiki/Privacy_policy
>
>> Tony Pryor
>>
> ---
> en:user:jeronim
> --- 
