Using publicly available data, you can find the set of pages edited by
each username. It is then possible, with some degree of uncertainty, to
link some usernames to one or more "unique signatures" (from your
quoted text above) by matching sets of user page requests to sets of
pages edited. Thus some of the data we would release to you borders on,
if it is not definitely, personally identifiable data that is not
already publicly available.
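
To make the concern concrete, here is a rough sketch of how such
matching could work. Everything in it is an assumption for
illustration: the function names, the input dictionaries, the use of
Jaccard overlap as the similarity measure, and the 0.5 threshold are
all hypothetical and do not come from any existing tool.

```python
def jaccard(a, b):
    """Overlap between two sets: 0.0 (disjoint) up to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def link_signatures(requests_by_signature, edits_by_username, threshold=0.5):
    """Pair each signature with usernames whose edited pages overlap strongly.

    requests_by_signature: {signature: set of page titles requested}
    edits_by_username:     {username: set of page titles edited}
    Both formats and the threshold are hypothetical.
    """
    links = []
    for sig, requested in requests_by_signature.items():
        for user, edited in edits_by_username.items():
            score = jaccard(requested, edited)
            if score >= threshold:
                links.append((sig, user, score))
    # Strongest candidate links first.
    return sorted(links, key=lambda link: link[2], reverse=True)
```

A match here is only a candidate link, which is where the "some degree
of uncertainty" comes from: the higher the overlap score, the more
plausible the link.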
I would like to work around this issue so that no editor's identity is
compromised. If the IP addresses of named Wikipedia editors can be
discerned by matching their dated Wikipedia edit saves to the
corresponding Apache log entries, then the script can skip such log
entries entirely when it finds them, rather than SHA-1 hashing them.
That way, no personally identifiable data will exist.
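
As a rough sketch of that filtering step, under stated assumptions: the
tuple layout of the log entries, the function name, and the idea that
an edit save can be matched on an exact (timestamp, page) pair are all
hypothetical, and real Apache log lines would need parsing first.

```python
import hashlib


def anonymize_log(log_entries, edit_saves):
    """Skip entries that coincide with a public edit save; hash the rest.

    log_entries: iterable of (ip, timestamp, page) tuples (hypothetical format)
    edit_saves:  set of (timestamp, page) pairs from public edit histories
    """
    released = []
    for ip, timestamp, page in log_entries:
        if (timestamp, page) in edit_saves:
            # Matches a named editor's save: skipped entirely, not hashed.
            continue
        # Every other entry keeps its timestamp and page; only the IP is hashed.
        digest = hashlib.sha1(ip.encode("utf-8")).hexdigest()
        released.append((digest, timestamp, page))
    return released
```

Note that this sketch drops only the matching save requests themselves;
other requests from the same address would still share one hash.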
The privacy policy [1] says that
"personally identifiable data collected in the server logs will not be
released by the developers who have access to it," except under certain
circumstances, none of which cover this case.
[1] http://wikimediafoundation.org/wiki/Privacy_policy
Tony Pryor
---
en:user:jeronim
---
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)wikimedia.org
http://mail.wikipedia.org/mailman/listinfo/wikitech-l