[Foundation-l] Google Analytics test

Andrew Whitworth wknight8111 at hotmail.com
Tue Oct 2 16:27:23 UTC 2007


Many people are probably aware by now that I've started a test of the Google Analytics page counter on en.wikibooks. I hear that people are running a similar test on en.wikinews. Currently, these programs are opt-in: only registered users are running the scripts, and each user has to add them manually to a personal monobook file. The information received so far has been fantastic: counts of page hits, click patterns, information about entry points that we can use to improve the welcome for new visitors, etc. However, this test has also raised a few concerns. Some of them I would like to address here; on others I would like input from the foundation.

1) First and foremost is the issue of privacy. The information that Google Analytics collects is a step above what is typically available to regular users, but not quite as detailed as CU data. Some information, such as a user's geographical area and ISP, is aggregated, and it is not attached in any way to a user's screenname. That is, without a priori knowledge about the user, it is impossible to attach a particular username to a particular ISP, geographical location, or any other piece of collected data. I am currently inspecting the Google Analytics code for a way to suppress the collection of ISP or geographical information, but haven't found one yet.

1a) Ancillary to the idea of privacy is the issue that the analytics code should probably remain opt-in. Many users are conscious of privacy and security issues, and they shouldn't be forced to choose between participating in a tracking program and not visiting wikibooks at all. I've proposed a solution in which unregistered users would be tracked by default (testing wgUserName == null), but registered users would need to opt in explicitly. After all, I feel that information about our readers is far more important than the same information about our editors.
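
The check proposed in 1a could look roughly like the sketch below. It relies on MediaWiki setting wgUserName to null for visitors who are not logged in. The function name shouldTrack and the optedInUsers list are hypothetical, made up for illustration; how opt-ins would actually be recorded is still an open question.

```javascript
// Sketch only: shouldTrack and optedInUsers are hypothetical names,
// not part of MediaWiki or Google Analytics.
function shouldTrack(userName, optedInUsers) {
  // MediaWiki sets wgUserName to null for visitors who are not logged in,
  // so unregistered readers are tracked by default.
  if (userName === null) {
    return true;
  }
  // Registered users are tracked only if they have explicitly opted in.
  return optedInUsers.indexOf(userName) !== -1;
}
```

In a site script this would be called as shouldTrack(wgUserName, optedInUsers), and the analytics code would be loaded only when it returns true.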

1b) Another related idea is that individual books could be tracked for readership patterns, while the rest of the wikibooks project remains script-free. Notification templates could be used to indicate which books the scripts are active on. A book could be tracked for a month or so at a time. We could track a handful of books at once, and change which books we track on a regular basis.
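
A minimal sketch of the per-book idea in 1b, assuming that every page of a book is a subpage of the book's root title (so the part of the page name before the first "/" identifies the book). The function name isTrackedPage and the book titles in the usage note are hypothetical; wgPageName is the MediaWiki variable holding the current page name with underscores in place of spaces.

```javascript
// Sketch only: isTrackedPage and the trackedBooks list are hypothetical.
function isTrackedPage(pageName, trackedBooks) {
  // Assumes book pages are named "Book_Title/Chapter/...", so the root
  // title is everything before the first slash.
  var rootTitle = pageName.split("/")[0];
  // Track the page only if its book is on the current tracking list.
  return trackedBooks.indexOf(rootTitle) !== -1;
}
```

A site script could call isTrackedPage(wgPageName, ["Example_Book", "Another_Example_Book"]) and load the analytics code only when it returns true. Rotating which books are tracked would then mean editing that one list.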

2) Second is the issue of server load. Running the script currently involves one additional JavaScript page access per user. However, the JavaScript files can be cached. The script runs in the user's browser and interacts with the Google Analytics website, but does not transact with the WMF servers. I believe that server load for us should be minimal (but I want confirmation of this from the techs).

3) Log files are available by default only to the Google account holder (myself) and to other people whom I specifically add to the profile. If we keep the access list very restrictive, we don't need to worry about sensitive data becoming public. However, we do run the risk of giving users with access "power", which is a common fear. If we were to set up accounts on behalf of the project or the WMF (as opposed to personal accounts), we could negate this issue entirely.

I'm looking for as much input on this issue as I can get. I'm not planning to make any changes to any JavaScript for the foreseeable future, until the concerns are ironed out.

--Andrew Whitworth



