[Foundation-l] Release of squid log data

Gregory Maxwell gmaxwell at gmail.com
Sat Sep 15 17:45:18 UTC 2007


On 9/16/07, Gwern Branwen <gwern0 at gmail.com> wrote:
> On 2007.09.15 01:38:00 -0400, Gregory Maxwell <gmaxwell at gmail.com> scribbled 11 lines:
> > On 9/15/07, Gwern Branwen <gwern0 at gmail.com> wrote:
> > > In a very strong sense, we can 'safely' make no data available.
> >
> > This is a counter-productive over-statement. It is only true in the
> > same sort of useless sense that many dramatic maxims are true in...
>
> Dramatic maxims are useful for shock value, which is what is needed here

We probably have an unresolvable difference in values.

In my view decision-making processes need 'shock value' as much as
hen-houses need foxes.  ...

> since people seem to be thinking that we can release vast amounts of data
> and not worry about abuses at all. This attitude shocks me a little,
> since almost by definition this subject involves releasing even more
> data than usual, and we've already seen abuses of public data.

At the beginning of the thread the initial respondents appeared to be
under the mistaken impression that we were already liberally releasing
effectively identical information.

In later replies the tone has been more negative, to the point where
I'm concerned that we may be at risk of discarding the baby with the
bathwater.

> Not to mention that you *can't* trust researchers to keep it
> confidential, any more than you could anyone else.

Well, more than "anyone else" perhaps. Certainly it would be better to
give the data to 'researchers' than a malicious force, or to someone
completely unqualified to handle private data. ...  But at the same
time it would be better still to minimize disclosure.

> Every bit of data reduces privacy and anonymity; this is a fact of life

Technically true, but not useful.

> I assume everyone here is intelligent and

Then why resort to shock statements and over-generalizations?

[snip]
> The question here is not whether we can mangle the data so there is no
> danger of privacy violations. It exists, it will always exist. The
> question is, can we reduce that danger to below the average every-day
> risks
[snip]
> Right now, I'm not convinced it's worth it.
[snip]

I think you are creating a false choice here: The choice when dealing
with private data isn't only between "no release at all" and
"substantial risk but below the average every day risk".

Even while keeping the pedantic "Every bit of data reduces privacy and
anonymity" in mind, there are many types of data extract which pose an
exposure level so low that we can fairly classify it as none when
speaking English rather than pedantese:

For example, no one sane is going to claim that releasing the daily
viewership rates for existent articles with some quantization is going
to cause a measurable impact on anyone's privacy or anonymity.
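
To make that concrete, here is a rough sketch of what such an extract
might look like. It is only an illustration: the function name, the
(day, title) input shape, and the bucket size of 100 are assumptions
made for the example, not a description of the actual squid log format
or of any existing tool.

```python
from collections import Counter

def quantized_daily_views(log_entries, existing_titles, bucket=100):
    """Count views per (day, title), then round each count down to a
    multiple of `bucket`.

    log_entries     -- iterable of (day, title) pairs (hypothetical input shape)
    existing_titles -- set of titles of articles that actually exist;
                       requests for anything else are discarded
    bucket          -- quantization step (an arbitrary illustrative value)
    """
    counts = Counter(
        (day, title)
        for day, title in log_entries
        if title in existing_titles
    )
    # Rounding down hides the exact count; anything below `bucket` becomes 0.
    return {key: (n // bucket) * bucket for key, n in counts.items()}
```

Only the per-day, per-article totals leave this function; no addresses,
timestamps, or request ordering are kept, and requests for nonexistent
titles never appear in the output.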


