[Foundation-l] should not web server logs (of requests) be published?

Andrew Gray andrew.gray at dunelm.org.uk
Mon Nov 29 19:16:17 UTC 2010


On 29 November 2010 10:11, Domas Mituzas <midom.lists at gmail.com> wrote:
>> The sampled 1/1000 squid logs can be used for statistical purposes, such as
>> page view stats.  Someone more techy can answer that better than I can, if
>> the samples include IP addresses that could be used w/ geoip for geographic
>> analysis. (I think perhaps not)
>
> we do aggregations on full sample, not 1/1000
> 1/1000 gets saved to a file for post-mortems and "wtf is going on" type of analysis.

Ah, that explains it - I was wondering how we could get something as
precise as "three views one day, five the next" out of a 1/1000
sample! So am I right in assuming that what happens is:

1) page request comes in and is served
2) every thousandth request is sent to a separate file and logged
3) the rest are stripped of all data bar "X page requested"
4) these stripped records are kept for the pageview statistics, which
   are very fine-grained

The end result: one file with 0.1% of requests logged in detail, and
another file with "hit counts" and nothing more?
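
To make the question concrete, here is a rough Python sketch of the
model I have in mind. It is only my guess at the shape of the thing,
not the actual squid log pipeline: the function name, the record
fields ("page", "ip") and the in-memory list and counter standing in
for the two files are all made up for illustration. Following Domas's
note that aggregation runs on the full stream, every request is
counted, including the ones that also land in the detailed sample.

```python
from collections import Counter

SAMPLE_EVERY = 1000  # the 1/1000 sampling rate mentioned in this thread


def split_stream(requests, sample_every=SAMPLE_EVERY):
    """Split a request stream into a detailed sample and per-page counts.

    `requests` is an iterable of dicts with hypothetical keys such as
    "page" and "ip"; the real log format is not assumed here.
    """
    detailed = []           # stands in for the detailed 1/1000 file
    hit_counts = Counter()  # stands in for the "hit counts" file

    for n, req in enumerate(requests, start=1):
        # Every thousandth request is kept with all of its fields.
        if n % sample_every == 0:
            detailed.append(req)
        # Every request contributes only its page name to the counts.
        hit_counts[req["page"]] += 1

    return detailed, hit_counts
```

If that is roughly right, the counts come from all requests and so can
be exact for even the quietest pages, while anything needing more than
the page name (such as geoip) would have to rely on the 0.1% sample.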

-- 
- Andrew Gray
  andrew.gray at dunelm.org.uk


