On 05/07/06, Steve Bennett <stevage(a)gmail.com> wrote:
> * why those are not kept
Last I heard, disk space. Logging every one of the ~11,000-12,000 hits
per second would fill the disks.
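As a rough back-of-the-envelope check of the disk-space argument, the sketch below multiplies the quoted request rate by the number of seconds in a day. The 200-byte average log line is a hypothetical figure chosen only for illustration, not a measured one.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_log_bytes(hits_per_second: int, bytes_per_line: int) -> int:
    """Bytes written per day if every hit produces exactly one log line."""
    return hits_per_second * SECONDS_PER_DAY * bytes_per_line


if __name__ == "__main__":
    # 200 bytes per line is an assumed average, not a measured figure.
    for rate in (11_000, 12_000):
        gb = daily_log_bytes(rate, 200) / 1e9
        print(f"{rate} hits/s -> about {gb:.0f} GB/day")
```

Under that assumption the full log would grow by roughly 190-207 GB per day, before any rotation or compression.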
> * whether they could be turned on for brief periods (like 24 hours)
> to allow periodic data collection
Possible, see below...
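One way to read "turned on for brief periods" is a logger that only writes while a fixed window (say, 24 hours) is open. This is a minimal sketch assuming a simple append-to-file model; the class name `WindowedLogger`, its parameters and the example page paths are hypothetical and not part of MediaWiki or Squid.

```python
import io
import time
from typing import Callable, TextIO


class WindowedLogger:
    """Hypothetical logger: writes hit records only while a time window is open."""

    def __init__(self, out: TextIO, start: float, duration_s: float,
                 clock: Callable[[], float] = time.time) -> None:
        self.out = out                 # file (or file-like object) to append to
        self.start = start             # window opens at this Unix time
        self.end = start + duration_s  # ...and closes at this one
        self.clock = clock             # injectable so the window can be tested

    def active(self) -> bool:
        return self.start <= self.clock() < self.end

    def log(self, record: str) -> bool:
        """Append the record if the window is open; return whether it was written."""
        if not self.active():
            return False
        self.out.write(record + "\n")
        return True


if __name__ == "__main__":
    buf = io.StringIO()
    fake_now = [0.0]
    day = 24 * 60 * 60
    logger = WindowedLogger(buf, start=0.0, duration_s=day, clock=lambda: fake_now[0])
    logger.log("/wiki/Main_Page")       # inside the 24-hour window: written
    fake_now[0] = day + 1
    logger.log("/wiki/Special:Random")  # after the window closes: ignored
    print(buf.getvalue(), end="")
```

Passing the clock in as a parameter is just a convenience so the window can be exercised without waiting a day.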
> * what alternative solutions might exist
> It seems like there are at least three different places where log data
> could be collected:
> * on the MySQL database - probably very "expensive"
Forget it.
> * on MediaWiki (i.e., in PHP code) - probably much more attractive;
> could add tuning to only record every 10th or 100th hit or whatever
You can't "store" stuff in PHP; it would have to log to the file
system or elsewhere anyway.
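To make the "every 10th or 100th hit" idea concrete, and in line with the point that the data has to end up in a file or similar, here is a sketch of 1-in-n sampled logging. It is written in Python purely for illustration (MediaWiki itself is PHP); `SampledLogger` and the page paths are hypothetical names.

```python
import io
import itertools
from typing import TextIO


class SampledLogger:
    """Hypothetical logger: writes only every n-th hit (deterministic 1-in-n sampling)."""

    def __init__(self, out: TextIO, every_n: int) -> None:
        if every_n < 1:
            raise ValueError("every_n must be >= 1")
        self.out = out
        self.every_n = every_n
        self._counter = itertools.count(1)  # running hit number, starting at 1

    def hit(self, record: str) -> bool:
        """Count one hit; write it only if it is the n-th, 2n-th, ... hit."""
        if next(self._counter) % self.every_n != 0:
            return False
        self.out.write(record + "\n")
        return True


if __name__ == "__main__":
    buf = io.StringIO()
    sampler = SampledLogger(buf, every_n=100)
    written = sum(sampler.hit(f"/wiki/Page_{i}") for i in range(250))
    print(written, buf.getvalue().splitlines())
```

The per-process counter is an assumption; with many web server processes each keeping its own count, a random test such as `random.random() < 1 / n` avoids having to coordinate a shared counter.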
> * on the "squids" (presumably, proxy servers) - no idea
Squid in this case refers to the Squid web caching software. "The
Squids" is our semi-affectionate name for the bundles of caching
proxies that stop millions of queries from killing the rest of our
cluster.
> Is there absolutely no way that data could be collected at any of
> these points, even for short periods, and even filtered?
It's been thrown around a lot before, and a lot of "perhaps" gets said,
but not a lot of work gets done. Periodic statistics collection could
mean the sample is not quite consistent, but... meh.
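If hits are sampled at 1 in n, per-page totals can only be estimated by scaling the sampled counts back up, which is where the inconsistency comes from: rarely viewed pages may not show up in the sample at all, so their estimates are the least reliable. A minimal sketch, assuming the sample is a list of page titles; `estimate_totals` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable


def estimate_totals(sampled_titles: Iterable[str], every_n: int) -> Dict[str, int]:
    """Scale 1-in-n sampled page counts back up to estimated total hits."""
    counts = Counter(sampled_titles)
    return {title: c * every_n for title, c in counts.items()}


if __name__ == "__main__":
    sample = ["Main_Page", "Main_Page", "Cat"]
    print(estimate_totals(sample, every_n=100))
```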
There are lots of people stating it can be done, but not a lot of them doing it.
Rob Church