[Wikitech-l] new servers and log processing

Sj 2.718281828 at gmail.com
Tue Apr 12 19:40:43 UTC 2005


I believe I mentioned this to one of these lists recently, but there
are two people at the National Bureau of Economic Research
(http://en.wikipedia.org/wiki/NBER) here in Cambridge who are
excited about helping produce more frequent and more comprehensive
Wikipedia statistics.  (they want to use them in academic studies; see
[[Wikipedia:Wikiproject Wikidemia]]).

Clearly, the log files would have to be processed securely within the
server cluster, but some of the script writing and testing could
perhaps be offloaded to an NBER programmer.  (For instance, I believe
that certain key webalizer stats aren't accurate under the current
setup; the processing scripts have to cleverly handle requests coming
through the fr: squids.)
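
To make the squid problem concrete, here is a rough sketch of the
kind of preprocessing step I have in mind -- a small filter run over
the raw log before webalizer sees it.  The squid address and the
field positions are assumptions for illustration, not the actual log
format:

#!/usr/bin/env python
# Sketch: rewrite access-log lines so hit counters like webalizer
# credit the original client rather than the squid itself.
# ASSUMPTIONS: whitespace-separated fields, client IP in field 0,
# X-Forwarded-For value in the last field ("-" when absent).
import sys

SQUID_IPS = set(["145.97.39.155"])  # hypothetical fr: squid address

def rewrite(line):
    fields = line.split()
    if len(fields) < 2:
        return line
    client, forwarded = fields[0], fields[-1]
    if client in SQUID_IPS and forwarded != "-":
        # Credit the hit to the first (original) client in the chain.
        fields[0] = forwarded.split(",")[0].strip()
    return " ".join(fields)

for raw in sys.stdin:
    print(rewrite(raw.rstrip("\n")))

Run over a day's log ahead of webalizer, something like this should
keep fr: traffic from being counted against the squids themselves.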

What older stats scripts, aside from webalizer, need to be rewritten?
Which scripts aren't run often simply because of how long they take?
Is there anything non-Foundation developers can do to speed up the
process of getting a dedicated machine to process logs?
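
(For scale: Brian's ~2GB/day, accumulated over roughly six months, is
where the ~360GB figure below comes from.)  If a box were set aside
for this, the daily job itself could stay quite small; a sketch, with
invented paths, using webalizer's -p (preserve state / incremental)
and -o (output directory) options:

#!/usr/bin/env python
# Sketch of a daily stats job: run webalizer incrementally over
# yesterday's traffic log.  All paths here are invented.
import datetime
import subprocess

LOG_DIR = "/var/log/traffic"     # hypothetical log location
REPORT_DIR = "/var/www/stats"    # hypothetical report output

yesterday = datetime.date.today() - datetime.timedelta(days=1)
logfile = "%s/access.log.%s" % (LOG_DIR, yesterday.strftime("%Y%m%d"))

# -p preserves state between runs (incremental mode);
# -o sets the report output directory.
subprocess.check_call(["webalizer", "-p", "-o", REPORT_DIR, logfile])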

SJ

On Apr 12, 2005 11:08 AM, Brian <reflection at gmail.com> wrote:
> I saw mention in our tentative hardware order [1] of disk space for log
> files, although not specifically the traffic logs. I believe Kate mentioned
> we generate around 2GB per day, which means we have roughly 360GB of these
> sitting around waiting to be processed. Would it be possible to purchase or
> set aside a server or two to take care of this, with the (perhaps pipe-dream)
> goal of then running webalizer every day?
> 
> [1] http://meta.wikimedia.org/wiki/Hardware_ordered_March_2005
> 
> /Alterego
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l at wikimedia.org
> http://mail.wikipedia.org/mailman/listinfo/wikitech-l
>


