Hi,
Ævar Arnfjörð Bjarmason wrote:
As my first toolserver project I've made some statistics of the
activity in our CVS repository. I made aggregated statistics for the
whole repository[1] as well as statistics for individual modules[2].
You'll see per-committer statistics there, LOC statistics, file &
directory statistics and changelogs. Additionally I've put the raw CVS
changelogs up there too[4], and finally the whole thing can be
downloaded[3] for local viewing & hacking, just make sure you're
*ahem* adhering to the license[5].
1. http://tools.wikimedia.de/~avar/cvs/html/all/
2. http://tools.wikimedia.de/~avar/cvs/html/
3. http://tools.wikimedia.de/~avar/cvs/cvs.tar.gz
4. http://tools.wikimedia.de/~avar/cvs/log/
5. http://tools.wikimedia.de/~avar/cvs/COPYING
Thanks for this CVS analysis. The distribution of activity per author
is typical, as are the other distributions. If you look around for
general tools and projects in the quantitative measurement of open
source software, you'll stumble upon the Libresoft group at the
Universidad Rey Juan Carlos, Madrid:
http://libresoft.urjc.es/
For instance have a look at:
http://libresoft.dat.escet.urjc.es/cvsanal/kde3-cvs/index.php?menu=Statisti…
Two weeks ago I was contacted by Felipe Ortega and Jesus Gonzalez
Barahona (this mail is also forwarded to them) of this group. They
would like to do further analysis of Wikipedia, and I'd like to hear
your opinions on this. Here are two quotes from them describing what
they want to do:
So, our proposal is that, if the Wikipedia admins allow us to
participate, we are very interested in the design and implementation
of a log analysis system for Wikipedia (both Squids and Apaches). As
the amount of information that the system could generate over a
significant period of time (about 2 TBytes) is too close to our
hardware limits, we may propose to randomly pick a representative
set of samples from the Apache and Squid logs (about 10,000 per hour).
...
Felipe is ready to help you instrument the Wikipedia Squids and
Apaches, in a way which (we hope) won't impact Wikipedia's
reliability or performance. However, maybe that instrumentation is
not necessary, depending on the log information available. Of course,
the work of anonymizing the logs before analysis, etc. would be done
by us. And also of course, we would contribute back the results of
the work, hopefully with a system to measure the performance of the
site, which can help to identify bottlenecks and problems.
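For what it's worth, the sampling they describe (keeping roughly 10,000
lines out of a much larger hourly stream) can be done in a single pass
without ever storing the full logs, e.g. with reservoir sampling. A
minimal sketch in Python (the function name and the fake log lines are
mine, just for illustration):

```python
import random

def sample_log_lines(lines, k, rng=None):
    """Uniformly sample up to k lines from an arbitrarily long log
    stream using reservoir sampling, so the whole log never has to
    fit in memory."""
    rng = rng or random.Random()
    reservoir = []
    for i, line in enumerate(lines):
        if i < k:
            reservoir.append(line)
        else:
            # Keep this line with probability k / (i + 1); every line
            # seen so far ends up equally likely to be in the sample.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = line
    return reservoir

# Example: pretend each string is one Squid/Apache access-log line.
log = ('192.0.2.%d - - "GET /wiki/Foo" 200' % (n % 255)
       for n in range(100000))
sample = sample_log_lines(log, 10000)
print(len(sample))  # 10000
```

This works on a generator (or a pipe from the servers), which is the
point: memory use is bounded by the sample size, not the traffic.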
Maybe they can also collaborate on the toolserver (or on their own
machines - I don't know which is better). By the way, at 22C3 three
days ago somebody who seems to collect mainframes offered us
(Wikimedia) a private computer center in Germany with 40-processor
machines and a 100 TB tape library - any interest? ;-) I'll post the
details later.
Greetings,
Jakob