Hi Valerio,
Mako was referring to https://dumps.wikimedia.org/other/pagecounts-raw/
and the current logging practices. My understanding is also that requests
for old revisions and history pages are not logged on a routine basis. The
Wikibench traces seem to have been a special case.
I've also contacted the researchers who partially released it, but making
it publicly available is tricky for them due to its size (12 TB), although
that volume might be within the norm of what the Wikipedia servers handle
daily.
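Have the researchers looked into requester-pays data storage on Amazon or
another provider? Whatever the size, that should let them make it public
without paying for the downloads themselves, since the people requesting
the data cover the transfer costs. As far as I know, they would still pay
for the storage itself.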
Cheers,
Scott
On Wed, Sep 24, 2014 at 7:09 PM, Valerio Schiavoni <
valerio.schiavoni(a)gmail.com> wrote:
Hello Mako,
On Wed, Sep 24, 2014 at 8:13 AM, Benj. Mako Hill <mako(a)atdot.cc> wrote:
Users mostly read the most recent version of a given page, but from time
to time, read accesses to the 'history' of a page happen.
At least as far as I know, views to historical versions of webpages in
Wikipedia don't show up in the access logs at all because certain
kinds of requests (like requests to /w/index.php?oldid=NUMBER) don't
get recorded in the pageview data.
I'm sorry to contradict you, but at least in the Wikibench traces, that
information is definitely present. I see things like:
1609418296 1190438479.078 http://en.wikipedia.org/w/index.php?title=Western_betrayal&oldid=982812…
That is, back in 2007, users were accessing a version of that page that
dated back to 2005 or so.
New versions of a page are created as well.
Finally, users might potentially need to explore several old versions of a
given web page, for example by accessing the details of its history[1].
AFAIK, viewing the history page itself is not recorded in the page view
data either.
Sorry to contradict you again, but there are indeed logs for that as well:
http://en.wikipedia.org/w/index.php?title=Marina_Nadiradze&action=histo…
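To make the two kinds of requests above concrete, here is a minimal Python
sketch of how such trace lines could be split and classified. It assumes
each line holds a counter, a Unix timestamp, and a URL separated by
whitespace, which is only inferred from the sample line quoted above. The
names TraceRecord, classify, and parse_line are hypothetical, and the
example URL uses a made-up title and oldid because the real ones are
truncated in this thread.

```python
from datetime import datetime, timezone
from typing import NamedTuple
from urllib.parse import parse_qs, urlsplit


class TraceRecord(NamedTuple):
    counter: int
    timestamp: datetime
    url: str
    kind: str  # "old_revision", "history", or "other"


def classify(url: str) -> str:
    """Label a request by looking only at its query string."""
    query = parse_qs(urlsplit(url).query)
    if "oldid" in query:
        return "old_revision"  # a specific past revision was requested
    if query.get("action") == ["history"]:
        return "history"  # the revision history page was requested
    return "other"


def parse_line(line: str) -> TraceRecord:
    """Parse one trace line (assumed layout: counter, timestamp, URL)."""
    # Any fields after the URL are ignored in this sketch.
    counter, ts, url = line.split()[:3]
    when = datetime.fromtimestamp(float(ts), tz=timezone.utc)
    return TraceRecord(int(counter), when, url, classify(url))


if __name__ == "__main__":
    # Counter and timestamp come from the sample above; the URL is hypothetical.
    sample = ("1609418296 1190438479.078 "
              "http://en.wikipedia.org/w/index.php?title=Example&oldid=12345")
    record = parse_line(sample)
    print(record.timestamp.date(), record.kind)
```

Counting the records labelled "old_revision" or "history" over a whole
trace file would give a rough share of such requests, under the same
assumptions about the line layout.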
I'm quite surprised that this information is not known to the community
of Wikipedia researchers.
Best,
Valerio
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
--
Scott Hale
Oxford Internet Institute
University of Oxford
http://www.scotthale.net/
scott.hale(a)oii.ox.ac.uk