Hi Melody,
I'm cc-ing our public list, which is the best place to ask questions like
this.
So, the unique devices and pageviews numbers on the vital signs dashboard
are fetched from our public APIs. They don't allow bulk download. To get
bulk numbers, you can:
* download everything from our public dumps [1] where you'll find pageviews
by article or by project [2] and unique device numbers by project [3]
* ask one of our analysts to crunch numbers (in this case, the reading team
would be the relevant one to ask)
* use our internal cluster to crunch numbers yourself (I can help show you
around)
One important thing to keep in mind: you can aggregate pageviews by
language or project, but you can't do that for unique devices, because the
same devices might be used to visit many sites and there's no way to
deduplicate them. We're working on counting global unique devices so that we
have those numbers as well, though Tilman from reading has some interesting
work on that too.
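To illustrate why pageviews can be summed across projects while unique
devices cannot, here is a minimal sketch with made-up data. The device
identifiers, project counts, and variable names are all hypothetical; the
published per-project counts contain no device identifiers, which is why the
overlap cannot be removed after the fact.

```python
# Hypothetical toy data: which devices visited which project.
# Real published numbers are per-project counts only, without identifiers.
visits = {
    "enwiki": {"device_a", "device_b", "device_c"},
    "eswiki": {"device_b", "device_d"},
    "enwikibooks": {"device_a"},
}

# Hypothetical pageview counts per project.
pageviews = {"enwiki": 120, "eswiki": 45, "enwikibooks": 7}

# Pageviews are additive: every view is counted exactly once.
total_pageviews = sum(pageviews.values())

# Summing per-project unique devices counts shared devices more than once.
per_project_uniques = {project: len(devs) for project, devs in visits.items()}
naive_sum = sum(per_project_uniques.values())

# The true global figure needs the underlying device sets to deduplicate.
true_global = len(set().union(*visits.values()))

print(total_pageviews, naive_sum, true_global)
```

In this toy case the naive sum overcounts because device_a and device_b each
appear in two projects.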
[1] https://dumps.wikimedia.org/other/analytics/
[2] https://dumps.wikimedia.org/other/pageviews/
[3] https://dumps.wikimedia.org/other/unique_devices/
On Thu, Nov 17, 2016 at 11:44 AM, Melody Kramer <mkramer(a)wikimedia.org>
wrote:
> Hey Dan and Mikhail,
>
> I'm working on a map of the Wikimedia universe that will show the relative
> size of entities under the Wikimedia umbrella (Wikipedia, Wikibooks,
> Wikinews, etc.) grouped by language, articles contributed and then
> pageviews and/or unique devices.
>
> On this site:
> https://analytics.wikimedia.org/dashboards/vital-signs/#projects=eswiki,itwiki,enwiki,jawiki,dewiki,ruwiki,frwiki,enwikibooks,enwikinews,wikidatawiki,commonswiki/metrics=UniqueDevices
> I'm able to manually enter each language/wikiproject to see
> them all on the graph.
>
> Is there a way to acquire everything at once, and download it into a csv?
> Or say "Show all?"
>
> I'm happy to say more! Thanks so much for your help/expertise in this area
> in advance (and if there's someone else I should reach out to, please let
> me know who that might be!)
>
> Mel
>
>
> --
> Melody Kramer
> Read a random featured article from Wikipedia!
> <https://en.wikipedia.org/wiki/Special:RandomInCategory/Featured_articles>
>
> mkramer(a)wikimedia.org
>
>
Hi Dehaya,
> If we were to make the legend interactive and the world map dynamic, we
> could improve legibility.
> We should make all the values (1 GJ, 10GJ etc) in the legend clickable
> buttons.
> On clicking, say, 10GJ, the world map should show Bolide events of 10GJ
> magnitude and remove the rest. This will make it easier to answer my
> earlier question.
>
As far as I know, the chart extension is built on Vega, which does
support interactive behavior: https://github.com/vega/vega/wiki/Signals
However, such behavior is harder to define than the mapping to graphics, in
particular if you want the kind of cross-filtering behavior you refer
to.
Cheers,
Jan
--
Jan Dittrich
UX Design/ User Research
Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
http://wikimedia.de
Imagine a world, in which every single human being can freely share in the
sum of all knowledge. That’s our commitment.
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as a non-profit
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/029/42207.
[Apologies for cross-posting]
Hi everyone,
Almost a year ago, we [1] embarked on a research project to understand who
Wikipedia readers are. More specifically, we set the goal of finding a
taxonomy of Wikipedia readers. In the upcoming Research Showcase, I will
present the findings of this research.
*Logistics*
The Research Showcase will be live-streamed on Wednesday, November 16, 2016
at 11:35 (PST) 19:35 (UTC).
YouTube stream: https://www.youtube.com/watch?v=O24F1xkbNwI
As usual, you can join the conversation on IRC (freenode) at
#wikimedia-research. And, you can watch our past research showcases at
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase.
*Title*
Why We Read Wikipedia
*Abstract*
Every day, millions of readers come to Wikipedia to satisfy a broad range
of information needs; however, little is known about what these needs are.
In this presentation, I share the results of research that sets out to help
us understand Wikipedia readers better. Based on an initial user study on
English, Persian, and Spanish Wikipedia, we build a taxonomy of Wikipedia
use-cases along several dimensions, capturing users’ motivations to visit
Wikipedia, the depth of knowledge they are seeking, and their knowledge of
the topic of interest prior to visiting Wikipedia. Then, we quantify the
prevalence of these use-cases via a large-scale user survey conducted on
English Wikipedia. Our analyses highlight the variety of factors driving
users to Wikipedia, such as current events, media coverage of a topic,
personal curiosity, work or school assignments, or boredom. Finally, we
match survey responses to the respondents’ digital traces in Wikipedia’s
server logs, enabling the discovery of behavioral patterns associated with
specific use-cases. Our findings advance our understanding of reader
motivations and behavior on Wikipedia and have potential implications for
developers aiming to improve Wikipedia’s user experience, editors striving
to cater to (a subset of) their readers’ needs, third-party services (such
as search engines) providing access to Wikipedia content, and researchers
aiming to build tools such as article recommendation engines.
*How to prepare? What to expect?*
If you decide to attend, here are a few things I would like to ask you to
keep in mind, especially if this will be your first time to one of our
research showcases:
* Like many other research projects in fields that are not heavily
explored, the findings of this research will create more questions than
they answer. I encourage you to keep these questions in mind throughout the
presentation and discussion: "What can we do with this finding? What other
questions can we ask? What other ideas can we try?"
* Be open to asking yourself these questions, especially if you are a
Wikipedia editor, even before coming to the showcase: "Why do I edit
Wikipedia? Who am I writing the content for, if anyone? Will I change the
way I write content if I know more about who reads it (to encourage or
discourage certain types of reading or readers)? What needs should an
encyclopedia serve? What is Wikipedia: A place one can quickly find the
answer to his/her questions, or a place that one can go to when he/she wants
to spend quiet time reading and learning, or a place for both and even more?
etc."
* And, see if you would be interested in seeing the results of this study in
your language. What will be presented is based on research on English,
Persian, and Spanish Wikipedia (the data from the latter two projects have
been used only for one part of the research). We are interested in running
the study on at least 2-3 more languages to understand the robustness of
some of the results across different languages, and also to help
communities get access to the results for their specific language
project.
Looking forward to seeing you there, and if you can't make it, please feel
free to watch the video later and get in touch with us with
questions/comments. :)
Best,
Leila
--
Leila Zia
Senior Research Scientist
Wikimedia Foundation
[1] WMF Research and researchers from three academic institutions: EPFL,
GESIS, and Stanford University, in collaboration with WMF Reading.
Dear Analytics team,
The general legibility of charts in Wikipedia is relatively poor.
We can improve it by making them more interactive and dynamic.
Please refer to the Chart in the attachment (Boloid Events.jpg).
The chart represents the distribution of Bolide events from 1994-2013 on
the world map.
The legend describes the magnitude of each event in joules.
From the chart, can you count the number of 10GJ Bolide events in Africa?
You can, but it takes an awfully long time to find the answer.
If we were to make the legend interactive and the world map dynamic, we
could improve legibility.
We should make all the values (1 GJ, 10GJ etc) in the legend clickable
buttons.
On clicking, say, 10GJ, the world map should show Bolide events of 10GJ
magnitude and remove the rest. This will make it easier to answer my
earlier question.
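The requested behavior amounts to a filter on the list of events. A minimal
sketch in Python, using made-up events; the regions, the legend buckets, and
the helper `events_for_legend_value` are hypothetical and are not part of any
existing chart extension:

```python
from collections import Counter

# Hypothetical bolide events: (region, energy bucket shown in the legend).
events = [
    ("Africa", "10GJ"),
    ("Africa", "1GJ"),
    ("Africa", "10GJ"),
    ("Asia", "10GJ"),
    ("Europe", "100GJ"),
]


def events_for_legend_value(events, selected):
    """Keep only the events whose legend bucket matches the clicked value."""
    return [event for event in events if event[1] == selected]


# Simulate a click on the "10GJ" legend entry.
visible = events_for_legend_value(events, "10GJ")

# With the other buckets hidden, counting per region becomes trivial.
counts_by_region = Counter(region for region, _ in visible)
print(counts_by_region["Africa"])
```

With only the selected bucket left on the map, the question about Africa
reduces to counting the remaining markers in that region.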
Regards
Dhaya
Hi!
I'm a Ph.D. student in economics, using some of the Wikimedia data in my research. My question is whether it's possible to get data on Wikipedia pageviews by country and article category. Currently the Wikimedia Foundation provides aggregate data on pageviews by country and less aggregated data on pageviews by article, but it looks like there is no way to find out, for example, the pageviews of math articles in India.
More specifically, my questions are:
1) Is it possible in some way to extract information on pageviews by country and subject area from your publicly available data? The amount of data currently available is already vast, and I could have missed it.
2) If it is not possible, then how can I persuade you to make this data available? I would argue that the data can be made available without compromising confidentiality, either by using only the first numbers of the IP address or by publishing only the country of the user, as well as by aggregating by category.
I'm looking forward to hearing from you. I'm sure that many social scientists will also be glad to use the opportunity to produce more interesting and policy-relevant research.
Best regards,
Alexander Ugarov,
Ph.D. Candidate
Sam M. Walton College of Business
Department of Economics
University of Arkansas
Are there any reasons to not replace HTTP GET request IP addresses and
proxy information with their SHA-512 secure hash prior to writing them
to permanent media?
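For concreteness, the replacement being asked about could look like the
sketch below. It uses only Python's standard `hashlib` and `hmac` modules;
the function names, the example address, and the secret key are hypothetical.
The keyed variant is an assumption added for comparison, not something
proposed above: since there are only 2^32 possible IPv4 addresses, a plain
unkeyed hash can be reversed by hashing every address and comparing.

```python
import hashlib
import hmac


def sha512_hex(value: str) -> str:
    """Return the plain SHA-512 digest of a string as 128 hex characters."""
    return hashlib.sha512(value.encode("utf-8")).hexdigest()


def keyed_sha512_hex(value: str, key: bytes) -> str:
    """Return an HMAC-SHA-512 digest; without the key it cannot be recomputed."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha512).hexdigest()


# 192.0.2.1 is a documentation-only address (RFC 5737), used as a stand-in.
ip = "192.0.2.1"
secret_key = b"hypothetical-secret-key"  # assumption: kept off permanent media

plain_digest = sha512_hex(ip)
keyed_digest = keyed_sha512_hex(ip, secret_key)

print(len(plain_digest), plain_digest != keyed_digest)
```

Either way the same address always maps to the same digest, so requests can
still be grouped by client without storing the address itself.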
Hi,
With Ori not responsible for statsv maintenance in an official capacity,
should the Analytics team handle statsv maintenance going forward?
Ori has tried to leave it in a state that doesn't need much maintenance
(read: restarts in case of issues) and is still trying to make it so
<https://gerrit.wikimedia.org/r/#/c/321230/2>. This means that it
shouldn't require much actual work other than keeping an eye on it and
kicking it if the things Ori has been trying to put in place don't work.
Given that we're not the only ones using statsv and considering its
function, Analytics seems like the right home for it. What do you think?
On Tuesday Nov 13, at 9 am UTC, the web server for the dumps and other
datasets will be unavailable due to maintenance. This should take no longer
than 10 minutes. Thanks for your understanding.
Ariel
Dear all,
The Wikimedia Foundation datasets collection on the Internet Archive
[1] has now surpassed 1 million items (and about 50,000 full database
dumps)! This marks a major milestone in our archiving efforts of
Wikimedia's vast amount of data and ensures that the vital content
submitted by volunteers across the movement is preserved. None of this
would have been possible without the help of many people,
including Nemo, Ariel and Emijrp (thanks!).
We started archiving towards the end of 2011 and reached a milestone
of half a million items back in June 2015. [2] We have since moved on
from archiving just the main database dumps to saving research-worthy
data such as the pageviews data and even attempting to keep a copy of
Wikimedia Commons. Today, we are working on making the items on the
Internet Archive more accessible for researchers by working on an
interface for searching old dumps.
Despite this feat, we are in constant need of more help. If you are a
researcher, a programmer or someone with a computer, we need your help
in many tasks! Have a look at WikiTeam's project [3] or Emijrp's
Wikipedia Archive page [4] for more information. If you regularly work
on the Wikimedia database dumps, please provide your input in the
Dumps-Rewrite project [5] and the API interface [6].
As before, here's to the next million!
[1]: https://archive.org/details/wikimediadownloads
[2]: https://groups.google.com/forum/#!msg/wikiteam-discuss/Vj3oonpYphg/h9HE6r3v…
[3]: https://github.com/WikiTeam/wikiteam
[4]: https://en.wikipedia.org/wiki/User:Emijrp/Wikipedia_Archive
[5]: https://phabricator.wikimedia.org/tag/dumps-rewrite/
[6]: https://phabricator.wikimedia.org/T147177
--
Hydriz Scholz