Hi all,
Some analytics researchers have wanted a distribution upgrade on stat1003 for a while now. See:
https://rt.wikimedia.org/Ticket/Display.html?id=8057
Unless someone objects, I plan to start this tomorrow. This has some risk associated with it, as we usually prefer to do reinstalls rather than distribution upgrades. I will do what I can to mitigate the risk.
-Andrew Otto
Hey folks,
We've been discussing ways to make more Wikimedia data public. One of our
sources for data is EventLogging (EL)[1], a system that lets us track
events on both the client and server-side. Recently, YuviPanda and
springle have been working with us to figure out what issues need to be
resolved in order to begin loading EL events that contain public data[2]
into LabsDB for public consumption and for use in WikiMetrics.
It looks like there are three major concerns about directing EL to LabsDB.
(1) there needs to be a good review process in place to make sure that the
data we surface isn't sensitive, (2)
https://bugzilla.wikimedia.org/show_bug.cgi?id=67450 will need to be
addressed to make sure that we don't over-utilize Labs infrastructure, and
(3) we'll need sign-off from legal.
It looks like (2) can be taken care of independently from (1) and (3). Is
this bug already prioritized, and if not, could it be?
1. https://www.mediawiki.org/wiki/Extension:EventLogging
2. Eventually, we'll want a means to sanitize and surface events that
contain sensitive information, but I'd argue that is a second step that we
should address later since it will likely require more substantial
technical work.
-Aaron
Hi,
following up on [1] we'll soonish [2] purge the following tables from
the EventLogging databases:
SignupExpAccountCreationComplete_8539421
SignupExpAccountCreationImpression_8539445
SignupExpCTAButtonClick_8102619
SignupExpCTAButtonClick_8965028
SignupExpCTAImpression_8101716
SignupExpCTAImpression_8965023
SignupExpPageLinkClick_8101692
SignupExpPageLinkClick_8965014
TrackedPageContentSaveComplete_7872558
TrackedPageContentSaveComplete_8535426
If you rely on any of them, please let us know by 2014-08-11.
Sorry for the short notice,
Christian
P.S.: The tables will likely pop up again, so do not be alarmed if
you see them reappear. This is expected. At this point we only
care about removing old data from them.
[1] http://lists.wikimedia.org/pipermail/analytics/2014-July/002351.html
[2] Sorry for being vague here. There are still a few moving parts,
but it seems we're purging the data sooner rather than later.
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3 Email: christian(a)quelltextlich.at
4293 Gutau, Austria Phone: +43 7946 / 20 5 81
Fax: +43 7946 / 20 5 81
Homepage: http://quelltextlich.at/
---------------------------------------------------------------
Greetings.
A day or two ago, we released an update which changes the user agent of the
app on iOS to improve standardisation across iOS and Android apps.
Here are two example user agents:
- WikipediaApp/4.0.1 (iPhone OS 7.1.2; Tablet)
- WikipediaApp/2.0-r-2014-07-23 (Android 4.1.2; Phone)
as opposed to the old iOS user agent, which used to look like this:
- Wikipedia/4.0 CFNetwork/672.1.15 Darwin/14.0.0
Moving forwards, this means "userAgent LIKE 'WikipediaApp%'" will now find
results from *both* platforms, not just Android.
Remember that, because this is an app, iOS users who haven't updated it
still have the old user agent, and will keep it until they update. Bear
this in mind when writing queries against the app data!
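A minimal Python sketch of the distinction above, assuming only the three user-agent shapes shown in this message. The helper names (`classify_app_user_agent`, `matches_new_prefix`) and the labels they return are hypothetical, not part of any Wikimedia tool. It also shows why a prefix match on "WikipediaApp" misses the old iOS format: that string starts with "Wikipedia/", not "WikipediaApp".

```python
import re

# Hypothetical patterns, derived only from the example user agents above.
# New format (both platforms): WikipediaApp/<version> (<OS and version>; <device>)
NEW_FORMAT = re.compile(r"^WikipediaApp/(\S+) \(([^;]+); ([^)]+)\)$")
# Old iOS format: Wikipedia/<version> CFNetwork/...
OLD_IOS_FORMAT = re.compile(r"^Wikipedia/\S+ CFNetwork/")


def matches_new_prefix(user_agent: str) -> bool:
    """Rough analogue of userAgent LIKE 'WikipediaApp%' (case-sensitive here)."""
    return user_agent.startswith("WikipediaApp")


def classify_app_user_agent(user_agent: str) -> str:
    """Label a user agent as 'android', 'ios', 'ios-legacy' or 'other'."""
    match = NEW_FORMAT.match(user_agent)
    if match:
        os_part = match.group(2)  # e.g. "iPhone OS 7.1.2" or "Android 4.1.2"
        if os_part.startswith("Android"):
            return "android"
        if os_part.startswith("iPhone OS"):
            return "ios"
        return "other"
    if OLD_IOS_FORMAT.match(user_agent):
        # iOS users who have not updated the app yet.
        return "ios-legacy"
    return "other"


if __name__ == "__main__":
    samples = [
        "WikipediaApp/4.0.1 (iPhone OS 7.1.2; Tablet)",
        "WikipediaApp/2.0-r-2014-07-23 (Android 4.1.2; Phone)",
        "Wikipedia/4.0 CFNetwork/672.1.15 Darwin/14.0.0",
    ]
    for ua in samples:
        # Prints: "ios True", "android True", "ios-legacy False"
        print(classify_app_user_agent(ua), matches_new_prefix(ua))
```

Under these assumptions, counting iOS app traffic during the transition means adding the "ios" and "ios-legacy" buckets together, since the prefix match alone only sees updated clients.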
Thanks,
Dan
--
Dan Garry
Associate Product Manager, Mobile Apps
Wikimedia Foundation
Hello Aaron,
I just discussed my issue with you and, as you proposed, I am sending this
e-mail to summarize once again what I am asking for:
I am producing a visualization (with Processing) of data from the English
Wikipedia. In particular, I want to produce a world map that shows from which
places people edit Wikipedia articles. I can get the location of anonymous
users from their IP addresses; however, for privacy reasons, I can’t get the
IP addresses of registered users (the API query ‘list=checkuser’ requires
special rights that I don’t have).
So, I would like to ask whether you could give me a set containing all IP
addresses of registered users of the English Wikipedia. Important: I am only
interested in the IP addresses - I DON’T need to know how many times registered
users edited from each IP address and I DON’T need to know their usernames.
If you wish, I can ask my professor to send you an e-mail confirming that I am
writing my master’s thesis about Wikipedia.
I am looking forward to your answer and thank you a
lot for your help!
Thomas Legler
Thanks, Nuria. I will write a similar pilot research project proposal with
concrete parameters and send it to analytics(a)lists.wikimedia.org
for further review.
2014-08-05 8:49 GMT+01:00 Nuria Ruiz <nuria(a)wikimedia.org>:
> I know this answer comes late as I was on vacation, sorry about that.
>
> At this time the cluster is not ready to be accessed by users not in the
> analytics team as things are still WIP. Now, in order to get the data you
> are interested in, you can always ask the research team to retrieve it for
> you (this is what we did for our pilot, actually).
>
> Please e-mail: analytics(a)lists.wikimedia.org and let us know what you are
> interested in.
>
>
> On Wed, Jul 30, 2014 at 8:40 PM, Pine W <wiki.pine(a)gmail.com> wrote:
>
>> Nuria and Andrew,
>>
>> Forwarding a question from Han-teng below.
>>
>> Pine
>> Dear Pine,
>>
>> A humorous touch here in your most recent email: "*A $1 fine will be
>> imposed by Oliver Keyes on anyone who misspells Leila's name or misdirects
>> emails to the WMF Executive Director."
>>
>> I have one slightly more serious question, on the possibility to use
>> the analytics infrastructure for the upcoming Hackathon.
>>
>> My Hackathon wish is to duplicate and reapply what Nuria Ruiz and
>> Andrew Otto have done for the NARA analytics pilot.
>> https://commons.wikimedia.org/wiki/Commons:GLAMwiki_Toolset_Project/NARA_an…
>>
>> So, to your knowledge, is it feasible to do so in terms of (a) setting
>> up basic access for other users to duplicate the pilot, (b) getting some
>> help from Ruiz and/or Otto, and (c) setting it up for a GLAM institution
>> other than NARA?
>>
>> Feel free to forward this email to Nuria Ruiz and/or Andrew Otto
>> because I do not have their contacts.
>>
>> Best,
>>
>> --
>> han-teng liao
>>
>> "[O]nce the Imperial Institute of France and the Royal Society of London
>> begin to work together on a new encyclopaedia, it will take less than a
>> year to achieve a lasting peace between France and England." - Henri
>> Saint-Simon (1810)
>>
>> "A common ideology based on this Permanent World Encyclopaedia is a
>> possible means, to some it seems the only means, of dissolving human
>> conflict into unity." - H.G. Wells (1937)
>>
>> 2014-07-18 8:28 GMT+01:00 Pine W <wiki.pine(a)gmail.com>:
>>
>>> Thanks for this. Forwarding to Analytics and Research for others who are
>>> curious.
>>>
>>> Pine
>>>
>>>
>>> On Tue, Jul 15, 2014 at 9:29 AM, Rachel Farrand <rfarrand(a)wikimedia.org>
>>> wrote:
>>>
>>>> This Tech Talk will be starting in 30 minutes. Thanks!
>>>>
>>>>
>>>> On Fri, Jul 11, 2014 at 3:30 PM, Rachel Farrand <rfarrand(a)wikimedia.org>
>>>> wrote:
>>>>
>>>> > Hello!
>>>> >
>>>> > Please join Nuria Ruiz and Andrew Otto next Tuesday, July 15th at 10am SF
>>>> > time/5pm UTC
>>>> > <http://www.timeanddate.com/worldclock/fixedtime.html?msg=Analytics+Tech+Tal…>
>>>> > for a 30 min tech talk. You can join our hangout or follow along on
>>>> > YouTube:
>>>> > https://plus.google.com/u/0/b/103470172168784626509/events/c53ho5esd0luccd0…
>>>> > (please note that a link to join the hangout will be posted in the comments
>>>> > of this event just as it starts).
>>>> >
>>>> > You can ask questions on IRC during the talk in #wikimedia-dev.
>>>> >
>>>> > If you are not able to follow along live, a video recording will be posted
>>>> > here
>>>> > <https://plus.google.com/u/0/b/103470172168784626509/103470172168784626509/v…>,
>>>> > on the MediaWiki YouTube channel, immediately following the tech talk, for
>>>> > you to view at any time.
>>>> >
>>>> > More information about the tech talk:
>>>> >
>>>> > *Hadoop and Beyond. An overview of Analytics infrastructure*
>>>> >
>>>> > In this tech talk we will be presenting the analytics infrastructure that
>>>> > we have recently rolled out in production. By now probably everybody knows
>>>> > that Wikimedia hosts an instance of Hadoop from which we are going to
>>>> > extract pageview data in the near future. But... how exactly does the data
>>>> > get there?
>>>> >
>>>> > We will go over the path that webrequest log data takes from Varnish to
>>>> > Kafka (a distributed log buffer) to Hadoop, and the challenges of deploying
>>>> > this Java-based infrastructure in production. We will also talk about how
>>>> > we can query the data with Hive, an SQL-like interface, how you can set up
>>>> > this stack on Vagrant to play with and, last but not least, how we recently
>>>> > used Hive to provide GLAM folks with image view stats:
>>>> >
>>>> https://commons.wikimedia.org/wiki/Commons:GLAMwiki_Toolset_Project/NARA_an…
>>>> >
>>>> > Thanks!
>>>> >
>>>> >
>>>> _______________________________________________
>>>> Wikitech-l mailing list
>>>> Wikitech-l(a)lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>>>
>>>
>>>
>>> _______________________________________________
>>> Wiki-research-l mailing list
>>> Wiki-research-l(a)lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>>>
>>>
>>
>>
>