> Thanks for the dump of data Nuria. I assume these all add up to 100%
> (roughly) and are global?
Roughly and global. Yes to both. As I said, "very" preliminary data. Good
enough to triage bugs though.
> do need it until we have some nice page I can go to to
> retrieve it :).
Yes. I will also be happy to re-run it if need be.
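
For anyone who wants to turn a list like the attached one into a quick yes/no for triage, here is a minimal Python sketch. It is only an illustration under assumptions: the function names (`traffic_shares`, `share_of`), the row format and the sample numbers are all made up for the example and are not part of the Hive/ua-parser setup. It computes the share of each OS/browser combo, drops combos under 0.05% (which is why the total only adds up to roughly 100%), and checks whether a set of OS versions crosses the 1% mark.

```python
from collections import Counter


def traffic_shares(rows, min_share=0.05):
    """Return {(os, browser): percent of requests}, keeping only combos
    whose share is at least `min_share` percent."""
    counts = Counter(rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    shares = {combo: 100.0 * n / total for combo, n in counts.items()}
    return {combo: pct for combo, pct in shares.items() if pct >= min_share}


def share_of(shares, predicate):
    """Sum the percentages of all combos for which `predicate` is true."""
    return sum(pct for combo, pct in shares.items() if predicate(combo))


# Hypothetical sample rows, NOT real traffic: one (os, browser) pair per request.
sample = (
    [("iOS 8", "Mobile Safari")] * 6000
    + [("Android 4.4", "Chrome Mobile")] * 3949
    + [("Android 2.2", "Android")] * 50
    + [("Windows Phone 7.5", "IE Mobile")] * 1
)

shares = traffic_shares(sample)

# Share of requests from Android versions that lack JSON.stringify.
old_android = share_of(
    shares, lambda combo: combo[0] in ("Android 2.1", "Android 2.2")
)

# "Significant" here means more than 1% of traffic, as in the question.
significant = old_android > 1.0
```

With the made-up sample, the single Windows Phone request falls under the 0.05% cutoff and is dropped, so the remaining shares sum to slightly less than 100%.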
On Fri, Oct 17, 2014 at 11:30 AM, Jon Robson <jrobson(a)wikimedia.org> wrote:
On Wed, Oct 15, 2014 at 12:12 PM, Andrew Otto <aotto(a)wikimedia.org> wrote:
> Jon,
> Recent unsampled webrequest logs are available for querying in Hive now!
> https://wikitech.wikimedia.org/wiki/Analytics/Cluster
> :)
> If you don’t already have access for this, submit an RT request to get
> access to stat1002 and the analytics-privatedata-group.
That's good to know. Thanks. I'm not sure if I have stat1002 access
but every time you mention RT I shudder ;-)
Thanks for the dump of data Nuria. I assume these all add up to 100%
(roughly) and are global? So if I understand correctly, if I get the
above access and follow your instructions I can get this data when I
do need it until we have some nice page I can go to to retrieve it :).
This is good to know when we have these sorts of questions so thanks a
bunch. We are currently interested in phablet traffic (big screen
mobile devices) so this should be useful information for us thanks!
On Thu, Oct 16, 2014 at 7:15 PM, Nuria Ruiz <nuria(a)wikimedia.org> wrote:
> > And I have no idea what our traffic for
> > Android 2.1 and 2.2 is and if it is significant e.g. more than 1% of
> > our traffic.
> So the answer to this question (with preliminary data) is that neither 2.1
> nor 2.2 amount to 0.05% of traffic to the mobile site.
> I have attached the list of user agents and devices (with percentages) for
> the last 30 days. I did not include any device/browser combo with less than
> 0.05% of traffic.
> For about 4% of traffic we could not identify the browser; this might be
> because the user agent was not there or because ua-parser could not figure
> it out. I understand this is not ideal, but I am sending this because I feel
> this list provides quite a bit of value and should help you triage bugs.
> iOS takes the cake, which does not cease to amaze me.
> I described what I did to gather the data here (anyone with permits to 1002
> can repro):
> https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive/QueryUsingUDF
> On Wed, Oct 15, 2014 at 12:15 PM, Nuria Ruiz <nuria(a)wikimedia.org> wrote:
>
> >And I have no idea what our traffic for
> >Android 2.1 and 2.2 is and if it is significant e.g. more than 1% of
> >our traffic.
> Understood, it is hard for you guys to work without knowing this data. I
> will try to get a user agent list for data from last month but, as I
> mentioned earlier, I think providing this data on a regular basis (monthly?)
> is a good goal for us.
>
> On Wed, Oct 15, 2014 at 10:35 AM, Jon Robson <jrobson(a)wikimedia.org>
> wrote:
>>
>> Anything would be useful. I just hit this situation again. I was
>> reviewing some code and someone used JSON.stringify - this is not
>> available in Android < 2.3 and I have no idea what our traffic for
>> Android 2.1 and 2.2 is and if it is significant e.g. more than 1% of
>> our traffic.
>>
>> In the mean time while I don't have a fancy place to find out the
>> answers to this how can I get these answers?
>> Should I mail the analytics mailing list to ask these questions? Cc a
>> point person on bugzilla with the question? Ping someone privately?
>>
>> Jon
>>
>>
>>
>> On Tue, Oct 14, 2014 at 10:30 AM, Nuria Ruiz <nuria(a)wikimedia.org> wrote:
>> >>Woah! Nice :D How are definitions updates handled?
>> > Since we talked about this on IRC, restating here to keep the archives
>> > happy.
>> > We pull the ua parser jar from our archiva depot, an update will involve
>> > building a new jar, uploading it to archiva and updating our dependency
>> > file (pom.xml) to point to the newly updated version.
>>
>>
>>
>> > On Fri, Oct 10, 2014 at 9:59 PM, Oliver Keyes <okeyes(a)wikimedia.org>
>> > wrote:
>> >>
>> >> Woah! Nice :D How are definitions updates handled?
>> >>
>> >> On 10 October 2014 18:58, Nuria Ruiz <nuria(a)wikimedia.org> wrote:
>> >>>
>> >>> >1. A UDF for ua-parser or whatever we decide to use (this will
>> >>> > possibly be necessary for pageviews, but not necessarily - it
>> >>> > depends on our spider/automaton detection strategy)
>> >>> We got this one ready today:
>> >>> https://gerrit.wikimedia.org/r/#/c/166142/
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On Fri, Oct 10, 2014 at 3:55 PM, Oliver Keyes <okeyes(a)wikimedia.org>
>> >>> wrote:
>> >>>>
>> >>>>
>> >>>>
>> >>>> On 10 October 2014 16:02, Nuria Ruiz <nuria(a)wikimedia.org> wrote:
>> >>>>>
>> >>>>> >At some point I believe we hope to just, you know. Have a
>> >>>>> > regularly updated browser matrix somewhere.
>> >>>>> I REALLY think this should make it into our goals, if it cannot be
>> >>>>> done this quarter it should for sure be done this quarter.
>> >>>>>
>> >>>>
>> >>>> I agree it would be nice. It's one of those things that will either
>> >>>> come as a side-effect of other stuff, OR require substantially more
>> >>>> work, and nothing in-between. Things we need for it:
>> >>>>
>> >>>> 1. A UDF for ua-parser or whatever we decide to use (this will
>> >>>> possibly be necessary for pageviews, but not necessarily - it depends
>> >>>> on our spider/automaton detection strategy)
>> >>>> 2. Pageviews data
>> >>>> 3. A table somewhere.
>> >>>>
>> >>>> Take 1, apply to 2, stick in 3. Maybe grab the same data for text/html
>> >>>> requests overall (depends on query runtime), maybe don't.
>> >>>>
>> >>>> The ideal implementation, obviously, is to pair this up with a site
>> >>>> that automatically parses the results into HTML. That should be the
>> >>>> end goal, but in terms of engineering support we can get most of the
>> >>>> way there simply by ensuring we always have a recent snapshot to hand.
>> >>>> I can probably put something together over the sampled logs and throw
>> >>>> it in SQL if there are urgent needs.
>> >>>>
>> >>>>>
>> >>>>> Do we not have more recent data than May?
>> >>>>
>> >>>>
>> >>>> We don't, but thanks to the utilities library I built, the code for
>> >>>> generating it would literally run:
>> >>>>
>> >>>> library(WMUtils)
>> >>>> uas <- as.data.table(ua_parse(data_sieve(do.call("rbind",lapply(seq(20140901,20140930,1),sampled_logs)))$user_agent))
>> >>>>
>> >>>> uas <- uas[,j = list(requests = .N), by = c("os","browser")]
>> >>>>
>> >>>> write.table(uas, file = "uas_for_jon.tsv", sep = "\t", row.names = FALSE,
>> >>>> quote = TRUE)
>> >>>>
>> >>>> ...assuming we didn't care about readability.
>> >>>>
>> >>>> Point is, in the time until we have the new parser built into Hadoop
>> >>>> and that setup, we can totally generate interim data from the sampled
>> >>>> logs using the same parser at a tiny cost in research/programming
>> >>>> time, iff (the mathematical if) we need it enough that we're cool with
>> >>>> the sampling, and people can convince [[Dario|Our Great Leader]] to
>> >>>> authorise me to spend 15 minutes of my time on it.
>> >>>>
>> >>>>>
>> >>>>>
>> >>>>> On Fri, Oct 10, 2014 at 12:45 PM, Oliver Keyes <okeyes(a)wikimedia.org>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>> Email Dario and me; if he prioritises it I'll run a check on more
>> >>>>>> recent data.
>> >>>>>>
>> >>>>>> At some point I believe we hope to just, you know. Have a regularly
>> >>>>>> updated browser matrix somewhere. This comes some time after
>> >>>>>> pageviews though.
>> >>>>>>
>> >>>>>> On 10 October 2014 14:38, Toby Negrin <tnegrin(a)wikimedia.org>
>> >>>>>> wrote:
>> >>>>>>>
>> >>>>>>> Hi Jon -- I'm sure other folks will have more information but here's
>> >>>>>>> a link to a slide with some data from May[1]. We don't see a lot of
>> >>>>>>> Windows phone traffic.
>> >>>>>>>
>> >>>>>>> -Toby
>> >>>>>>>
>> >>>>>>> [1]
>> >>>>>>> https://docs.google.com/a/wikimedia.org/presentation/d/19tZgTi6VUG04wfGWVzc…
>> >>>>>>>
>> >>>>>>> On Fri, Oct 10, 2014 at 11:17 AM, Jon Robson <jrobson(a)wikimedia.org>
>> >>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>> I was going through our backlog again today, and I noticed a bug
>> >>>>>>>> about supporting editing on Windows Phones with IE9 [1]
>> >>>>>>>>
>> >>>>>>>> Yet again, I wondered 'how many of our users are using IE9' as I
>> >>>>>>>> wondered if because of this lack of support we are losing out on
>> >>>>>>>> lots of potential editors.
>> >>>>>>>>
>> >>>>>>>> What's the easiest way to get this information now? Is it
>> >>>>>>>> available?
>> >>>>>>>>
>> >>>>>>>> [1] https://bugzilla.wikimedia.org/show_bug.cgi?id=55599
>> >>>>>>>>
>> >>>>>>>> _______________________________________________
>> >>>>>>>> Analytics mailing list
>> >>>>>>>> Analytics(a)lists.wikimedia.org
>> >>>>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> --
>>> >>>>>> Oliver Keyes
>>> >>>>>> Research Analyst
>>> >>>>>> Wikimedia Foundation
>>> >>>>>>