Re: [Multimedia] [Analytics] Media Viewer Dashboards - Multimedia

20 May 2014

(I removed analytics-l from CC as this is probably not of general interest
any more)

The load wasn't too much of a problem.  TokuDB is being used now and it's
supposed to be orders of magnitude better than InnoDB.

We started sampling on Friday because there didn't seem to be a need for
all the data, and it had contributed to basically doubling EventLogging's
total stream of events.

However, Gilles proves the opposite in his message below.  So, we should
either stop sampling or change from 1/1000 to something like 1/10 for all
the "action" events (see Gilles' message, where he mentions that
non-"action" events were already sampled and are not affected).

On Tue, May 20, 2014 at 12:41 PM, Toby Negrin <tnegrin(a)wikimedia.org> wrote:

...
 If we skip the db and dump the data into hadoop it could probably handle
 the load. No idea if this is a good idea right now. Just a thought.

 ---------- Forwarded message ----------
 From: Gilles Dubuc <gilles(a)wikimedia.org>
 Date: Tue, May 20, 2014 at 5:21 AM
 Subject: Re: [Analytics] [Multimedia] Media Viewer Dashboards
 To: Wikimedia Foundation Multimedia Team <multimedia(a)lists.wikimedia.org>
 Cc: Analytics Team List <analytics(a)lists.wikimedia.org>

 Media Viewer's usage of EventLogging grew considerably because of all the
 tracking we're doing:
 http://lists.wikimedia.org/pipermail/analytics/2014-May/002053.html and
 Nuria asked us to reduce the rate.

 Due to the global size we're dealing with, instead of logging every action
 on every site, we'll now have to measure a sample and extrapolate an
 estimate. As a quick fix, last Friday Gergo introduced sampling of actions
 (one in every thousand actions is now recorded, instead of every action).
 As a result, all figures on the actions graph were divided by 1000
 overnight, making the line appear to go to 0. If you actually hover over
 recent days and look at the left sidebar, you'll see that there are
 figures (they are kind of useless, though, more on that below).

 We're now working on improvements and fixing the graphs:
 https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/619
 The general gist of it is that the figures will be compensated according to
 the sampling and that the sampling factor will be fine-tuned to only apply
 to metrics that were responsible for the high traffic.
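
 To make the compensation idea concrete, here is a minimal sketch in
 Python. It is not the actual dashboard code: the metric names, the
 SAMPLING_FACTORS table and the compensate() helper are all hypothetical,
 and only the 1:1000 factor for actions comes from this thread.

```python
# Hypothetical per-metric sampling factors: a factor of 1000 means one
# event in every thousand is logged; metrics not listed are assumed to be
# logged in full (factor 1).
SAMPLING_FACTORS = {"action": 1000}


def compensate(metric, sampled_count, factors=SAMPLING_FACTORS):
    """Scale a sampled count back up to an estimate of the real total."""
    return sampled_count * factors.get(metric, 1)


print(compensate("action", 42))            # estimate for a sampled metric
print(compensate("unsampled_metric", 42))  # unsampled metrics pass through
```

 Keeping the factor per metric is what would let the sampling apply only
 to the high-traffic metrics while leaving the others untouched.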

 Unfortunately it looks like the 1:1000 sampling since last Friday was too
 extreme and is destructive of information, even for the actions that were
 the most numerous. We knew that such a high sampling factor was going to
 destroy information for small wikis or metrics with low figures, but even
 the huge metrics in the millions have become unreliable. I'm saying that
 because multiplying even the largest figures by 1000 still doesn't give an
 estimate close to what it was before the change. Which means that the
 actions graph probably won't be fixable for the period since last Friday
 until my fixes make it through. Even compensating for the sampling (by
 multiplying the figures by 1000), the line would jump up and down every day
 for each metric.
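
 As a rough illustration of how the sampling rate affects the noise in a
 compensated figure, the sketch below assumes that every event is kept
 independently with the same probability. That is an assumption, not
 something confirmed about how the sampling was implemented, and the
 relative_standard_error() helper is hypothetical. Under that assumption
 the sampled count is binomial, so the compensated estimate has a relative
 standard deviation of sqrt((1 - rate) / (true_count * rate)).

```python
import math


def relative_standard_error(true_count, rate):
    """Relative standard deviation of sampled_count / rate, assuming each
    of true_count events is kept independently with probability rate."""
    return math.sqrt((1 - rate) / (true_count * rate))


# Compare the current 1/1000 rate with a 1/10 rate for a few daily totals.
for true_count in (1_000, 100_000, 1_000_000):
    for rate in (1 / 1000, 1 / 10):
        rse = relative_standard_error(true_count, rate)
        print(f"true={true_count:>9} rate={rate:<6} relative error ~ {rse:.1%}")
```

 If the events are not sampled independently (for instance if they are
 bursty or sampled per session), the real noise can be considerably
 larger than this model suggests.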

 Graphs other than actions are unaffected (they were already sampled). The
 duration log was also affected, but that one doesn't have graphs yet, as
 the task to create them has been given low priority in the cycle.

 On Mon, May 19, 2014 at 8:43 PM, Fabrice Florin <fflorin(a)wikimedia.org> wrote:

  Hi guys,

 Does anyone know why the Media Viewer metrics dashboards seem to be stuck
 with old data from Friday?

 http://multimedia-metrics.wmflabs.org/dashboards/mmv

 Is there anything we could fiddle with to get the new data to show up?

 Thanks for any insights :)

 Fabrice

   _______________________________

 Fabrice Florin
 Product Manager
 Wikimedia Foundation

 http://en.wikipedia.org/wiki/User:Fabrice_Florin_(WMF)
