Does this mean that all of the April and May data that was previously in the dashboards is now permanently gone?

Dan

On 22 July 2015 at 14:13, Oliver Keyes <okeyes@wikimedia.org> wrote:
Hey all,

So, the data for the Search dashboards
(http://searchdata.wmflabs.org/metrics/) comes from a variety of
sources, one of which is the daily logs of all Cirrus search requests
- about 46GB of data a day. We set up a pipeline on top of this to report the
"zero" rate - how many queries return zero results. This was a
pretty shaky pipeline because it was an ultra-urgent,
need-it-for-a-presentation thing.

Good news: my prediction that it needed work was accurate. Bad news:
my prediction that it needed work was accurate ;).

When Erik and I went through all of the scripts and rewrote them on
the 15th, we discovered a lot of maintenance tasks that were being
identified as searches. These are now being excluded, but we
have to backfill 1.5 months of data. I've chosen to eliminate the old
data and then backfill, because it means we avoid having data from
multiple, dissonant software versions, and because it just makes the
backfilling task a bit easier.
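
As a rough illustration of why the exclusion matters, here is a minimal
sketch of a zero-rate calculation. It is not the actual pipeline code:
the record fields `hits` and `is_maintenance` are hypothetical, and it
assumes the "zero" rate is the share of genuine searches that return no
results.

```python
def zero_rate(requests):
    """Share of genuine searches that returned no results.

    Each request is a dict with hypothetical keys:
      hits           -- number of results returned
      is_maintenance -- True if the request was a maintenance task
    Returns None when there are no genuine searches to measure.
    """
    # Drop maintenance tasks so they are not counted as searches.
    searches = [r for r in requests if not r["is_maintenance"]]
    if not searches:
        return None
    zero = sum(1 for r in searches if r["hits"] == 0)
    return zero / len(searches)


# Made-up sample: three genuine searches, one maintenance task.
sample = [
    {"hits": 0, "is_maintenance": False},
    {"hits": 12, "is_maintenance": False},
    {"hits": 0, "is_maintenance": True},
    {"hits": 3, "is_maintenance": False},
]
```

In this made-up sample the maintenance task also returns zero results,
so counting it as a search would inflate the rate.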

As a result, the dashboards may look a bit odd over the next couple of
days; they have data from 15 July onwards that we're comfortable
with, but we are gradually backfilling 1 June to 14 July, starting
from 1 June. So at the moment we have 1 June and 15-21 July. Weird.
Next it will be 1-2 June plus 15 July onwards, and so on.
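
To make the gap concrete, here is a small sketch that lists which days
the dashboards would cover at a given point in the backfill. The two
boundary dates come from this message; the function names and the
day-by-day model are assumptions for illustration only.

```python
from datetime import date, timedelta

BACKFILL_START = date(2015, 6, 1)   # first day being backfilled
TRUSTED_START = date(2015, 7, 15)   # first day produced by the rewritten scripts


def day_range(start, end):
    """All dates from start to end, inclusive (empty if end < start)."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def covered_days(backfilled_through, latest):
    """Days with data once the backfill has reached `backfilled_through`.

    `latest` is the most recent day of data from the rewritten scripts.
    """
    # The backfill never needs to go past the day before the trusted data starts.
    last_backfill_day = min(backfilled_through, TRUSTED_START - timedelta(days=1))
    backfilled = day_range(BACKFILL_START, last_backfill_day)
    trusted = day_range(TRUSTED_START, latest)
    return backfilled + trusted


# The state described above: only 1 June backfilled, trusted data to 21 July.
days = covered_days(date(2015, 6, 1), date(2015, 7, 21))
```

Here `days` holds 1 June followed by 15 to 21 July, with nothing in
between until the backfill catches up.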

So expect to see increasingly less weird graphs, until the point where
they're back to normal (but more consistent and sane-looking). Until
then: yeah, they're gonna look a bit weird.

Thanks,

--
Oliver Keyes
Research Analyst
Wikimedia Foundation

_______________________________________________
Wikimedia-search mailing list
Wikimedia-search@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikimedia-search



--
Dan Garry
Lead Product Manager, Discovery
Wikimedia Foundation