The forwarded message is very relevant to Wikimetrics. I have some basic backups, but we really should migrate the service to production. There are a few things to fix before we do, but if we want to avoid interruptions and data loss, this *has* to happen.
---------- Forwarded message ----------
From: "Andrew Bogott" <abogott(a)wikimedia.org>
Date: Fri, Nov 15, 2013 at 2:29 PM
Subject: [Wikitech-l] Labs datacenter migration
To: "A list for announcements and discussion related to the Wikimedia Labs project." <labs-l(a)lists.wikimedia.org>, "Wikimedia developers" <wikitech-l(a)lists.wikimedia.org>
> Almost a year ago, the Wikimedia Foundation migrated most of our
> services from our old data center in Tampa to the new one in Ashburn
> [1]. In the next couple of months Labs and Tool Labs will be following
> suit -- we expect to have everything moved to Ashburn by mid-January at
> the latest.
> This move will provide some immediate benefits (lower latency with
> production, quicker database replication) and many long-term benefits
> (better stability, happier Operations staff). We don't yet have a
> specific timeline for stages of the migration, but there are a few
> things you can do now to help us prepare for the change and to bolster
> your projects against possible disruption.
> 1) Subscribe to Labs-l, and read it. [2] Labs-l is low-volume, and
> future migration announcements may not be sent to other lists.
> 2) Tool Labs users: As long as your tools are properly managed by the
> grid engine and can survive stops and restarts, the migration will be
> quite painless. If your tools aren't, or can't... fix them :) (See the
> first sketch after this message.)
> 3) Labs project admins: Clean up old projects and instances. If you
> have instances that are no longer of interest, delete them. If you know
> of entire projects that are no longer in use, please contact me directly
> and I'll mop up.
> 4) Labs instance owners: Make sure that puppet is running properly on
> your instances. If '$ sudo puppetd -tv' produces any red lines, then fix
> them or contact me for help with fixing. When instances move to the new
> data center we'll be relying on puppet to update location-specific
> settings, so instances without puppet may not survive the move. If your
> instance uses self-hosting puppet (via puppetmaster::self or
> role::puppet::self) then you will also need to update your local puppet
> repo. [3]
> 5) All Labs users: if you have valuable data residing on local instance
> storage, start backing it up to shared storage in /data/project (see the
> backup sketch after this message). You should be doing this anyway --
> no instance is safe from catastrophe, ever.
> 6) If your project or tool generates log files, have a look at purging
> old log data (see the log-purge sketch after this message). The last
> time we did a data migration there was at least one terabyte-sized
> logfile that really gummed up the works.
> Updates about this change will be posted to this list as soon as we
> know about them. Any potential downtime will be announced well in
> advance. In the meantime, don't hesitate to ask questions about the
> above steps on IRC or the mailing list.
> -Andrew
> [1]
> https://blog.wikimedia.org/2013/01/19/wikimedia-sites-move-to-primary-data-…
> [2] https://lists.wikimedia.org/mailman/listinfo/labs-l
> [3] https://wikitech.wikimedia.org/wiki/Help:Self-hosted_puppetmaster#FAQ
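To make item 2 above concrete, here is a minimal sketch of a
restart-safe job that checkpoints its progress to shared storage, so
the grid engine can stop and restart it without losing work. Every
name and path in it (the tool directory, the process function) is
hypothetical, not an actual tool:

    import json
    import os

    # Hypothetical paths; a real tool would use its own project dir.
    CHECKPOINT = '/data/project/mytool/checkpoint.json'

    def process(item):
        pass  # stand-in for the tool's real per-item work

    def load_checkpoint():
        # Resume from the last saved position, or start from scratch.
        if os.path.exists(CHECKPOINT):
            with open(CHECKPOINT) as f:
                return json.load(f).get('last_done', 0)
        return 0

    def save_checkpoint(last_done):
        # Write to a temp file and rename, so a kill mid-write
        # can't leave a corrupt checkpoint behind.
        tmp = CHECKPOINT + '.tmp'
        with open(tmp, 'w') as f:
            json.dump({'last_done': last_done}, f)
        os.rename(tmp, CHECKPOINT)

    def run(items):
        start = load_checkpoint()
        for i, item in enumerate(items):
            if i < start:
                continue  # already processed before a restart
            process(item)
            save_checkpoint(i + 1)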
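Similarly, for item 5, backing local data up to shared storage can be
as simple as a timestamped tarball; the paths here are examples, not
real projects:

    import os
    import tarfile
    import time

    LOCAL_DATA = '/srv/mytool/data'              # instance-local storage
    BACKUP_DIR = '/data/project/mytool/backups'  # shared storage

    def backup():
        if not os.path.isdir(BACKUP_DIR):
            os.makedirs(BACKUP_DIR)
        stamp = time.strftime('%Y%m%d-%H%M%S')
        target = os.path.join(BACKUP_DIR, 'data-%s.tar.gz' % stamp)
        with tarfile.open(target, 'w:gz') as tar:
            tar.add(LOCAL_DATA, arcname='data')
        return target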
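And for item 6, a log purge only needs a cutoff age; the directory
and the 30-day limit below are illustrative:

    import os
    import time

    LOG_DIR = '/data/project/mytool/logs'  # hypothetical log location
    MAX_AGE_DAYS = 30

    def purge_old_logs():
        cutoff = time.time() - MAX_AGE_DAYS * 86400
        for name in os.listdir(LOG_DIR):
            path = os.path.join(LOG_DIR, name)
            if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
                os.remove(path)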
Hi,
I just noticed someone ran a query from 2012 to 2013 as a timeseries by
hour. This... creates a *lot* of data. For the cohort they used, it's
about 1.8 million data points. Should we cap report sizes somehow? It
doesn't pose any immediate danger beyond consuming a lot of resources and
computation time, as well as IO time spent logging the results (the log
is currently acting as a rudimentary backup -- perhaps this is
ill-conceived).
In this case it looks like maybe it was a mistake, so one idea is to warn
the user that they are about to generate a lot of data, and to ask them to
confirm.
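Roughly what I have in mind for that warning, as a back-of-the-envelope
sketch (the cap, the cohort size, and the function names are all made
up, not actual wikimetrics code):

    from datetime import datetime

    MAX_CELLS = 100000  # arbitrary cap, would need tuning

    def estimated_cells(cohort_size, start, end, bucket_hours=1):
        # cells = users x time buckets
        hours = (end - start).total_seconds() / 3600
        return int(cohort_size * hours / bucket_hours)

    # A year of hourly buckets for a hypothetical 200-user cohort is
    # roughly 200 * 24 * 365 ~= 1.75 million cells, on the order of
    # the 1.8 million seen above.
    size = estimated_cells(200, datetime(2012, 1, 1), datetime(2013, 1, 1))
    ask_user_to_confirm = size > MAX_CELLS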
Thoughts?
Dan
Hey all,
I used the threshold metric for the first time yesterday. First off, thanks
for adding it! Dario tells me it was brand new as of yesterday? He also
said it needs vetting?
One piece of feedback: combining threshold and 'time to threshold' seems to
make things more confusing. For example, when you select sum as an output,
you also get the sum of the time to threshold. That result -- like
"time_to_threshold": 92.7864 -- seems to be simply the sum of hours for
the members of the
cohort. Knowing that it took the cohort a combined 92 hours to reach the
threshold isn't very actionable.
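To illustrate with made-up numbers: the sum answers "how many combined
user-hours did the cohort spend reaching the threshold?", while a mean
or median describes a typical user, which is usually what you want:

    # Invented per-user hours to reach the threshold (None = never did);
    # these merely reproduce the 92.7864 figure above.
    times = {'user_a': 3.2, 'user_b': 45.0, 'user_c': 44.5864}

    reached = [t for t in times.values() if t is not None]
    total = sum(reached)                # 92.7864 -- the confusing "sum"
    mean = total / float(len(reached))  # ~30.9 hours: a per-user view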
--
Steven Walling,
Product Manager
https://wikimediafoundation.org/
Hi,
I had to reboot the redis server that stores all of your report results
today. This uncovered the very unpleasant truth that redis was never
backing up properly. Unfortunately, any report results that you had saved
are now lost. They can be rerun, of course, but that brings me to the second
piece of bad news.
Even more frustratingly, I can't seem to get redis to come back up again.
We are working on this problem and will try to fix it ASAP. I will post
here once it's fixed.
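For the curious, once it's back up, verifying that persistence is
actually working looks roughly like this -- a sketch using the redis-py
client, with illustrative settings rather than what we'll necessarily
deploy:

    import time
    import redis

    r = redis.StrictRedis(host='localhost', port=6379)

    # Append-only file: every write is logged and replayed on restart.
    r.config_set('appendonly', 'yes')
    # RDB snapshots: dump to disk if at least 1 key changed in 900s.
    r.config_set('save', '900 1')

    # Force a snapshot and check that it actually happened.
    before = r.lastsave()
    r.bgsave()
    time.sleep(2)  # give the background save a moment on small datasets
    assert r.lastsave() > before  # a new snapshot landed on disk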
Dan
Hi,
Just a quick heads up: I've made a slight change to the way bytes added
is reported. Instead of returning "null" when a user has no bytes added, I'm
now returning 0. This is to allow for more meaningful null values, like in
the case of time_to_threshold. So from now on, if you see a null value in
a metric report, it means that value will not be counted towards any
aggregate.
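Concretely, with made-up numbers, aggregation now treats the two cases
differently:

    # 0 means "measured, and it was zero" and counts toward aggregates;
    # None (null) means "not applicable" and is skipped entirely.
    results = {'user_a': 0, 'user_b': 150, 'user_c': None}

    counted = [v for v in results.values() if v is not None]
    total = sum(counted)                   # 150
    average = total / float(len(counted))  # 75.0 -- over 2 users, not 3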
Hope that makes things a bit clearer, and here's the patchset if you're
curious:
https://gerrit.wikimedia.org/r/#/c/91495/
We'll merge/deploy this shortly.
Dan
Heya,
Today we enabled HTTPS by default for Wikimetrics, which means that all
communication between your browser and our server is now encrypted. That
means more privacy and security!
Please let us know if it causes any issues and we'll fix them :)
Best,
D