On 12/6/18 9:16 PM, Andrew Bogott wrote:
I recently noticed that some of our standard kvm/nova monitoring never
got copied over from the labvirt puppet code to the cloudvirt puppet
code. Tomorrow I will merge
https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/478113/ to fix that.
Once that patch is merged, Icinga will be a bit touchier on the
cloudvirts. In particular, it will alert for any cloudvirt that has 0
VMs running on it. (This turns out to be a useful thing to watch for
because we've had cases where every single kvm process died at once.)
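A "zero VMs" check of this kind can be sketched in the Nagios/Icinga
plugin convention (exit 0 = OK, exit 2 = CRITICAL). The function name,
message text, and the way the count is obtained below are illustrative
only, not the actual check the puppet patch installs:

```shell
#!/bin/bash
# Hypothetical sketch of an Icinga-style "zero VMs" check.
# Exit codes follow the Nagios plugin convention: 0 = OK, 2 = CRITICAL.

check_vm_count() {
    local count="$1"
    if [ "$count" -eq 0 ]; then
        echo "CRITICAL: 0 VMs running on this cloudvirt"
        return 2
    fi
    echo "OK: ${count} VMs running"
    return 0
}

# In production the count would come from the hypervisor itself
# (for example, counting qemu processes); a fixed value is used here
# so the sketch stays self-contained.
check_vm_count 3
```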
So, all 'idle' cloudvirts should nonetheless have a canary instance. For
example, on the new analytics cloudvirts I created canaries like this:
$ OS_PROJECT_ID=testlabs openstack server create \
    --image 7c6371d1-8411-48c7-bf73-2ef6d6ff2a15 \
    --flavor m1.small \
    --nic net-id=7425e328-560c-4f00-8e99-706f3fb90bb4 \
    --availability-zone host:cloudvirtan1004 \
    canary-an1004-01
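The canary name in the command above appears to follow a simple pattern
derived from the hostname. A small helper that reproduces it could look
like the sketch below; the naming convention is inferred from this one
example, not an official standard:

```shell
#!/bin/bash
# Hypothetical helper: derive a canary instance name from a cloudvirt
# hostname, following the canary-an1004-01 pattern in the example above.

canary_name() {
    local host="$1"                    # e.g. cloudvirtan1004
    local suffix="${host#cloudvirt}"   # strip the "cloudvirt" prefix -> an1004
    echo "canary-${suffix}-01"
}

canary_name cloudvirtan1004   # -> canary-an1004-01
```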
Once a virt host is in full service we can either leave the canaries in
place or delete them; there hasn't been a consistent policy so far.
Thanks for the heads up and the example command.
I think it makes sense to have a canary per cloudvirt. It does mean
more operating systems to keep updated, and they may need to be
excluded from metrics collection, but the annoyance should be minimal.
It would be good to have a barebones OS image for them, but I'd
consider that a very low priority.
--
Giovanni Tirloni
Operations Engineer
Wikimedia Cloud Services