On 12/6/18 9:16 PM, Andrew Bogott wrote:
I recently noticed that some of our standard kvm/nova monitoring never
got copied over from the labvirt puppet code to the cloudvirt puppet
code. Tomorrow I will merge
https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/478113/ to fix that.
Once that patch is merged, Icinga will be a bit touchier on the
cloudvirts. In particular, it will alert for any cloudvirt that has 0
VMs running on it. (This turns out to be a useful thing to watch for
because we've had cases where every single kvm process died at once.)
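A "zero VMs" check of this kind can be sketched in the Nagios/Icinga
plugin convention (exit 0 = OK, exit 2 = CRITICAL). The function name,
message text, and the way the count is obtained below are illustrative
only, not the actual check the puppet patch installs:

```shell
#!/bin/bash
# Hypothetical sketch of an Icinga-style "zero VMs" check.
# Exit codes follow the Nagios plugin convention: 0 = OK, 2 = CRITICAL.

check_vm_count() {
    local count="$1"
    if [ "$count" -eq 0 ]; then
        echo "CRITICAL: 0 VMs running on this cloudvirt"
        return 2
    fi
    echo "OK: ${count} VMs running"
    return 0
}

# In production the count would come from the hypervisor itself
# (for example, counting qemu processes); a fixed value is used here
# so the sketch stays self-contained.
check_vm_count 3
```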
So, all 'idle' cloudvirts should nonetheless have a canary instance. For
example, on the new analytics cloudvirts I created canaries like this:
$ OS_PROJECT_ID=testlabs openstack server create \
    --image 7c6371d1-8411-48c7-bf73-2ef6d6ff2a15 \
    --flavor m1.small \
    --nic net-id=7425e328-560c-4f00-8e99-706f3fb90bb4 \
    --availability-zone host:cloudvirtan1004 \
    canary-an1004-01
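The canary name in the command above appears to follow a simple pattern
derived from the hostname. A small helper that reproduces it could look
like the sketch below; the naming convention is inferred from this one
example, not an official standard:

```shell
#!/bin/bash
# Hypothetical helper: derive a canary instance name from a cloudvirt
# hostname, following the canary-an1004-01 pattern in the example above.

canary_name() {
    local host="$1"                    # e.g. cloudvirtan1004
    local suffix="${host#cloudvirt}"   # strip the "cloudvirt" prefix -> an1004
    echo "canary-${suffix}-01"
}

canary_name cloudvirtan1004   # -> canary-an1004-01
```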
Once a virt host is in full service we can either leave the canaries in
place or delete them; there hasn't been a consistent policy so far.
Thanks for the heads up and the example command.
I think it makes sense to have a canary per cloudvirt. It does mean
more operating systems to keep updated, and they may need to be
excluded from metrics collection, but the annoyance should be minimal.
It would be good to have a barebones OS image for them, but I'd
consider that a very low priority.
--
Giovanni Tirloni
Operations Engineer
Wikimedia Cloud Services