[Labs-admin] ** RECOVERY alert - ToolLabs/ToolLabs Home Page is OK **

Andrew Bogott abogott at wikimedia.org
Sun Nov 27 07:11:12 UTC 2016


I poked at this, but I'm pretty sure it recovered on its own.  When I 
tried to restart the service by hand, I got this:

Traceback (most recent call last):
  File "/usr/bin/webservice-runner", line 27, in <module>
    webservice.run(port)
  File "/usr/lib/python2.7/dist-packages/toollabs/webservice/services/lighttpdwebservice.py", line 108, in run
    with open(config_path, 'w') as f:
IOError: [Errno 13] Permission denied: '/var/run/lighttpd/admin'

It's late and this is half-baked (and my attempts to fix the problem 
destroyed the evidence) but my speculation is that in some situations we 
are 'leaking' read-only /var/run/lighttpd/admin files.  Once one of them 
is out there, each time the webservice restarts it's just the luck of 
the draw whether we hit an exec host that has or doesn't have a 
read-only file, so the failure is intermittent.

For now, I've explicitly removed that file on all trusty lighttpd 
hosts.  If this problem recurs, we should check whether the file named 
in the error is writable before doing anything else.
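
A rough sketch of what that first check could look like, to be run on the 
affected exec host as the tool's user.  The helper name 
describe_writability is made up for illustration and is not part of 
webservice-runner or the toollabs package; it only uses os.access and 
os.stat from the standard library.  Note that os.access reports on the 
user running the script, so running it as root will say "writable" even 
when the tool user would be denied.

```python
import os
import stat


def describe_writability(path):
    """Report whether the current user could open `path` for writing.

    Hypothetical helper for illustration only.
    """
    report = {"path": path, "exists": os.path.exists(path)}
    if not report["exists"]:
        # open(path, 'w') would create the file, so what matters here is
        # whether the parent directory is writable and searchable.
        parent = os.path.dirname(path) or "."
        report["writable"] = os.access(parent, os.W_OK | os.X_OK)
        return report
    st = os.stat(path)
    report["writable"] = os.access(path, os.W_OK)
    # Permission bits and owner help tell a leaked read-only file apart
    # from one that simply belongs to a different user.
    report["mode"] = oct(stat.S_IMODE(st.st_mode))
    report["owner_uid"] = st.st_uid
    return report


if __name__ == "__main__":
    # Path taken from the traceback above.
    print(describe_writability("/var/run/lighttpd/admin"))
```

If "exists" is true and "writable" is false, the mode and owner_uid 
fields should be recorded before the file is removed, so the evidence 
survives the fix this time.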

-A


On 11/27/16 1:05 AM, shinken wrote:
> Notification Type: RECOVERY
>
> Service: ToolLabs Home Page
> Host: ToolLabs
> Address: tools.wmflabs.org
> State: OK
>
> Date/Time: Sun 27 Nov 07:05:01 UTC 2016
>
> Additional Info:
>
> HTTP OK: HTTP/1.1 200 OK - 3670 bytes in 0.041 second response time




