[Labs-l] Fail to restart webservices

Merlijn van Deen valhallasw at arctus.nl
Thu Aug 20 17:25:33 UTC 2015


On 18 August 2015 at 10:41, Merlijn van Deen <valhallasw at arctus.nl> wrote:

> On 18 August 2015 at 03:22, Thomas Tanon <thomaspt at hotmail.fr> wrote:
>
>> Is it related to the current high load on the Tool Labs grid?
>>
>
> This was caused by three of the 10 nodes being out of rotation (one
> disabled for today's restart, and two that had not come back up correctly
> after the earlier reboots). Those two have been restarted, and an extra
> execution node has been added, so we should be OK for now. We'll take more
> care to make sure the hosts come back up after the upcoming reboots.
>

A post-mortem and a list of action items for this outage are now available at
https://wikitech.wikimedia.org/wiki/Incident_documentation/20150817-ToolLabs-WebgridOutage

Best,
Merlijn

