[Labs-l] queue down (for real)

Andrew Bogott abogott at wikimedia.org
Wed Dec 30 02:22:02 UTC 2015


On 12/29/15 7:01 PM, Bryan White wrote:
> nothing of mine has run on the queue for ~90 minutes.
>
> Output of 'qstat -f'
> error: commlib error: got select error (Connection refused)
> error: unable to send message to qmaster using port 6444 on host "tools-grid-master.tools.eqiad.wmflabs": got send error
>
Roughly 12,000 jobs were scheduled over about 90 minutes, and the grid
is overwhelmed. We're working on untangling the mess.


> Bryan
