[Labs-l] queue down (for real)

Andrew Bogott abogott at wikimedia.org
Wed Dec 30 03:06:19 UTC 2015


On 12/29/15 8:22 PM, Andrew Bogott wrote:
> On 12/29/15 7:01 PM, Bryan White wrote:
>> Nothing of mine has run on the queue for ~90 minutes.
>>
>> Output of 'qstat -f'
>> error: commlib error: got select error (Connection refused)
>> error: unable to send message to qmaster using port 6444 on host "tools-grid-master.tools.eqiad.wmflabs": got send error
>>
> 12,000 or so jobs were scheduled over the course of about 90 minutes,
> and the grid is overwhelmed. We're working on untangling the mess.
>
Oops, my mistake: that was 12,000,000 jobs.

>
>> Bryan
>>
>>
>> _______________________________________________
>> Labs-l mailing list
>> Labs-l at lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>


