[Labs-l] tools: some continuous jobs not rescheduled after reboot

Merlijn van Deen valhallasw at arctus.nl
Tue Aug 18 15:29:11 UTC 2015


Hello all,

On 17 August 2015 at 22:14, Merlijn van Deen <valhallasw at arctus.nl> wrote:

> We will try to prevent this tomorrow by taking more time to reschedule the
> jobs, but because we are uncertain of the underlying issue, we cannot
> guarantee jobs will not fail after all. In any case, we will follow up with
> a status report.
>

Unfortunately, a bug in one of our scripts meant that we still mass-restarted
jobs today :( The following jobs fell victim and were not rescheduled:

user                task                 job_id

tools.wikihistory   dewiki_update4       1246273
tools.ralgisbot     wes-redir            331287
tools.phetools      match_and_split      2337
tools.pbbot         plwiktLinkManager    1600897
tools.wikilinkbot   linkbotv11           1673095
tools.yifeibot      rmiw.w1              1702619
tools.cluestuff     recent_referencebot  5589
tools.wmfdbbot      dbbot-wm             261
tools.wikihistory   dewiki_update5       1246274
tools.hewiki-tools  webServ              1915849
tools.cluestuff     recent_bracketbot    5588
tools.yifeibot      rmiw.w3              1702691
tools.yifeibot      rmiw.w4              1702714
tools.phetools      extract_text_layer   2338
tools.ralgisbot     wes-isbnanx          1738090
tools.giftbot       vm                   1566
tools.ralgisbot     qes-redir            331281
tools.giftbot       gva                  1567

I had hoped SGE stored enough information to restart jobs after they died,
but unfortunately this does not seem to be the case, so if your job is on
the list above, you will have to restart it manually.

Merlijn