[Labs-l] Cron job concurrency: consider adding `-once` to your cron tasks
Maximilian Doerr
maximilian.doerr at gmail.com
Wed Aug 2 17:05:55 UTC 2017
Which tools are the offending tools?
Cyberpower678
English Wikipedia Account Creation Team
English Wikipedia Administrator
Global User Renamer
> On Aug 2, 2017, at 13:04, Bryan Davis <bd808 at wikimedia.org> wrote:
>
> We saw a big spike of active Grid Engine jobs starting around
> 2017-08-01T00:00. I've been looking at the list of active jobs and
> noticed that several tools had a lot of copies of the same job
> running. Some tools are designed to have several copies of the same
> job running, working from a shared queue of some sort, but often
> this is a sign that something is wrong with the script.
>
> Here's a fancy shell pipeline that will give you a list of all of your
> tool's running jobs grouped by job name and sorted by start time:
>
> qstat -xml |
> tr '\n' ' ' |
> sed 's#<job_list[^>]*>#\n#g' |
> sed 's#<[^>]*>##g' |
> grep " " |
> column -t |
> awk 'BEGIN { OFS="\t" } {print $1, $3, $6, $5}' |
> sort -n -k 3 | sort -s -k 2
>
> You can use this to see if you have parallel jobs running and if so
> when the "stuck" jobs started. It seems that there may have been some
> database-related events happening between 2017-07-31T23:00 and
> 2017-08-01T06:00 that left a bunch of jobs stuck in a bad state
> internally.
>
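As a rough sketch, the output of a pipeline like the one above could be
reduced to just the job names that have more than one copy running. The
function name `list_duplicate_jobs` is made up for illustration, and it
assumes tab-separated input with the job name in the second column, which
is what the `awk` step in the pipeline appears to print.

```shell
# Hypothetical helper: read tab-separated job lines on stdin and print
# "<count><TAB><job name>" for every name that appears more than once.
# Assumes the job name is in column 2 and contains no tabs.
list_duplicate_jobs() {
    awk -F '\t' '
        { count[$2]++ }
        END {
            for (name in count)
                if (count[name] > 1)
                    print count[name] "\t" name
        }
    ' | sort -k 2
}
```

Piping the pipeline's output into `list_duplicate_jobs` should then print
nothing when every job name is unique, and one line per job name otherwise.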
> To keep your cron-scheduled jobs from running in parallel, you can add
> the `-once` flag to your crontab. Either `jsub -once ...` or `qcronsub
> ...` will do this for you. When the once flag is active, jsub and
> qcronsub look for jobs that your tool is already running; if there is
> an active job with the same name, the new job will *not* be started
> and an error message will be logged. The name is either provided
> explicitly with `-N ....` or derived automatically from the command if
> `-N` is not used.
>
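For illustration, a crontab entry using the flag might look roughly like
the following. The schedule, the job name `update-stats`, and the script
path are placeholders, not taken from any real tool; only `-once` and `-N`
come from the description above.

```crontab
# Hypothetical entry: every 10 minutes, submit the job only if no job
# named "update-stats" is already running for this tool.
*/10 * * * * jsub -once -N update-stats $HOME/bin/update-stats.sh
```

Giving the job an explicit name with `-N` keeps the check independent of
how the command line happens to be written.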
> (This should probably end up on wikitech in the help somewhere...)
>
> Bryan
> --
> Bryan Davis Wikimedia Foundation <bd808 at wikimedia.org>
> [[m:User:BDavis_(WMF)]] Manager, Cloud Services Boise, ID USA
> irc: bd808 v:415.839.6885 x6855