[Labs-l] Slight change on how jobs are ended

Marc A. Pelletier marc at uberbox.org
Sun Oct 12 02:48:41 UTC 2014


Hello all,

To fix a few problems with the way jobs are ended, I have changed the
gridengine settings that control how jobs are terminated.

tl;dr: If you don't know what a signal handler is or never use them, you
probably can ignore this email entirely and nothing will visibly change
for you.

Previously, jobs that were to be terminated by gridengine were sent a
SIGKILL to the head process.  This was reliable, but abrupt.

From now on, jobs being terminated will have SIGINT sent to the entire
process group instead (the original job process, and all its children).
This should clean things up more gently (and possibly more cleanly),
but jobs that ignore SIGINT, or handle it without exiting, will not be
killable.

This means that your jobs now have an opportunity to clean up after
themselves if they need it - but with that power comes the
responsibility of making sure that the process /does/ exit quickly after
handling the signal.  Delays below 10 seconds or so are reasonable; much
more than that and we will be having words.  :-)

This also means that if your job creates its own process groups, the
processes in them won't receive the termination signal.  Make certain
that they notice the parent is gone and exit.
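
One possible way for such a process to notice that its parent is gone
is to poll its parent PID and stop once it changes.  This is only a
sketch under assumptions: parent_is_gone and watch_parent are
hypothetical helpers, and the original parent's PID has to be captured
(or passed in by the parent) before the parent can have exited.

```python
import os
import time


def parent_is_gone(original_parent_pid):
    """Return True once this process has been re-parented.

    When the original parent exits, the kernel hands the child to
    another process, so os.getppid() no longer matches.
    """
    return os.getppid() != original_parent_pid


def watch_parent(original_parent_pid, do_work, poll_interval=1.0):
    """Run do_work() until the original parent disappears, then return.

    do_work is a hypothetical callable; keep each step short so the
    check runs often enough.
    """
    while not parent_is_gone(original_parent_pid):
        do_work()
        time.sleep(poll_interval)
```

A child would typically be given the parent's PID at startup and call
watch_parent(that_pid, step), exiting once it returns.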

-- Marc


