I have a service controlled by a daemontools family toolset. svstat reports that it is in the "up" state, but also reports that it has been up for 0 seconds, even if I wait a while and query its status again. I am also unable to actually make use of the service. What am I doing wrong to produce this erroneous behaviour? How can I correct it?
This is the Frequently Given Answer to that question.
What you are doing wrong is specific to the type of service that you are attempting to run. It's not possible to describe a single error that fits all circumstances, or prescribe a blanket cure. However, the result of your error, whatever it is, is that the service is exiting as soon as it is started, resulting in it being continually respawned by supervise.
You need to find out why the service process is exiting immediately. There are two things that you can do for a start in order to diagnose what the problem is:
Read the log. If the output of your service is logged via a secondary logging service then look at the content of your log files to see what messages your service is sending to its standard output and standard error. These might tell you immediately what the problem is.
(In order to capture any errors in the service's ./run script itself, ensure that it has an
    exec 2>&1
or a
    fdmove -c 2 1
line before any other commands.)
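As a rough illustration of why that redirection matters, the following sketch builds a throwaway service directory whose ./run script deliberately fails, then captures only the script's standard output. The temporary directory and the daemon name example-daemon-that-does-not-exist are hypothetical, invented purely for the demonstration. Because the script redirects standard error to standard output first, the failure message ends up where a logging service would read it.

```shell
#!/bin/sh
# Sketch only: a throwaway service directory with a ./run that cannot succeed.
demo_dir=$(mktemp -d)

cat > "$demo_dir/run" <<'EOF'
#!/bin/sh
# Send standard error to wherever standard output goes, before anything else.
exec 2>&1
# Hypothetical daemon name; it does not exist, so this exec fails.
exec ./example-daemon-that-does-not-exist
EOF
chmod +x "$demo_dir/run"

# Capture standard output only. The standard error that ./run inherits is
# discarded, so anything captured here arrived via the "exec 2>&1" line.
captured=$( cd "$demo_dir" && ./run 2>/dev/null )
status=$?

printf 'exit status: %s\n' "$status"
printf 'captured on standard output: %s\n' "$captured"

rm -rf "$demo_dir"
```

Without the exec 2>&1 line, the same failure message would go to the script's original standard error and would not appear in the captured text.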
Test the script. Try, as the superuser, running the ./run script for your service manually:
    ( cd $SERVICE_DIRECTORY && ./run )
If the script fails and exits, determine why it does so and correct as appropriate.
For example:
Two common errors that cause secondary logging services to exit immediately are:
that you haven't actually yet created the user that you have told setuidgid to switch to; and
that the user as which you have decided multilog should run does not actually have the appropriate directory permissions to change to or create files in the directory in which you have decided the logs should be written.
Another, less common, error is:
that you have removed execute permission from ./run, or accidentally corrupted its first line (adding a carriage return before the linefeed, for example) when hand editing it.
In all three cases, the error is readily apparent when you (attempt to) run the ./run script manually.
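The three errors above can also be checked for directly. What follows is a minimal sketch, not part of any daemontools toolset: the function names check_run_script, check_user_exists, and check_log_dir are hypothetical, and the directory check tests the permissions of whichever user runs it.

```shell
#!/bin/sh
# Hypothetical pre-flight checks for the three errors described above.

# Report whether DIR/run exists, is executable, and has a clean first line.
check_run_script() {
    run_file=$1/run
    result=0
    if [ ! -f "$run_file" ]; then
        echo "$run_file: missing"
        return 1
    fi
    if [ ! -x "$run_file" ]; then
        echo "$run_file: no execute permission"
        result=1
    fi
    # A carriage return before the linefeed corrupts the interpreter line.
    cr=$(printf '\r')
    if head -n 1 "$run_file" | grep -q "$cr"; then
        echo "$run_file: carriage return in first line"
        result=1
    fi
    return "$result"
}

# Report whether the named user account exists.
check_user_exists() {
    if id "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$1: no such user"
    return 1
}

# Report whether the current user can change to and create files in DIR.
check_log_dir() {
    if [ -d "$1" ] && [ -x "$1" ] && [ -w "$1" ]; then
        return 0
    fi
    echo "$1: current user cannot enter or write to this directory"
    return 1
}
```

A call such as check_run_script "$SERVICE_DIRECTORY" prints one line per problem found and returns non-zero if there was any. The directory check is only meaningful when run as the user that the logging service switches to, since the superuser is not subject to ordinary directory permissions.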
If the script succeeds and the service stays running, investigate what is different about the environment, per-process resource limits, and controlling TTY of your interactive shell. Something in that difference means that ./run runs when invoked from your interactive shell, but does not run when invoked as a grandchild of svscan, with whatever environment, per-process resource limits, and (properly, no) controlling TTY svscan passes to its child supervise processes and thus indirectly to its grandchildren.
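One way to narrow down an environment difference is to run the script with an emptied environment and with standard input taken from /dev/null. The sketch below is only an approximation of what a supervised service sees: the function name run_with_clean_environment is hypothetical, the default PATH value is an assumption that should be replaced with whatever your svscan is actually started with, and the controlling TTY and resource limits of your shell are left untouched.

```shell
#!/bin/sh
# Hypothetical helper: run DIR/run with nothing in the environment except PATH.
# The default PATH is an assumption; pass your own as the second argument.
run_with_clean_environment() {
    service_dir=$1
    clean_path=${2:-/command:/usr/local/bin:/usr/local/sbin:/bin:/sbin:/usr/bin:/usr/sbin}
    (
        cd "$service_dir" &&
        # env -i discards every inherited variable; only PATH is set again.
        exec env -i PATH="$clean_path" ./run </dev/null
    )
}
```

If ./run works from your interactive shell but fails under run_with_clean_environment "$SERVICE_DIRECTORY", the script probably depends on a variable that your login environment supplies and that the supervised service never receives. Resource limits can be compared separately, for example by looking at what your shell's ulimit builtin reports interactively and from within the script.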