Mistakes to avoid when designing Unix dæmon programs

You've come here because you have either perpetrated one of the classic design mistakes in your Unix dæmon, or asked a question similar to the following:

What mistakes should I avoid making when designing how my Unix dæmon program operates?

This is the Frequently Given Answer to that question.

(You can find a different approach to this answer on one of Chris's Scribbles.)

There are many dæmon supervision and management tools that require dæmons to work as explained here, among them the daemontools family (including daemontools itself, runit, and the nosh toolset) and systemd.

Don't fork() in order to "put the dæmon into the background".

This whole idea is wrongheaded. The concepts of "foreground" and "background" don't apply to dæmons. They apply to user interfaces. They apply to processes that have controlling terminals. There is a "foreground process group" for controlling terminals, for example. They apply to processes that present graphical user interfaces. The window with the distinctive highlighting is conventionally considered to be "in the foreground", for example. Dæmons don't have controlling terminals and don't present textual/graphical user interfaces, and so the concepts of "foreground" and "background" simply don't apply at all.

When people talk about fork()ing "in order to put the dæmon in the background" they don't actually mean "background" at all. They mean that the dæmon is executed asynchronously by the shell, i.e. the shell does not wait for the dæmon process to terminate before proceeding to the next command. However, Unix shells are perfectly capable of arranging this. No code is required in your dæmon for doing so.

Let the invoker of your dæmon decide whether xe wants it to run synchronously or asynchronously. Don't assume that xe will only ever want your dæmon to run asynchronously. Indeed, on modern systems, administrators want dæmons to run synchronously, as standard behaviour.

Dæmon supervisors assume (quite reasonably) that if their child process exits then the dæmon has died and can be restarted. (Conversely, they quite reasonably assume that they can do things such as stopping the dæmon cleanly by sending their child process a SIGTERM signal.) The old BSD and System 5 init programs do this. So, too, do most proper dæmon supervision toolkits from the past 30 years. Forking to "put the dæmon into the background" entirely defeats such tools; and, ironically, it does so to no good end, because even without fork()ing, dæmons invoked by such supervisors are already "in the background": the supervisors themselves already run asynchronously from interactive shells, without controlling terminals, and without any interactive shell as their session leader.
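
For illustration, here is a minimal sketch, in C, of the shape that such a supervisor-friendly dæmon takes: it never fork()s to detach, it exits cleanly when sent SIGTERM, and the do_one_unit_of_work() function is a hypothetical stand-in for whatever the program actually does.

    /* A minimal sketch of a supervisor-friendly dæmon: it does not fork(),
     * does not detach, and exits cleanly when its supervisor sends SIGTERM.
     * do_one_unit_of_work() is a hypothetical placeholder. */
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    static volatile sig_atomic_t stop_requested = 0;

    static void handle_term(int signo) {
        (void)signo;
        stop_requested = 1;
    }

    static void do_one_unit_of_work(void) {
        /* Placeholder: service one request, poll a queue, and so forth. */
        sleep(1);
    }

    int main(void) {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = handle_term;
        sigaction(SIGTERM, &sa, NULL);

        fprintf(stderr, "daemon starting\n");    /* log to standard error */

        while (!stop_requested)
            do_one_unit_of_work();

        fprintf(stderr, "daemon stopping\n");
        return EXIT_SUCCESS;
    }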

If a novice system administrator who runs your dæmon from /etc/rc, /etc/rc.local, or the like (rather than from init or under the aegis of one of the many dæmon supervision toolkits) asks you how to persuade your dæmon to run asynchronously, so that the script does not wait for it to finish, point xem in the direction of the documentation for the shell's '&' metacharacter.

Don't assume that "foreground" means "debug mode".

Running a dæmon synchronously doesn't necessarily mean that vast quantities of debugging information are required. If (because you haven't bitten the bullet and eliminated the totally unnecessary fork()ing from your code) you have a command option that switches your dæmon between running "in the foreground" and running "in the background", do not make that command option do double duty. Running the dæmon without fork()ing should have no bearing upon whether or not debugging output from your program is enabled or disabled.

In general, don't conflate options that affect the actual operation of your program with options that affect the output of log or debug information.
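
As a sketch of what keeping those concerns separate can look like, here is a hypothetical command-line parser in which a -d option affects only logging verbosity, and in which there is no "foreground"/"background" option at all, because the program never fork()s to detach in the first place.

    /* A sketch, using getopt(), of keeping option semantics orthogonal:
     * the hypothetical -d flag only raises logging verbosity. */
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char *argv[]) {
        int verbosity = 0;
        int opt;
        while ((opt = getopt(argc, argv, "d")) != -1) {
            switch (opt) {
            case 'd':
                ++verbosity;            /* affects logging output only */
                break;
            default:
                fprintf(stderr, "usage: %s [-d]\n", argv[0]);
                return 1;
            }
        }
        if (verbosity > 0)
            fprintf(stderr, "debug logging enabled\n");
        /* ... the dæmon's actual work goes here ... */
        return 0;
    }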

Don't use syslog().

syslog() is a very poor logging mechanism. Don't use it. Amongst its many faults and disadvantages are that it funnels every program's log output into one giant combined stream and that it is not composable with other logging mechanisms unless a specialized dæmon is listening on its protocol-specific sockets.

Write your log output to standard error, just like all other Unix programs do.

You'll find your dæmon easier to write, to boot. Code using fprintf(stderr,…) (or std::clog) is generally easier to maintain than code using syslog().
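
For example, a logging helper built on standard error can be as small as the following sketch; log_msg() is a hypothetical name, and timestamping, rotation, and storage are left to whatever reads the other end of the pipe.

    #include <stdarg.h>
    #include <stdio.h>

    /* A hypothetical helper: format a message and write it, with a trailing
     * newline, to standard error. */
    static void log_msg(const char *fmt, ...) {
        va_list ap;
        va_start(ap, fmt);
        vfprintf(stderr, fmt, ap);
        va_end(ap);
        fputc('\n', stderr);
    }

    int main(void) {
        log_msg("listening on port %d", 7);   /* instead of syslog(LOG_INFO, ...) */
        return 0;
    }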

In most dæmon supervision toolkits, there is a facility for the dæmon supervisor process to open a pipe, attach its write end to your dæmon's standard error, and attach the read end to the standard input of some other "log" dæmon. The dæmon supervisor in most toolkits also keeps the pipe open in its own process, so that if the "log" dæmon crashes and is auto-restarted (or is restarted by administrator command for some reason), unread log data at the time of the crash/restart remain safely in the pipe ready to be processed.
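
The plumbing involved is straightforward to picture. The following is a much-simplified sketch in C, with hypothetical ./log-daemon and ./main-daemon program names; real dæmon supervisors do a great deal more than this (restarting, status reporting, control interfaces).

    /* A simplified sketch of the mechanism described above: the supervisor
     * makes a pipe, runs the "log" dæmon with the read end as its standard
     * input, runs the "main" dæmon with the write end as its standard output
     * and standard error, and keeps both ends open in its own process so that
     * unread log data survive a "log" dæmon restart. */
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        int fds[2];
        if (pipe(fds) < 0) { perror("pipe"); return 1; }

        if (fork() == 0) {                /* the "log" dæmon */
            close(fds[1]);
            dup2(fds[0], 0);              /* read end becomes standard input */
            execlp("./log-daemon", "log-daemon", (char *)NULL);   /* hypothetical */
            perror("execlp"); _exit(127);
        }

        if (fork() == 0) {                /* the "main" dæmon */
            close(fds[0]);
            dup2(fds[1], 1);              /* write end becomes standard output */
            dup2(fds[1], 2);              /* ... and standard error */
            execlp("./main-daemon", "main-daemon", (char *)NULL); /* hypothetical */
            perror("execlp"); _exit(127);
        }

        /* The supervisor deliberately keeps fds[0] and fds[1] open here, and
         * would wait on its children and restart them as needed (omitted). */
        for (;;) pause();
    }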

If a system administrator isn't using such a supervision toolkit, xe can always send your dæmons' standard errors through a pipe to splogger, logger, or sissylog. But the converse isn't true: syslog() isn't composable with other logging mechanisms unless one has a specialized dæmon listening on the protocol-specific sockets. An administrator can deal with standard error if xe isn't using a toolkit that already handles it, but dealing with syslog() in the same way is a lot harder.

systemd connects all dæmons' standard errors to the systemd "journal" dæmon through a pipe, too, although it retains other parts of the syslog design, such as combining multiple log streams into one single giant stream.

This is not directly relevant to how a dæmon operates per se, but there is a known problem with combining multiple streams into one: a flood of log messages from one highly verbose (or indeed malicious) source can cause log file rotation and the loss of potentially valuable log information from another important but relatively quiet source. This is why the syslog mechanism has fan-out after the fan-in. On most systems as configured out of the box, though, there isn't very much fan-out, and it remains relatively easy to wash important logs away in a flood.

By avoiding the fan-in in the first place, one avoids this problem more neatly. The various daemontools-family toolsets allow "main" dæmons to have individual "log" dæmons connected via individual pipes, thereby allowing completely disjoint streams of data, with individual disjoint log rotation and size policies per dæmon if desired, and (for maximum security) every "log" dæmon run under the aegis of its own individual user account, using the operating system's own account permission mechanisms to protect the "log" dæmon process and its log files/directories from interference by users, by other "log" dæmons, and even by the "main" dæmon whose output is being logged.

Don't deal with TCP/IP directly.

Let programs such as inetd, tcp-socket-listen and tcp-socket-accept from the nosh package, tcpserver (from Dan Bernstein's UCSPI-TCP), sslserver (from UCSPI-SSL), or tcpsvd (from Gerrit Pape's ipsvd) deal with the nitty gritty of opening, listening on, and accepting connections on sockets. All that your program needs to do is read from its standard input and write to its standard output. Then if someone comes along wanting to connect your program to some other form of stream, xe can do so easily.

Make your program into an application that is suitable for being spawned from a UCSPI server. If you really do need to have access to TCP-specific information, such as socket addresses, don't call getpeername() and so forth directly. Parse the TCP/IP local and remote information that should be provided by the UCSPI server in the TCP environment variables. (For one thing, a system administrator will find it a lot easier to test an access-control mechanism that is based upon $TCPREMOTEIP than to test one that is based upon getpeername().)
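
Here is a sketch of what such a program can look like: it reads from standard input, writes to standard output, and consults $TCPREMOTEIP instead of calling getpeername(). The trivial echo "protocol" and the access-control rule are placeholders.

    /* A sketch of a UCSPI-style service program. The UCSPI server (tcpserver,
     * tcpsvd, and the like) handles the sockets and sets the TCP environment
     * variables; this program only uses standard input and standard output. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        const char *remote = getenv("TCPREMOTEIP");
        char line[1024];

        fprintf(stderr, "connection from %s\n", remote ? remote : "unknown");

        if (remote && strcmp(remote, "127.0.0.1") != 0) {
            /* A trivial, easily testable access-control rule (placeholder). */
            fputs("access denied\r\n", stdout);
            return 0;
        }
        while (fgets(line, sizeof line, stdin)) {
            fputs(line, stdout);          /* placeholder protocol: echo */
            fflush(stdout);
        }
        return 0;
    }

Such a program can be run under a UCSPI server (for example, tcpserver 0 1234 ./this-program) or tested by hand from a shell simply by setting TCPREMOTEIP in the environment before invoking it.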

This design allows you to follow the Principle of Least Privilege more closely, too. If your program were to handle the TCP sockets itself and the number of the port that it used were in the reserved range (1023 and below), it would need to be run as the superuser. All of the code of your program would need to be audited to ensure that it had no loopholes through which one could gain superuser access. If, however, your program relied on tcpserver, sslserver, or tcpsvd to perform all of the socket control, it could be invoked under a non-superuser UID and GID via setuidgid. Loopholes in your program would only allow an attacker to do things that that UID and GID could do. If, for example, that UID and GID owned no files or directories on the filesystem, had write access to none, and no other processes ran under the same UID, then an attacker who compromised your program could do very little apart from disrupting the operation of your program itself.

If your program is one of the (exceedingly) rare cases where you do need to create, listen on, and accept connections on sockets yourself, allow the system administrator full control over the IP addresses (and port numbers) that your program will use.
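
In that rare case, a sketch of the administrator-friendly approach is to take the listening address and port from the command line (or from a configuration mechanism of the administrator's choosing) rather than hard-coding them, for instance via getaddrinfo():

    /* A sketch for the rare self-listening case: the administrator supplies
     * the address and port; nothing is hard-coded into the program. */
    #include <netdb.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    int main(int argc, char *argv[]) {
        if (argc != 3) {
            fprintf(stderr, "usage: %s address port\n", argv[0]);
            return 1;
        }
        struct addrinfo hints, *res;
        memset(&hints, 0, sizeof hints);
        hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6, as the admin chooses */
        hints.ai_socktype = SOCK_STREAM;
        hints.ai_flags = AI_PASSIVE;

        int rc = getaddrinfo(argv[1], argv[2], &hints, &res);
        if (rc != 0) {
            fprintf(stderr, "%s: %s\n", argv[1], gai_strerror(rc));
            return 1;
        }
        int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (fd < 0 || bind(fd, res->ai_addr, res->ai_addrlen) < 0 || listen(fd, 64) < 0) {
            perror("listen socket");
            return 1;
        }
        freeaddrinfo(res);
        /* ... accept() loop goes here ... */
        return 0;
    }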

Don't create PID files in /run (or anywhere else).

Creating PID files in /run (a.k.a. /var/run) has all sorts of flaws and disadvantages, among them that the file goes stale if the dæmon crashes without removing it, that the recorded process ID can be recycled by the operating system and come to denote some entirely unrelated process, and that there is an unavoidable race between reading the file and signalling the process named in it.

Let the system administrator use whatever dæmon supervisor is invoking your dæmon to handle killing the correct process. Dæmon supervisors don't need PID files. They know what process IDs to use because they remember them from when they fork()ed the dæmon processes in the first place. init doesn't need a PID file in /var/run to tell it which of its children to kill when the run level changes, for example.
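
A sketch of why this is so: a supervisor that fork()s and exec()s the dæmon already holds the only piece of information that a PID file would record, and can signal exactly the right process without one. The ./the-daemon pathname below is a hypothetical example, and a real supervisor would of course wait for an administrator command rather than a fixed interval.

    /* A sketch of a supervisor that needs no PID file: it remembers the
     * process ID returned by fork() and signals exactly that child later. */
    #include <signal.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        pid_t child = fork();
        if (child == 0) {
            execlp("./the-daemon", "the-daemon", (char *)NULL);   /* hypothetical */
            _exit(127);               /* exec failed */
        }

        sleep(60);                    /* stand-in for "until told to stop" */

        kill(child, SIGTERM);         /* no PID file ever consulted */
        waitpid(child, NULL, 0);
        return 0;
    }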

daemontools has been described as "/var/run done the right way". The other toolkits in the daemontools family use the same approach. With all of them, there is no need for PID files. Dæmons are controlled with the svc (or runsvctrl) commands, which are what shutdown scripts should use instead of kill, pkill, and the like.


© Copyright 2001–2004,2007,2014 Jonathan de Boyne Pollard. "Moral" rights asserted.
Permission is hereby granted to copy and to distribute this web page in its original, unmodified form as long as its last modification datestamp is preserved.