The known problems with System 5 rc

You've come to this page as a result of a question similar to the following:

What are the known problems with System 5 rc? I've not heard of any until now.

This is the Frequently Given Answer to that question.

You're almost three decades late.

The problems with System 5 rc were remarked upon in its first decade, not least from the BSD world, which resisted adopting the idea. Many replacements for it have come along in the decades since, and one can see from their designs the perceived flaws that they attempted to tackle. IBM AIX had the System Resource Controller (SRC) in 1990 with AIX version 3.1. It ran under System 5 init but replaced System 5 rc (don't get the twain mixed up) with a service manager program (srcmstr) that handled starting, stopping, and supervising system dæmons. Daniel J. Bernstein released daemontools in 1997, which as the name suggests was a toolset for dæmon supervision that boiled the actual dæmons down to very simple run programs with little (and sometimes no) shell programming code in them at all. The Linux file-rc came along the same year. It replaced System 5 rc and its system of numbered symbolic links in numbered subdirectories with a single database file named runlevel.conf that one (carefully) edited in the same manner as (say) /etc/passwd.

In 1999, Luke Mewburn worked on replacing the /etc/rc system in NetBSD. netbsd.tech.userlevel mailing list discussions from the time show several criticisms of the System 5 rc and System 5 init systems, and encouragement not to repeat their mistakes in the BSD world. The resultant rc.d system was roughly contemporary with Daniel Robbins producing OpenRC, another System 5 rc replacement that replaced the (Bourne/Bourne Again) shell with a different script interpreter, nowadays named /sbin/openrc-run, that provided a whole lot of standard service management functionality as pre-supplied functions. The NetBSD rc.d system likewise reduced rc.d scripts to a few variable assignments and function calls (in about two thirds of cases).

M. Mewburn was one of a few people who actually wrote papers on the subject explaining the problems of existing systems that he was attempting to address. Others were the designers of Solaris' Service Management Facility (SMF), Richard Gooch, and (to a far lesser extent, because they left it as an exercise for the reader to figure out why the use cases that they laid out were actually problems for System 5 rc) the authors of upstart. These are highly recommended reading.

Here are some of the salient points.

The mapping between services and init.d scripts is not 1:1.

The old joke "fork Fedora" WWW page made an apples-to-oranges comparison between a systemd service unit and a System 5 init.d script. The problem with the comparison exemplifies the point at hand, here. The init.d script actually starts and stops two Sendmail services, not one.

This is one of the problems with System 5 rc: The individual init.d scripts do not necessarily have a 1:1 correspondence with the services being managed. This has various ramifications:

There is excessive coupling between services that notionally could operate in parallel.

Development of tools such as startpar attempted to address this within the Linux System 5 rc clone. Many of the other systems addressed it as well. The original subsystem proceeded serially, executing init.d scripts one after another. This fails to take advantage of one of the core features of a multi-process multi-tasking operating system.
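
The cost of strictly serial execution can be illustrated with a minimal sketch. It is not how startpar or any real rc system works internally; the service names and the fixed 0.1 second start-up cost are made-up stand-ins, and each "start" action merely sleeps. It only shows that independent start actions run one after another take roughly the sum of their times, whereas run concurrently they take roughly the time of the slowest.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical, mutually independent services; each "start" just sleeps.
SERVICES = ["syslog", "cron", "sshd", "ntpd"]
START_COST = 0.1  # seconds; an arbitrary stand-in for real start-up work


def start(service):
    """Pretend to start a service and report that it is up."""
    time.sleep(START_COST)
    return f"{service} started"


def start_serially(services):
    """Mimic the original behaviour: one script after another."""
    began = time.monotonic()
    results = [start(s) for s in services]
    return results, time.monotonic() - began


def start_in_parallel(services):
    """Start every independent service at once."""
    began = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(services)) as pool:
        results = list(pool.map(start, services))
    return results, time.monotonic() - began


if __name__ == "__main__":
    _, serial_time = start_serially(SERVICES)
    _, parallel_time = start_in_parallel(SERVICES)
    print(f"serial:   {serial_time:.2f}s")
    print(f"parallel: {parallel_time:.2f}s")
```

The sketch assumes that the services really are independent. Where they are not, parallel start needs the ordering information that the original subsystem lacked.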

There are no dependency and ordering capabilities.

Again, this is something that tools such as insserv attempted to address within the Linux System 5 rc clone, calculating a statically configured startup order whenever a service was added or removed. The NetBSD rc.d subsystem likewise has a tool named rcorder which recalculates a total script execution order at each bootstrap.

The original subsystem had no notion of one service requiring the operation of another, and thus depending on it being started first and stopped last. It had no notion of the start or stop of one service implying the start or stop of other necessary or conflicting services.
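
What "requiring" means for ordering can be sketched as a topological sort. This is an illustration only, not the algorithm of insserv or rcorder: the service names and the requirements between them are invented, and the sort comes from Python's standard graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical services, each mapped to the services that it requires.
REQUIRES = {
    "network": set(),
    "syslog": set(),
    "sshd": {"network", "syslog"},
    "nfs-client": {"network"},
}


def start_order(requires):
    """Return an order in which every service follows its requirements."""
    return list(TopologicalSorter(requires).static_order())


def stop_order(requires):
    """Stop in reverse, so that a requirement outlives its dependents."""
    return list(reversed(start_order(requires)))


if __name__ == "__main__":
    print("start:", start_order(REQUIRES))
    print("stop: ", stop_order(REQUIRES))
```

Several valid orders usually exist, so the exact output is not fixed; only the constraints are. A circular requirement makes static_order() raise graphlib.CycleError, which is the kind of error that a bare list of numbered scripts has no way to report.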

Services had ordinal numbers for their start and kill scripts, but, as Debian found out, this did not scale at all. There was no overall system for assigning such numbers; they were just picked largely at whim by whoever it was that wrote the rc script. With a large number of independently developed software packages, the problems of manually assigning an ordering, and then changing it later, became intractable.
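
A small sketch shows why hand-picked ordinals run out of room. The link names are hypothetical, in the common style of "S", a two-digit ordinal, and a script name, and the helper function is invented for illustration rather than taken from any real tool.

```python
# Hypothetical start-link names: "S", a two-digit ordinal, then the script name.
LINKS = ["S20sshd", "S10syslog", "S20cron", "S19network"]


def run_order(links):
    """System 5 rc style: start links are run in plain lexical order."""
    return sorted(links)


def ordinal_between(links, earlier, later):
    """Pick an ordinal strictly between two services, or None if none exists."""
    ordinal = {name[3:]: int(name[1:3]) for name in links}
    candidates = range(ordinal[earlier] + 1, ordinal[later])
    return candidates[0] if len(candidates) > 0 else None


if __name__ == "__main__":
    print(run_order(LINKS))
    print(ordinal_between(LINKS, "syslog", "network"))  # plenty of room
    print(ordinal_between(LINKS, "network", "sshd"))    # no room at all
```

A new service that must start after network (19) and before sshd (20) has no number available, so somebody has to renumber scripts that they did not write. Note also that cron and sshd share ordinal 20 here, leaving their relative order to alphabetical accident.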

There is excessive coupling between scripts.

There is a non-trivial set of init.d scripts (and to a lesser extent rc.d scripts on the NetBSD/FreeBSD systems) which are either used by other scripts internally, or designed to be used by the system administrator as Swiss Army Knives that do things other than actual service management. The former internal interfaces make things opaque and fragile. The latter tend to complicate the scripts and hide the actual service administration parts.

Networking and peripheral device management scripts are some of the sinners in this regard. But there are all sorts of cases in odd corners to be discovered. As noted, this problem is not limited solely to the System 5 rc world; witness things such as the FreeBSD remote filesystem mounting calling into the "cleanvar" service, and FreeBSD's power_profile and serial scripts, neither of which actually does any service management or implements any bootstrap start/stop mechanism at all.

The system relies upon the dæmonization fallacy for manual dæmon start/stop.

There is a fallacy that holds that manual service control is clean because one can safely, cleanly, and securely "dæmonize" from an interactive login session; one cannot, it isn't, and system administrators tell war stories about the results. The System 5 rc mechanism for manually starting and stopping services on a running system is mis-designed based upon this fallacy.

Dæmons are not supervised.

The System 5 rc system is an entirely passive system. Pace the considerations of the dæmonization fallacy, where an interactive login session could have set up an alternative child process "subreaper" (creating yet another way in which spawning services from such a session differs unexpectedly from spawning them at bootstrap), running services have process #1 as their parent process. Process #1 has no knowledge of what the individual dæmon processes are. Nor would a subreaper have. Nothing that explicitly knows about the services is informed when they die from crashes, or just exit.

The IBM AIX SRC not only monitored the state of service processes, but had kernel extensions peculiar to AIX that enabled the srcmstr process to recognize services that were not its own children but that had been started by an earlier incarnation of srcmstr. daemontools famously brought into the hobbyist mainstream the idea that dæmons could be auto-restarted after they had crashed/exited. This idea was extended by other members of the daemontools family, with toolsets like nosh, runit, and perp adding "restart" configuration mechanisms for fine-grained control of if, when, and how often services get restarted when they crash or exit.
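
The core of the supervision idea can be sketched in a few lines. This is not the implementation of srcmstr, daemontools, or any of the toolsets named above: the "dæmon" is a stand-in child process that fails immediately, and the max_restarts parameter is a hypothetical policy knob invented for the example. The point is that the supervisor is the dæmon's parent, so it is told about every exit and can act on it.

```python
import subprocess
import sys

# A stand-in "daemon" that exits immediately with a failure status.
CRASHING_DAEMON = [sys.executable, "-c", "import sys; sys.exit(1)"]


def supervise(command, max_restarts=3):
    """Run command as a child and restart it each time it exits.

    Stops after max_restarts restarts (a made-up policy for this sketch)
    and returns the list of exit statuses that were observed.
    """
    statuses = []
    while len(statuses) <= max_restarts:
        child = subprocess.Popen(command)
        # As the parent, the supervisor is informed of every exit.
        statuses.append(child.wait())
    return statuses


if __name__ == "__main__":
    print(supervise(CRASHING_DAEMON, max_restarts=2))
```

A real supervisor would typically add delays between restarts and distinguish deliberate stops from crashes. Under System 5 rc there is no such parent at all, so nothing is in a position to make those decisions.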

There is no standard.

POSIX famously avoided standardizing anything to do with superuser-level system administration, system startup, and system shutdown. The System V Interface Definition also does not cover either init (the program for process #1, not the utility command for communicating with that program) or System 5 rc.

It was in 1998 that it first came to light that this lack of any reference standard had bred a mess. Roland Rosenfeld and Martin Schulze, who had derived file-rc from a system named r2d2 written by Winfried Trümper, together with Miquel van Smoorenburg, who wrote the Linux System 5 rc clone, discovered that r2d2, file-rc, and the System 5 rc clone all had subtly different ideas about when, and on which scripts, to run stop and start actions. There was also a counterintuitive swap of "stop" and "start" specific to two runlevels. The only reference doco that anyone could point to was the Debian Policy Manual, which of course had been written based upon the behaviour of the Linux System 5 rc clone in the first place.

Another effect of the lack of standardization is that there was no agreement as to what the run levels, which controlled which set of scripts the System 5 rc system executed, actually were. In some Linux operating systems, there were distinct run levels for single-user mode, plain multi-user mode, "server" multi-user mode, and multi-user mode with everything (servers and graphical UI). In other Linux operating systems, one or other of "server" multi-user mode and plain multi-user mode was absent. In yet others, and in some of the proprietary System 5 based Unices, yet further variations existed, such as different run levels for different graphical UIs and variations on single-user mode. IBM AIX replaced System 5 rc but generalized System 5 init, going so far as to provide 6 extra run levels (7 to 9 and a to c in AIX 7.1). At the same time, though, it gave only one run level a defined cross-system meaning: level 2. Of course, there was also disagreement amongst all of these as to which mode was assigned which number or letter. The upshot was that system administrators had no portable idea of what (for example) the init 2 command would cause in terms of System 5 rc actions.

A cornucopia of bugs

Most of the aforementioned problems are design problems; now to the implementation problems. In theory, System 5 rc scripts are "just shell", and easy to write and to maintain. The evidence of nigh on three decades' worth of history shows that they're actually quite difficult to implement right.

System 5 rc scripts have a specific interface to obey. But it has several subtleties, which authors are often unaware of or get wrong even if they are aware of them:

System 5 rc scripts have a specific job to do when managing dæmons; including ensuring that the right processes are killed, ensuring that at most one instance of a service runs at any time, and writing little coloured "[OK]" messages to a terminal. But it's one that's hard to write and to do in shell script.
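
One way that such scripts commonly try to identify "the right process", assumed here rather than taken from the text above, is a PID file checked with signal 0. The following sketch, which assumes a POSIX system and uses an invented file name, shows how that check can be fooled: a stale PID file whose number has been reused by an unrelated process looks exactly like a running service.

```python
import os
import tempfile


def pid_is_alive(pid):
    """Signal 0 only tests for existence; nothing is delivered (POSIX)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def service_seems_running(pid_file):
    """The naive check: the PID file names some live process."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return False
    return pid_is_alive(pid)


def demo():
    """Stale PID file whose number now belongs to an unrelated process."""
    with tempfile.TemporaryDirectory() as directory:
        pid_file = os.path.join(directory, "exampled.pid")  # hypothetical name
        with open(pid_file, "w") as f:
            # Our own PID stands in for a recycled process ID.
            f.write(f"{os.getpid()}\n")
        return service_seems_running(pid_file)


if __name__ == "__main__":
    print(demo())
```

No "exampled" dæmon exists, yet the check reports it as running. A stop action built on the same file would signal the wrong process, and a start action would refuse to start the service.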


© Copyright 2015 Jonathan de Boyne Pollard. "Moral" rights asserted.
Permission is hereby granted to copy and to distribute this web page in its original, unmodified form as long as its last modification datestamp is preserved.