The conceptual problems with microkernels

You've come to this page because you've asked a question similar to the following:

What are the conceptual problems with microkernels?

This is the Frequently Given Answer to such questions.

One of the problems with the microkernel idea in practice is that in many ways it is employed to ignore problems by simply redefining the software to exclude them. One defines "kernel" to be something smaller than before, and considers problems to be gone simply because they no longer occur within that portion of the overall system that one has labelled "the kernel". "The system doesn't crash" because "the code that crashes isn't executing in the kernel".

Anyone with in-depth experience of operating systems knows that certain classes of problem in application-mode code can bring down a whole system. This is as true of microkernel operating systems as it is of a fatal X server bug causing the loss of all X clients, or of the CSRSS Backspace Bug causing Windows NT 4 to fail. The system as a whole is more than the kernel, and this is especially true of microkernel operating systems, which ironically makes the problem that much more acute on such operating systems.

On microkernel operating systems there are almost invariably one, sometimes several, server processes whose failure will render the entire system inoperable, even though "the kernel" is unaffected in its operation. (The authentication server on the Hurd is one such process, for example.) A fatal bug in system code that resides in an application process cannot accidentally overwrite the kernel. But in a microkernel operating system the kernel does far less, so even though the kernel continues to function, there's not much that can actually be done with the system after the server process has crashed.
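The point can be illustrated with a toy model. What follows is only a sketch under stated assumptions, not a description of the Hurd or of any real microkernel: the ToyKernel class, the "auth" and "files" server names, and the message formats are all hypothetical, invented for illustration. The "kernel" here does nothing but route messages, and it never fails; the system nonetheless stops being useful the moment the one server that everything else depends upon dies.

```python
class ServerCrashed(Exception):
    """Raised when a message is sent to a server that has died."""


class ToyKernel:
    """Hypothetical microkernel: its only job is to route messages to servers."""

    def __init__(self):
        self.servers = {}   # server name -> handler function (None once crashed)
        self.alive = True   # the kernel itself never crashes in this model

    def register(self, name, handler):
        self.servers[name] = handler

    def crash(self, name):
        # Simulate a fatal bug in a user-mode server process.
        self.servers[name] = None

    def send(self, name, message):
        handler = self.servers.get(name)
        if handler is None:
            raise ServerCrashed(name)
        return handler(message)


def make_system():
    kernel = ToyKernel()
    # Hypothetical authentication server: only "alice" is a valid user.
    kernel.register("auth", lambda user: user == "alice")

    def file_server(request):
        user, path = request
        # Every file operation first asks the authentication server.
        if not kernel.send("auth", user):
            return "denied"
        return "contents of " + path

    kernel.register("files", file_server)
    return kernel


kernel = make_system()
before = kernel.send("files", ("alice", "/etc/motd"))

kernel.crash("auth")
try:
    after = kernel.send("files", ("alice", "/etc/motd"))
except ServerCrashed as failure:
    after = "unusable: server '%s' is gone" % failure

print(before)                        # contents of /etc/motd
print(after)                         # unusable: server 'auth' is gone
print("kernel alive:", kernel.alive) # kernel alive: True
```

The kernel's routing code is untouched throughout, which is exactly the sense in which "the system doesn't crash"; the file server is also still running, yet no request to it can succeed.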

It's often stated that message-passing is one of the problems with the microkernel idea in practice. In fact, this isn't actually a problem that is specific to microkernels. Plenty of monolithic operating systems pass messages around, too. Windows NT's entire I/O subsystem, for example, is based upon message passing. (I/O Request Packets are messages.) The STREAMS mechanism in AT&T Unix is a message-passing mechanism, as is the sockets layer in the various BSDs. Most if not all of the criticisms levelled at message-passing in microkernels can be equally levelled at monolithic kernels as well. The processing of "mbufs" in the BSDs has well-known design issues with respect to the number of copy operations, for example. Operating systems where GUI programs use the X protocol have performance issues relating to message passing, data marshalling, and context switching between client and server. Message passing is a general design issue in many operating systems, not just in microkernel ones.
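The copy-count criticism is independent of where the kernel boundary is drawn, and a small sketch can show why. This is a toy model only: it does not reproduce how BSD mbufs, STREAMS, or I/O Request Packets actually work, and the function names, the four "layers", and the 1500-byte payload are all assumptions chosen for illustration. It compares handing a message down a stack of layers by copying it at each hand-off with handing down a reference to one shared buffer.

```python
def pass_by_copy(payload, layers):
    """Each layer receives its own private copy of the message body."""
    copied = 0
    message = payload
    for _ in range(layers):
        message = bytearray(message)  # a fresh copy at every hand-off
        copied += len(message)
    return message, copied


def pass_by_reference(payload, layers):
    """Each layer receives a view onto the same underlying buffer."""
    message = memoryview(payload)
    for _ in range(layers):
        message = message[:]          # slicing a memoryview copies no data
    return message, 0                 # no message bytes were copied


payload = bytearray(b"x" * 1500)      # hypothetical 1500-byte message

final_copy, copied = pass_by_copy(payload, 4)
view, shared_copied = pass_by_reference(payload, 4)

# Changing the original buffer shows which design shares memory.
payload[0] = ord("y")

print(copied)          # 6000 bytes copied across four hand-offs
print(shared_copied)   # 0
print(chr(final_copy[0]), chr(view[0]))  # x y
```

The cost on the copying side grows with both the message size and the number of hand-offs, whether those hand-offs happen between layers inside one kernel or between separate server processes. The sharing side avoids the copies but gives every layer access to the same memory, which is its own design tradeoff.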

What the development of mainstream operating systems over the past decade and a half has shown is that the "microkernel/monolithic kernel" distinction is in many ways a distraction from the real engineering issues to address in operating systems design, which apply to microkernels and monolithic kernels alike. A small selection of these issues, in no particular order, is:

separating mutually untrusting portions of code into separate protection domains, such as separate processes running under the aegises of different user accounts (This applies just as much to application software as it does to system software. Consider the design of qmail, for example. Consider how Windows NT version 6 is finally learning from the design of qmail in the area of resources used by service processes. And consider the security ramifications of all of the major servers in the Hurd having ports to each of the others.)
high cohesion and low coupling of modules
interface definitions, including future-proofing APIs (Consider the implications of the GreXXXX() API for display drivers in the move from Windows NT 3 to Windows NT 4, for example.)
choice and tradeoffs between client-server designs where resources are local and hidden and designs that involve autonomous distributed access to accessible global resources (Consider the evolution of the graphics subsystem from Windows NT version 3.5, through version 4, to version 6, for example. And consider why the Hurd has to have a /hurd/magic server.)

© Copyright 2007,2009 Jonathan de Boyne Pollard. "Moral" rights asserted.
Permission is hereby granted to copy and to distribute this web page in its original, unmodified form as long as its last modification datestamp is preserved.