a horror story in the systemd house of horror

Wrapping Apache Tomcat in many pointless extra layers

The cast

Tomcat comes with programs named startup.sh and shutdown.sh in its bin/ directory. Those must be what one uses to start and stop Tomcat when running it as a service under systemd, right? Baptiste Wicht, Michael Adams, Zachary Grafton, Zolan Thapa, and the unnamed person behind ServerWorld all seem to think so:

ExecStart=/usr/share/tomcat/bin/startup.sh
ExecStop=/usr/share/tomcat/bin/shutdown.sh

Matt Quinn goes one further and writes a systemd-tomcat script that wraps these two, so that one can call them with start and stop verbs just like a System 5 rc script. He bases this on an anonymous answer on a Q&A WWW site where someone else did this for iptables, and he extends this, logically, to Tomcat too.

ExecStart=/bin/sh systemd-tomcat start

Tomcat also comes with a program named daemon.sh in its bin/ directory. Geir Arne Ruud thinks that one should use that under systemd:

ExecStart=/home/tomcat/tomcat/bin/daemon.sh start
ExecStop=/home/tomcat/tomcat/bin/daemon.sh stop

All of these are wrong.

The horror story

The big clue is the hoops that they have variously had to jump through to get things to work this way, in seeming contradiction of the documentation ("I guess the systemd doco must be wrong."). Two of the unit files mark the service as Type=oneshot, which does not fit a program that clearly keeps running after being started. And because even that bodge did not work on its own, they have set RemainAfterExit=true as well.
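
For illustration, the bodge looks something like the following [Service] fragment. It is a reconstruction of the pattern just described, assuming the same /usr/share/tomcat paths as the earlier example, and is not quoted from any of the unit files named above.

```ini
# Anti-pattern, shown only to illustrate the bodge. Do not copy.
[Service]
# Claims that the program is a run-to-completion job, which Tomcat is not.
Type=oneshot
# Papers over the fact that startup.sh exits almost immediately.
RemainAfterExit=true
ExecStart=/usr/share/tomcat/bin/startup.sh
ExecStop=/usr/share/tomcat/bin/shutdown.sh
```

With this in place systemd reports the service as active the moment that startup.sh exits, and carries on reporting that whether or not the Java process is still alive. Without RemainAfterExit it would consider the service finished at that point and clean up the leftover processes, which is presumably why the second setting got added.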

The simple truth, as pointed out by Adam Young in 2013, is that Tomcat is a Java program. To run it, you set a few environment variables and execute /usr/bin/java to run the org.apache.catalina.startup.Bootstrap class, with the start command-line option and some Java properties that are (it transpires) just old-style ways of passing environment variables to the program. "So," one might ask, "what are the startup.sh, shutdown.sh, and daemon.sh programs all about, then?" The answer is that they are just wrappers. Not only are they pretty much entirely superfluous; in some cases they actively make the system more complex and rickety.

If one looks at startup.sh and shutdown.sh, one sees that they both just exec into a third shell script, catalina.sh, passing it either start or stop as an additional, prefixed, first command-line argument. This is where M. Quinn's approach begins being completely superfluous. He has a common script of his own that takes start and stop verbs, which calls startup.sh and shutdown.sh, which both devolve back to another common script that takes start and stop verbs just like the first one did. He would be better off, just to begin with, calling the common catalina.sh script directly and bypassing the entirely useless fan-out/fan-back-in little dance.

It gets worse, though. Because he has the extra little script, and because his little script does not use exec, the process that is running catalina.sh is not the main process that systemd started. It is a child of that process. We'll return shortly to the problem that this causes. But in the meantime we've reached the point where M. Wicht, M. Adams, M. Grafton, et al. join the fray, as they too are invoking startup.sh and shutdown.sh.

"So," one might now ask, "one just cuts out these two middle-men and runs catalina.sh directly, then?" The answer, however, is that one does not. If one looks at catalina.sh one might be excused for not seeing the forest for the trees. There's a lot of stuff there that cannot possibly apply on a systemd operating system, because systemd doesn't run on OS/400 or Cygwin or MacOS. There's also a lot of stuff there that cannot possibly apply since we're only ever coming in here, remember, with either the start or the stop action. Strip those away, and what catalina.sh does comes down to this. For start, it launches Java as a background process with standard output and standard error redirected to a log file, and records the process ID of that background process in a PID file (if one has been configured). For stop, it runs Java a second time, with the stop command, and then uses the PID file to watch for the original process going away.

So it turns out that catalina.sh is a middle-man, too. It is nothing but a Poor Man's Dæmon Supervisor written (badly, as they always are) in shell script. But we already have a better dæmon supervisor. It's systemd, the thing that we are trying to run Tomcat under in the first place. So there's no need for any of this. There's certainly no need for the PID file nonsense; that's exactly the sort of thing that is badly written in shell script, and it is a dangerous, rickety, and unreliable mechanism. Proper service managers have no need of it. They just remember the PID of the child that they themselves forked.

Which brings us back to M. Quinn's systemd-tomcat script. Remember that it spawns the Tomcat-bundled scripts as child processes, rather than execing them. The problem here is that there are three models that a systemd-managed dæmon can follow, and this doesn't match any of them.

Stripping out the fan-out/fan-back-in little dance and calling catalina.sh directly would at least match the Type=forking model. But there's still that rickety, dangerous, and completely unnecessary PID file nonsense in there. We can, in fact, do better, and have a Type=simple service, with no PID files at all, since the underlying Java program has no dealings in them.
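
For comparison, that intermediate Type=forking arrangement might be sketched as below. The PID file location under /run/tomcat is an assumption made for the sake of the example, as is the use of RuntimeDirectory= to create that directory; catalina.sh only writes a PID file at all if CATALINA_PID is set.

```ini
# Intermediate step only: still relies on catalina.sh and its PID file.
[Service]
Type=forking
User=tomcat
Group=tomcat
# Hypothetical location. RuntimeDirectory= creates /run/tomcat, owned by
# the service's user, for the lifetime of the service.
RuntimeDirectory=tomcat
Environment=CATALINA_PID=/run/tomcat/tomcat.pid
PIDFile=/run/tomcat/tomcat.pid
ExecStart=/usr/share/tomcat/bin/catalina.sh start
ExecStop=/usr/share/tomcat/bin/catalina.sh stop
```

It works after a fashion, but everything still hangs upon the PID file, and the dæmon's output still goes wherever catalina.sh chooses to put it rather than to systemd.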

And let us mention another horror in catalina.sh: the logging. The dæmon process has its standard output and standard error redirected to a log file. This is a known-bad way to go about logging. One cannot size-cap the log file; one cannot rotate the log file; one cannot shrink the log file. It grows, as one file, forever until the dæmon is shut down. Again, this is ridiculous when we have systemd. systemd can send standard output and standard error to multiple log files that — whatever else one may think of them — are most definitely size capped and rotatable.

No, we haven't forgotten M. Ruud and daemon.sh. That's actually a similar case to another horror story. M. Ruud is following where the Apache doco suggests the use of jsvc for being a dæmon. But, as one learns, the "official doco" isn't always right and up to date.

Doing things right strips all of this away. The only necessity is what M. Ruud and M. Grafton had already done: a file, let us call it /etc/default/tomcat, with all of the site-local environment variable settings explaining such things as where the Java Runtime Environment is this week.

CATALINA_HOME=/usr/share/tomcat
CATALINA_BASE=/usr/share/tomcat
CATALINA_TMPDIR=/var/tmp/tomcat
JAVA_HOME=/usr/share/java/jre-x.y.z

Oh look! Madhuranga Lakjeewa, starting Tomcat with Apache Ant, doesn't go through all of those nutty layers of pointless shell script, either, and just starts up the class directly with some JVM arguments.

systemd can read environment variables from a file and run a Java program. If invoked with start, that program becomes the dæmon. systemd will track its PID properly, no rickety PID file nonsense required. systemd will arrange for its standard output and standard error to be logged. systemd will ensure that only one dæmon is ever started at any time.

[Unit]
Description=Apache Tomcat Web Application Container
 
[Service]
User=tomcat
Group=tomcat
EnvironmentFile=-/etc/default/tomcat
ExecStart=/usr/bin/env ${JAVA_HOME}/bin/java \
$JAVA_OPTS $CATALINA_OPTS \
-classpath ${CLASSPATH} \
-Dcatalina.base=${CATALINA_BASE} \
-Dcatalina.home=${CATALINA_HOME} \
-Djava.endorsed.dirs=${JAVA_ENDORSED_DIRS} \
-Djava.io.tmpdir=${CATALINA_TMPDIR} \
-Djava.util.logging.config.file=${CATALINA_BASE}/conf/logging.properties \
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager \
org.apache.catalina.startup.Bootstrap \
start
ExecStop=/usr/bin/env ${JAVA_HOME}/bin/java \
$JAVA_OPTS \
-classpath ${CLASSPATH} \
-Dcatalina.base=${CATALINA_BASE} \
-Dcatalina.home=${CATALINA_HOME} \
-Djava.endorsed.dirs=${JAVA_ENDORSED_DIRS} \
-Djava.io.tmpdir=${CATALINA_TMPDIR} \
-Djava.util.logging.config.file=${CATALINA_BASE}/conf/logging.properties \
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager \
org.apache.catalina.startup.Bootstrap \
stop
 
[Install]
WantedBy=multi-user.target

That's 32 lines of service unit replacing almost 800 lines of wrapper shell scripts upon wrapper shell scripts. And even then it's only 32 lines because it has been split for readability on this page.
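
Note that the unit above expands ${CLASSPATH} and ${JAVA_ENDORSED_DIRS}, which the sample /etc/default/tomcat shown earlier does not set. A fuller version of that file might look like the following sketch. The two extra paths are assumptions based on the same /usr/share/tomcat layout and on the values that catalina.sh would ordinarily work out for itself.

```ini
# /etc/default/tomcat, extended. All paths are site-local assumptions.
CATALINA_HOME=/usr/share/tomcat
CATALINA_BASE=/usr/share/tomcat
CATALINA_TMPDIR=/var/tmp/tomcat
JAVA_HOME=/usr/share/java/jre-x.y.z
# EnvironmentFile= performs no variable expansion inside the file, so
# these cannot be written in terms of ${CATALINA_HOME}; spell them out.
CLASSPATH=/usr/share/tomcat/bin/bootstrap.jar:/usr/share/tomcat/bin/tomcat-juli.jar
JAVA_ENDORSED_DIRS=/usr/share/tomcat/endorsed
```

Java 9 and later dropped the endorsed directories mechanism, so on such a runtime the -Djava.endorsed.dirs line would have to come out of the unit rather than be given a value here.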

Of course, this invokes Java by running /usr/bin/java rather than executing the target program file directly. systemd insists that the name of the dæmon, in the systemd journal, is thus "java". Running it indirectly via /usr/bin/env, in order to cope with the need to use ${JAVA_HOME} to find where Java is installed this week, causes systemd to think that the dæmon is named "env". Josh Smeaton knows how to rectify this unsatisfactory state of affairs.
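
One setting that systemd provides for this sort of thing is SyslogIdentifier=, which sets the name with which log entries from the service's standard output and standard error are tagged. Whether this is the remedy that M. Smeaton describes is not stated here; the fragment is only a sketch, and the tag name tomcat is an assumption.

```ini
[Service]
# Tag log entries as "tomcat" instead of the executable's name
# ("env" or "java"). The name itself is arbitrary.
SyslogIdentifier=tomcat
```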

Maybe, in the future, the people who write Tomcat will notice what M. Young pointed out in 2013, and adjust Tomcat so that it just reads the damn environment variables directly itself given that Java programs can. Maybe, in the future, people will decide to have a way to invoke Java that's always in the same place. In such a future, things become a lot more streamlined.

[Unit]
Description=Apache Tomcat Web Application Container
 
[Service]
User=tomcat
Group=tomcat
EnvironmentFile=-/etc/default/tomcat
ExecStart=/usr/bin/java $JAVA_OPTS $CATALINA_OPTS org.apache.catalina.startup.Bootstrap start
ExecStop=/usr/bin/java $JAVA_OPTS org.apache.catalina.startup.Bootstrap stop
 
[Install]
WantedBy=multi-user.target

Bonus track!

You might have seen Dave Benjamin's notes on how to run Apache Tomcat from daemontools. That is actually a clue as to how to run Tomcat properly under systemd. Notice that M. Benjamin invokes catalina.sh with run rather than with start. There is actually another path through catalina.sh, for systems that aim to run dæmons, not to spawn them. Unlike the start/stop paths, this uses exec, so that the process running catalina.sh run becomes the dæmon program directly.

This code path is clearly tailored to the way that dæmons are expected to behave by the daemontools family of toolsets. But one could employ it for systemd. It doesn't do PID file nonsense or its own messed-up logging, because of course the daemontools way has been for dæmons not to do those, since 1997. Notice how M. Benjamin tied this into an ordinary log service with multilog.

A systemd unit file, reaping the benefits of what daemontools has long-since required and using the run path, would be:

[Unit]
Description=Apache Tomcat Web Application Container
 
[Service]
User=tomcat
Group=tomcat
ExecStart=/usr/share/tomcat/bin/catalina.sh run
 
[Install]
WantedBy=multi-user.target
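
One wrinkle with the run model is worth a sketch. With no ExecStop, systemd stops the service by sending it SIGTERM. A JVM that receives SIGTERM runs its shutdown hooks and then exits with status 143 (128 plus 15), and systemd would ordinarily record any non-zero exit status as a failure. A possible adjustment to the [Service] section, offered as an assumption about what one would want rather than as part of the unit above, is:

```ini
[Service]
User=tomcat
Group=tomcat
ExecStart=/usr/share/tomcat/bin/catalina.sh run
# Treat the JVM's exit status after an orderly SIGTERM shutdown as success.
SuccessExitStatus=143
```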

Just for kicks, though, let's look at the reverse and see what happens when the service unit from the previous section (sans ExecStop because we are using the run model not the spawn model; and using a variable expansion in the name of the program to run, which systemd itself does not support) is run through a conversion tool to make a daemontools-style run script:

#!/bin/nosh
#Run file generated from ./tomcat.service
#Apache Tomcat Web Application Container
move-to-control-group ../tomcat.service
read-conf --oknofile /etc/default/tomcat
setuidgid --primary-group tomcat --supplementary -- tomcat
sh -c 'exec ${JAVA_HOME}/bin/java $JAVA_OPTS $CATALINA_OPTS org.apache.catalina.startup.Bootstrap start'

© Copyright 2015,2019 Jonathan de Boyne Pollard. "Moral" rights asserted.
Permission is hereby granted to copy and to distribute this web page in its original, unmodified form as long as its last modification datestamp is preserved.