Containerizing your application might be as easy as creating a Dockerfile, but there are certainly a few other things to consider when we’re using containers to run our applications. One such thing is gracefully shutting down containers when they are stopped.
Why Is Graceful Shutdown Important?
Besides failing on its own, a container can be stopped whenever the container orchestrator (ECS, EKS) decides to stop it, and that can happen for many reasons: as part of a scale-in process, when health checks are failing, when a rolling update to a new version of an application replaces the previous version’s containers, and so on.
When a program is gracefully stopped, it is given time to save its progress and release resources. As part of that, you might want to do some cleanup, like closing a TCP server, closing the database connection, reporting any metrics, saving any in-memory data to persistent storage, etc.
Let’s take a scenario where a container creates a long-lived TCP connection to a database. As part of a scale-out process, the container orchestrator creates more such containers until 20 are running, and now the database holds 20 long-lived TCP connections. After some time, the scale-in process kicks in, and the number of containers is brought back down from 20 to 2.
If the application does not close its connections to the database server when it is stopped, the server will only notice quickly if it tries to send data to the client. But if the database session is idle, the server process just sits waiting for the client to send the next statement, so it won’t immediately notice that the client is no longer there. Such lingering backend processes occupy a process slot and can cause you to exceed 'max_connections' [1].
Termination Signals
Exceeding 'max_connections' is just one of many reasons why we should gracefully handle the shutdown event in our applications. When an application is about to be stopped, a signal (also called a software interrupt) is sent to the application’s process. It is the responsibility of the application to make sure these signals are handled.
Although any process can send signals to any other process (even to itself) using the 'kill()' syscall, usually it’s the parent process, such as the terminal, that sends these signals to the processes it spawned.
Docker
To demonstrate handling of these signals in this post, I will use Docker as the container runtime and Node.js for the application code, but you can use any container runtime or language to do this.
Docker is a container runtime. To dockerize your application, you create a Dockerfile, and as part of your build process, you build the Docker image. Once the image is built, the Docker engine or a container orchestrator (like ECS or EKS) can start one or more containers from that image.
Docker And The PID 1
Before we get into handling interrupts, we first need to understand the significance of the process running inside a Docker container with process ID 1.
The first process spawned in a running Docker container is given the process ID 1. The special thing about this process is that it is the one that receives the termination signals sent to the container. If you start your application through a shell script, the shell process takes PID 1 while your program becomes a child process spawned by it, and the shell process, by default, will not forward any signals to your application.
For Node applications, if you use 'npm' to run your application, the 'npm' process takes PID 1 and your app’s process gets some other PID; the npm process also does not forward signals to your Node application. That’s why it is a good practice to start your Node application directly with the node binary, like 'node app.js'. This way, your Node app takes PID 1 and will therefore receive the signals.
I usually create my Node.js Dockerfiles like this to make sure my Node apps run with PID 1.
Good ✅
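A minimal sketch of what I mean (the base image, file names, and install steps here are illustrative): the exec form of CMD starts the node binary directly, so the Node process becomes PID 1.

```dockerfile
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
# Exec form, starting node directly: the node process gets PID 1
# and receives SIGTERM/SIGINT from the container runtime.
CMD ["node", "app.js"]
```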
Bad ❌
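And a sketch of the variant to avoid (same illustrative assumptions): starting the app through npm, or using the shell form of CMD, puts npm or a shell at PID 1, and signals never reach the Node process.

```dockerfile
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
# Shell/npm form: npm (and a shell) sit at PID 1, and they do not
# forward SIGTERM/SIGINT to the node process they spawn.
CMD npm start
```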
Now that we’ve discussed the importance of Docker’s PID 1, let’s discuss signals and how your application can listen to them.
Signals
There are many different types of signals, and your application can listen to almost all of them. Each signal serves a different purpose, and so the handling can also differ slightly based on the signal type.
You can see the full list of standard signals in the signal(7) man page.
SIGTERM
The SIGTERM signal requests a process to stop running. The process is given time to gracefully shut down. SIGINT is very similar to SIGTERM.
SIGKILL
The SIGKILL signal forces the process to stop executing immediately. The program cannot ignore this signal, and it doesn’t get a chance to clean up either. There is no point in listening for this signal, as the process is killed before it can do anything.
SIGINT
The SIGINT signal is the same as pressing ctrl-c. On some systems, "delete" + "break" sends the same signal to the process. The process is given time to gracefully shut down.
When the Docker engine or the container orchestrator detects that a container is not healthy, based on its health check configuration, it will either restart the container or stop it and create a new one. In both cases, it first stops the container, the same way the 'docker stop' command does.
The 'docker stop' command sends the 'SIGTERM' signal to PID 1. The process is then given 10 seconds by default (configurable with the '-t' flag) to shut down; if it does not exit within that grace period, Docker sends a 'SIGKILL' signal, which stops the process immediately.
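For example, you could give a container a longer grace period when stopping it (the container name below is just a placeholder):

```bash
# Send SIGTERM to PID 1, then SIGKILL if the container
# has not exited after 30 seconds
docker stop --time=30 my-container
```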
In a Unix-like OS, when you use the shell command 'kill [PID]', the 'kill' program sends a 'SIGTERM' signal to your app, which gives your app some time to gracefully shut down.
If you want to stop the process immediately, you can use the 'kill -9 [PID]' variant of the kill command, which sends a SIGKILL signal to your app, stopping it instantly.
Listening To Signals
In Node.js, the way to listen to a signal is to add an event handler to the global 'process' object:
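Here is a minimal sketch, assuming an HTTP 'server' and a database client 'db' already exist in your app (both names are illustrative, as is the 'db.end()' call):

```javascript
// Illustrative handles: replace with your app's own server and DB client.
const shutdown = (signal) => {
  console.log(`Received ${signal}, shutting down gracefully...`);

  // Stop accepting new connections and wait for in-flight requests to finish.
  server.close(async () => {
    // Then close the database connection as part of the cleanup.
    await db.end();
    process.exit(0);
  });
};

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT'));
```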
In Go, you would do the same like this:
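Again a minimal sketch, with the HTTP server and the 'database/sql' handle standing in for your app’s own resources:

```go
package main

import (
	"context"
	"database/sql"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	srv := &http.Server{Addr: ":8080"}
	var db *sql.DB // assume this was opened with sql.Open earlier

	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatalf("server error: %v", err)
		}
	}()

	// Listen for SIGINT and SIGTERM.
	quit := make(chan os.Signal, 1)
	signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
	sig := <-quit
	log.Printf("received %s, shutting down gracefully...", sig)

	// First close the HTTP server, then the database connection.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("server shutdown: %v", err)
	}
	if db != nil {
		db.Close()
	}
}
```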
In the code above, we listen for the 'SIGINT' and 'SIGTERM' termination signals, and on receiving either of them, we first close the HTTP server and then close the database connection as part of the cleanup.
Remember that, inside a Docker container, the above handlers will only work if your process has PID 1.
Conclusion
Even though we have so many tools to automate health checks, scale, and perform version updates of our containerized applications, there are still a few things we need to manage from within the application. Ensuring that our containers shut down gracefully is one of the most important of them. Shutting down a container gracefully ensures that adding or removing containers in your app cluster does not cause any side effects.
Thanks for reading.
Happy Coding 🕊
Sources
1. https://www.cybertec-postgresql.com/en/tcp-keepalive-for-a-better-postgresql-experience/