If applications were dogs living in yards secured by invisible fences, exit code 139 in Kubernetes would be the equivalent of a dog getting zapped by his electric collar when he tries to step into a neighbor's yard. It's not a pleasant event, neither for dogs nor for Kubernetes applications, but it's necessary to prevent potentially bigger issues – such as massive dog fights or programs trying to overwrite each other's memory space and causing a server to crash.
Now, we're not here to take a stance on whether invisible fences and electric collars are the best way to manage your pets. But what we can do is give you advice on how to deal with code 139 events in Kubernetes – which you need to do if you want to keep your applications running smoothly as part of an effective Kubernetes troubleshooting strategy.
What is Exit Code 139?
In Kubernetes, exit code 139 indicates that a container's process was terminated by the SIGSEGV signal from the operating system on its host node. The number follows the common convention of reporting 128 plus the signal number: SIGSEGV is signal 11 on Linux, and 128 + 11 = 139.
In Linux and other Unix-like operating systems, SIGSEGV is a signal whose default action is to terminate the process that receives it. The operating system typically generates this signal when it detects a process trying to access memory that either doesn't exist or is off limits to that process, an event known as a segmentation fault (or just segfault, as die-hard Linux geeks like to put it).
If a container receives SIGSEGV, it will usually terminate. That's undesirable, of course, because you typically don't want your containers to shut down unless you decide to shut them down. But the alternative to SIGSEGV is potentially having your entire server crash due to multiple processes trying to use the same memory address – which would be like all of the dogs in the neighborhood rushing into the same yard to brawl. It would be chaos, and everything would stop working because no container would be able to access memory reliably.
So, the operating system sends the SIGSEGV signal in an effort to prevent a much bigger problem.
How to Identify Exit Code 139
You can detect whether a container stopped due to exit code 139 by running the command docker ps -a on the node that hosted it. For a container that hit a segmentation fault, the STATUS column shows that it exited with code 139, in the form Exited (139).
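Kubernetes also records the exit code in each pod's status, so the same check can be scripted. The following is a minimal sketch, assuming you have the JSON produced by kubectl get pod with the -o json option; the function name find_segfaulted_containers and the sample pod are illustrative, not taken from a real cluster.

```python
import json

SEGFAULT_EXIT_CODE = 139  # 128 + 11 (SIGSEGV on Linux)


def find_segfaulted_containers(pod: dict) -> list:
    """Return names of containers whose current or previous run ended with exit code 139."""
    names = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # "state" is the current run; "lastState" is the run before the latest restart.
        for key in ("state", "lastState"):
            terminated = status.get(key, {}).get("terminated")
            if terminated and terminated.get("exitCode") == SEGFAULT_EXIT_CODE:
                names.append(status["name"])
                break
    return names


# Illustrative, hand-written pod status (an assumption, not real cluster output).
sample_pod = json.loads("""
{
  "status": {
    "containerStatuses": [
      {"name": "web", "state": {"running": {}},
       "lastState": {"terminated": {"exitCode": 139, "reason": "Error"}}},
      {"name": "sidecar", "state": {"running": {}}, "lastState": {}}
    ]
  }
}
""")

print(find_segfaulted_containers(sample_pod))
```

Checking lastState as well as state matters because a container that has already been restarted reports its crash only in the previous run.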
Exit Code 139 vs. Exit Code 143
Exit events involving code 139 are similar to exit code 143 errors in that both types of events typically cause a container to shut down. However, the underlying reason for the shutdown is different.
With exit code 143 (128 + 15), your container shuts down because it receives the SIGTERM signal, which is signal 15. SIGTERM is a signal the operating system can use to request that a process shut down (although unlike SIGSEGV, it doesn't force the process to shut down). SIGTERM events may take place because a container is moving to a different node, or because a node is running out of resources and a low-priority container needs to shut down so that other containers won't crash.
This means that, on the whole, exit code 139 is more likely than exit code 143 to indicate a serious problem, such as a bug in your application code. In many cases, exit code 143 doesn't result from a problem at all; it occurs normally as Kubernetes schedules and reschedules containers. Memory access violations, by contrast, don't occur in normal operation, so code 139 is especially important to troubleshoot.
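Both codes follow the same arithmetic: the exit code is 128 plus the number of the signal that ended the process. Here is a small sketch of that mapping using Python's standard signal module; the function name describe_exit_code is illustrative, and the signal numbers assume Linux.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Map an exit code to a signal name when it follows the 128 + N convention."""
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            return "unknown signal"
    return "not a signal exit"


for code in (139, 143):
    print(code, describe_exit_code(code))
```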
Common Causes of Exit Code 139
Although the underlying cause of exit code 139, a segmentation fault, is always the same, there are several specific conditions that could trigger this type of event. Here's a look at the most common.
Error code 139 causes and fixes
An incompatible library is one common cause. Applications often depend on libraries, which are collections of prewritten code that applications "borrow" when they execute.
If the application in a container is configured to use a library that is incompatible with it, you might experience segmentation faults and see code 139 because the application and the library make different assumptions about how data is laid out and managed in memory. This type of conflict happens most often if you update an application but forget to update the version of the library associated with it.
A programming error that affects application source code is another common cause of exit code 139. If your application code includes instructions to write to memory that the application can't access, the operating system will react to the event with a SIGSEGV signal.
Coding errors are especially common for applications written in languages that lack built-in memory protection, like C. These languages don't have a native mechanism for preventing programs from trying to access memory that they are not supposed to access; instead, it's up to developers to ensure that they don't make that kind of error when writing code.
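To see what such an error looks like from the outside, the sketch below starts a child Python interpreter that reads from address 0 through the ctypes module, which is an invalid access, and then inspects how the child ended. It assumes a Linux host, and the function name run_crashing_child is illustrative.

```python
import signal
import subprocess
import sys

# Child program: read from address 0, which a normal process is not allowed to access.
CRASHING_SNIPPET = "import ctypes; ctypes.string_at(0)"


def run_crashing_child() -> int:
    """Run the snippet in a separate interpreter and return its return code."""
    result = subprocess.run(
        [sys.executable, "-c", CRASHING_SNIPPET],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


returncode = run_crashing_child()
# subprocess reports "killed by signal N" as -N; shells and container runtimes report 128 + N.
if returncode < 0:
    print("killed by", signal.Signals(-returncode).name, "-> exit code", 128 - returncode)
```

Running the faulty code in a child process keeps the crash contained, so the script that launched it keeps running and can report the result.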
Incompatibility between memory subsystems and the applications you are running could trigger code 139. This type of issue is rare when you're running applications on modern, x86-based servers, but you may encounter it if you’re using specialized hardware to host your applications, or if your physical memory is faulty.
Typically, when hardware issues are the root cause of code 139, you'll experience the error across multiple applications and libraries rather than in just one. So, if you're seeing repeated segmentation faults in unrelated processes, your hardware may be the culprit.
How to Fix Exit Code 139
To fix code 139 exits, you must first determine what the root cause of the error is. You can follow these steps to work through the various possibilities and gain more information.
Step 1: Check the Kernel Logs
Segmentation violation events are typically recorded in the /var/log/messages file of the operating system that initiates the events. Thus, when you see code 139, log into the node that was hosting the failed container and check its /var/log/messages file or the equivalent. (Note that on some Linux distributions, including modern versions of Ubuntu, this file is located at /var/log/syslog, not /var/log/messages. Note also that viewing this file may require root access.)
You should see an entry that corresponds to the segmentation violation. It typically contains the word segfault along with the name and ID of the process involved.
Unfortunately, this information will rarely tell you exactly why the segmentation violation happened, but it does specify the process associated with it. In addition, the log may include data about other segfault events that could provide clues to help you troubleshoot; for example, if many processes are experiencing segmentation fault issues, you may have a hardware incompatibility problem.
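Counting segfault entries per process makes it easier to tell a single buggy program apart from a node-wide problem. The following is a rough sketch: the regular expression assumes the message form that x86 Linux kernels commonly write (name[pid]: segfault at ...), the function name count_segfaults is illustrative, and the sample log lines are invented.

```python
import re
from collections import Counter

# Assumed message form: "name[pid]: segfault at ADDR ip ADDR sp ADDR error N".
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+)"
)


def count_segfaults(log_text: str) -> Counter:
    """Count kernel segfault entries per process name."""
    counts = Counter()
    for line in log_text.splitlines():
        match = SEGFAULT_RE.search(line)
        if match:
            counts[match.group("comm")] += 1
    return counts


# Invented log lines for illustration only.
sample_log = """\
Jan 10 09:15:01 node-1 kernel: [ 4521.101234] myapp[2211]: segfault at 0 ip 000055d1c2a4b135 sp 00007ffc8d3e2a10 error 4 in myapp[55d1c2a4b000+1000]
Jan 10 09:15:07 node-1 kernel: [ 4527.554321] myapp[2290]: segfault at 0 ip 000055e0a1f3c135 sp 00007ffd1b2c4e50 error 4 in myapp[55e0a1f3c000+1000]
Jan 10 09:16:40 node-1 systemd[1]: Started Session 12 of user root.
"""

for name, count in count_segfaults(sample_log).items():
    print(name, count)
```

If one name dominates the counts, suspect that program's code; if many unrelated names appear, the node itself deserves a closer look.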
Step 2: Debug the Application
If only one specific container experiences segmentation fault issues, you most likely have a problem with the code in the container's image. In that case, you can take steps to debug the application and confirm that buggy code is the issue.
Details on how to debug applications for segfault issues are beyond the scope of this article, but suffice it to say that in general, you'd load the application's binary file into a debugging tool (like the GNU Debugger, GDB), then perform a backtrace. The backtrace shows what was happening within the application in the moments leading up to the segmentation fault. If you can identify a specific function call linked to the segfault, you can then use that insight to figure out which part of your application code you need to fix to prevent recurring code 139 events.
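As a rough illustration of the backtrace step, the sketch below builds a non-interactive GDB invocation. It assumes that gdb is installed and that the crash left a core dump file, which this article does not cover; the helper names and the paths in the usage comment are hypothetical.

```python
import shutil
import subprocess


def build_backtrace_command(binary_path: str, core_path: str) -> list:
    """Build a GDB command line that prints a backtrace from a core dump."""
    # --batch exits after running the commands; -ex runs one GDB command ("bt" = backtrace).
    return ["gdb", "--batch", "-ex", "bt", binary_path, core_path]


def print_backtrace(binary_path: str, core_path: str) -> None:
    """Run GDB if it is available and print whatever backtrace it produces."""
    if shutil.which("gdb") is None:
        print("gdb is not installed on this machine")
        return
    result = subprocess.run(
        build_backtrace_command(binary_path, core_path),
        capture_output=True,
        text=True,
    )
    print(result.stdout)


# Hypothetical usage: print_backtrace("./myapp", "./core")
print(" ".join(build_backtrace_command("./myapp", "./core")))
```

The backtrace is far more readable when the binary was built with debug symbols, since GDB can then show function names and line numbers.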
Step 3: Inspect Memory
Examining the way your system is using memory may also provide clues about exit code 139 causes and what is triggering a memory violation. On Linux, you can view how the system is using memory in general with a command such as free. You can also run top to monitor memory usage by individual processes.
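The same system-wide figures are exposed on Linux in the /proc/meminfo file, which is convenient to read from a script. This is a minimal sketch that parses text in that format; the function names are illustrative and the sample values are made up.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: number} (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_fraction(meminfo: dict) -> float:
    """Fraction of total memory that the kernel estimates is available."""
    return meminfo["MemAvailable"] / meminfo["MemTotal"]


# Made-up sample in the /proc/meminfo format.
sample = """\
MemTotal:       16384000 kB
MemFree:         1024000 kB
MemAvailable:    4096000 kB
"""

info = parse_meminfo(sample)
print(f"{available_fraction(info):.0%} of memory available")
```

On a real node you would pass in the contents of /proc/meminfo instead of the sample string.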
Also consider using a tool to run a check of your physical memory hardware in order to rule out hardware failure as a cause of strange memory violation behavior.
Step 4: Add Code to Handle Segfaults Gracefully
If you still can't figure out why your container keeps exiting with error code 139, it may be possible to modify your application such that it will recover from SIGSEGV signals gracefully, without crashing.
Here again, this is a complex topic that requires a lot of knowledge about the programming language you are working with. But as an example, we'll point to this great code from Morten Piibeleht, which shows how a C program could handle some types of SIGSEGV signals (specifically, those triggered by an invalid pointer) gracefully.
If your application is written in C, and you add this code to it and then you rebuild the application, it should be able to handle segfault events gracefully instead of crashing.
This isn't a fix for exit code 139 as much as it's a Band-Aid. But if you've tried everything else and you just want your applications not to crash, this approach is worth a try.
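A coarser Band-Aid, and a different technique from the in-process C handler mentioned above, is to run crash-prone work in a child process so that a segfault ends only the child. The sketch below assumes a Linux host; the function name run_with_fallback and both workers are hypothetical, and the crashing worker simply sends itself SIGSEGV to simulate a fault.

```python
import signal
import subprocess
import sys


def run_with_fallback(command: list, fallback: str, attempts: int = 2) -> str:
    """Run command in a child process; return its output, or fallback if it keeps segfaulting."""
    for _ in range(attempts):
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout.strip()
        if result.returncode != -signal.SIGSEGV:
            raise RuntimeError(f"child failed with return code {result.returncode}")
        # The child was killed by SIGSEGV; the parent is unaffected and can retry.
    return fallback


# Hypothetical workers: one simulates a segfault, the other succeeds.
crashing_worker = [
    sys.executable,
    "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)",
]
healthy_worker = [sys.executable, "-c", "print('ok')"]

print(run_with_fallback(healthy_worker, fallback="unavailable"))
print(run_with_fallback(crashing_worker, fallback="unavailable"))
```

Like the C approach, this hides the symptom rather than removing the bug, so it makes sense to log every fallback and keep investigating the root cause.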
To sum up, exit code 139 happens when a container receives the SIGSEGV signal, which instructs it to shut down due to a memory violation issue. The signal exists to prevent a process from destabilizing an entire server.
The specific causes of exit code 139 can be difficult to track down because the operating system doesn't usually produce much information about why a SIGSEGV signal was sent. But you can troubleshoot effectively by looking at operating system logs to determine whether this type of problem is associated with just one container or process (in which case it's likely triggered by buggy code), or is happening across the server (which means you more likely have a hardware issue).
If you do suspect buggy code, debugging the application may help you to pinpoint where you need to update your code. And if all else fails, it may be possible to build logic into your app that lets it handle SIGSEGV events gracefully.