Monitoring Kubernetes Jobs Doesn’t Have to be a Tough Job to Handle

Find out why scheduling tasks with Kubernetes Jobs and CronJobs requires continuous visibility into their operations, and why metrics related to Jobs and CronJobs are critically important to your observability.

Kubernetes is most famous for its ability to achieve a desired state automatically. That's how Kubernetes handles things like Deployments: You tell K8s what you want to deploy, and it figures out how best to deploy it. But sometimes, you have a specific task that needs to be automated, whether it's a one-time thing or a recurring one. Achieving that with Kubernetes requires a different set of primitives than describing a continuous desired state, which is what you're usually doing with Kubernetes Deployments.

That's where Kubernetes Jobs and CronJobs come in. Jobs and CronJobs are handy tools for executing operations within Kubernetes clusters that aren't directly related to actual workloads.

The downside, though, is that Jobs and CronJobs can be tricky to monitor. Because most Kubernetes monitoring tools focus on more common Kubernetes objects, like Deployments and StatefulSets, getting visibility into how Kubernetes schedules Jobs and into the processes they run can be challenging.

That challenge can be solved, but doing so requires thinking outside the box of conventional Kubernetes monitoring. This article explains how, first with an overview of how Jobs and CronJobs work in Kubernetes, then with a discussion of approaches to monitoring them so that you know if something goes wrong with a critical maintenance task.

Kubernetes Jobs and CronJobs: A brief overview

Jobs and CronJobs both allow Kubernetes admins to schedule specific tasks, then run them automatically. Again, unlike other Kubernetes objects, Jobs and CronJobs don't attempt to align operations with a desired state. Instead, they just complete a particular task, such as executing a command or running a script.

For example, a simple Job can run a single Perl command that computes π to 2000 places and prints it out. It takes around 10s to complete.
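A Job like this could be defined roughly as follows. This is a minimal sketch modeled on the pi example in the Kubernetes documentation, written as a Python dict and serialized to JSON, which kubectl accepts in place of YAML. The function name build_pi_job, the Job name, the image tag and the backoffLimit value are illustrative assumptions.

```python
import json


def build_pi_job(name="pi", digits=2000):
    """Return a Job manifest (as a dict) that prints pi to `digits` places."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Assumed value: retry a failed Pod at most 4 times.
            "backoffLimit": 4,
            "template": {
                "spec": {
                    # Jobs need Never or OnFailure, not the default Always.
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": name,
                            "image": "perl:5.34.0",  # assumed image tag
                            "command": [
                                "perl",
                                "-Mbignum=bpi",
                                "-wle",
                                f"print bpi({digits})",
                            ],
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Emit JSON so the output can be piped to: kubectl apply -f -
    print(json.dumps(build_pi_job(), indent=2))
```

Generating the manifest from code is only one option; the same fields can be written directly in a YAML file.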

Jobs vs. CronJobs

In case you're wondering why Kubernetes offers both Jobs and CronJobs, the answer is that they do similar, but distinct, things:

  • Jobs run tasks to completion by creating one or more Pods. Jobs can also run in parallel using multiple Pods.
  • CronJobs are similar to entries defined in crontab on a Linux system: They create Jobs on a predefined schedule, which in most cases means running a task on a recurring basis.

So, you'd typically create a Job if you need to run a specific operation (like executing a script that cleans up a database), whereas CronJobs are useful for regularly scheduled maintenance tasks (like performing periodic backups).

For example, a CronJob can run a database backup image every day at midnight.
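A midnight backup CronJob might look something like the sketch below, again built as a Python dict and serialized to JSON. The CronJob name, the image reference example.com/db-backup:latest and the concurrencyPolicy setting are hypothetical placeholders; the batch/v1 API version for CronJobs assumes Kubernetes 1.21 or later.

```python
import json


def build_backup_cronjob(name="db-backup",
                         image="example.com/db-backup:latest"):
    """Return a CronJob manifest (as a dict) that runs `image` at midnight."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            # Standard cron syntax: minute 0, hour 0, every day.
            "schedule": "0 0 * * *",
            # Assumed policy: skip a run if the previous one is still going.
            "concurrencyPolicy": "Forbid",
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {
                            "restartPolicy": "OnFailure",
                            "containers": [{"name": name, "image": image}],
                        }
                    }
                }
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_backup_cronjob(), indent=2))
```

Each time the schedule fires, the CronJob creates a new Job from jobTemplate, so the Job-level questions (did it start, did it finish, did it fail) apply to every run.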

The importance of monitoring Jobs and CronJobs – and why it's hard

Given that Jobs and CronJobs are often used to perform critical administration or maintenance tasks, it's important to have visibility into tasks that you run using these Kubernetes features. You'll want to know if a backup that you scheduled via a CronJob fails, for instance, or if issues with one Job are causing another Job that depends on the first job to take longer than expected.

Unfortunately, achieving this visibility is not particularly easy. Although it's simple enough to define and run Jobs and CronJobs, it's harder to monitor them. The main reason why, as we noted above, is that most Kubernetes monitoring tools aren't designed with Jobs and CronJobs in mind. They cater instead to objects associated with actual workloads, like Deployments and StatefulSets.

This means not only that it's harder to get monitoring data related to Jobs and CronJobs, but also that answering relevant questions about them can be tricky. With objects like Deployments or StatefulSets, you typically want to know things like "do we have the expected number of ready Pods" or "how long does it take for Pods to become ready." Those are different sorts of questions from the ones you'd care about when dealing with Jobs and CronJobs. In the latter context, knowing which tasks are running, whether any have failed and how the failure of one task impacts other tasks is more important.

To put this another way, monitoring Jobs and CronJobs is less about understanding the ongoing state of Pods and their resource utilization. It's more about keeping track of individual operations that take place behind the scenes on a periodic basis.

Approaches to monitoring Jobs and CronJobs

Fortunately, there are a couple of viable approaches to monitoring Jobs and CronJobs.

Using Prometheus

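One is to push metrics about Job and CronJob operations to Prometheus through its Pushgateway. Because Prometheus normally pulls metrics by scraping long-running targets, short-lived Jobs have to push their results somewhere Prometheus can scrape later. This strategy lets you keep track not just of simple success/failure outcomes, but also performance and resource utilization.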

The downside is that you have to write custom code (in Python, for example) to push the metrics. You must also explicitly configure the Pushgateway location and update it whenever it changes. So, there's a lot of work in terms of both upfront effort and ongoing maintenance if you want to use Prometheus for monitoring your Jobs and CronJobs.
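To give a sense of what that custom code involves, here is a minimal sketch using only the Python standard library. It times a task, renders the results in the Prometheus text exposition format, and sends them to a Pushgateway with an HTTP PUT to /metrics/job/<job name>. The metric names (batch_job_*), the helper names and the gateway address in the usage comment are assumptions made for illustration; a real setup would more likely use the official Prometheus client library.

```python
import time
import urllib.request


def render_metrics(metrics):
    """Render {name: value} as Prometheus text exposition format (gauges)."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    # The Pushgateway expects the payload to end with a newline.
    return "\n".join(lines) + "\n"


def run_task(task):
    """Run task() and return metrics describing the run (names are made up)."""
    started = time.time()
    try:
        task()
        succeeded = 1
    except Exception:
        # Sketch only: record the failure instead of re-raising.
        succeeded = 0
    finished = time.time()
    return {
        "batch_job_duration_seconds": round(finished - started, 3),
        "batch_job_success": succeeded,
        "batch_job_last_run_timestamp_seconds": int(finished),
    }


def push_metrics(gateway_url, job_name, metrics, timeout=5):
    """PUT the metrics to <gateway_url>/metrics/job/<job_name>."""
    url = f"{gateway_url.rstrip('/')}/metrics/job/{job_name}"
    body = render_metrics(metrics).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="PUT")
    request.add_header("Content-Type", "text/plain; version=0.0.4")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Usage inside the Job's container (hypothetical gateway address):
#   metrics = run_task(do_backup)
#   push_metrics("http://pushgateway.monitoring:9091", "db-backup", metrics)
```

Note that the gateway address is hard-wired into every Job that reports this way, which is exactly the maintenance burden described above.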

Using kube-state-metrics

Alternatively, you can use kube-state-metrics, a straightforward service that listens to the Kubernetes API server, then generates metrics regarding the state of objects, including Jobs and CronJobs.

This approach lets you pull a variety of useful metrics, such as Job start and completion times and Job failures.
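As a rough illustration of working with those metrics, the sketch below parses a few lines in the text format that kube-state-metrics serves on its /metrics endpoint and derives, per Job, whether it failed and how long it ran. The metric names kube_job_status_start_time, kube_job_status_completion_time and kube_job_status_failed are exposed by kube-state-metrics; the sample values, Job names and helper functions are made up for the example.

```python
import re

# Made-up sample in the shape kube-state-metrics exposes.
SAMPLE = '''\
# Illustrative values only.
kube_job_status_start_time{namespace="default",job_name="db-backup-1"} 1669161600
kube_job_status_completion_time{namespace="default",job_name="db-backup-1"} 1669161642
kube_job_status_failed{namespace="default",job_name="db-backup-1"} 0
kube_job_status_start_time{namespace="default",job_name="db-backup-2"} 1669248000
kube_job_status_failed{namespace="default",job_name="db-backup-2"} 1
'''

LINE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse(text):
    """Yield (metric_name, labels, value) for each labelled sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment/HELP/TYPE lines
        match = LINE.match(line)
        if match:
            labels = dict(LABEL.findall(match.group(2)))
            yield match.group(1), labels, float(match.group(3))


def summarize(text):
    """Return {(namespace, job_name): {"failed": bool, "duration_seconds": ...}}."""
    jobs = {}
    for name, labels, value in parse(text):
        key = (labels.get("namespace"), labels.get("job_name"))
        entry = jobs.setdefault(key, {})
        # Keep the largest value if a metric repeats for the same Job
        # (for example, when it carries an extra label).
        entry[name] = max(value, entry.get(name, value))
    summary = {}
    for key, m in jobs.items():
        start = m.get("kube_job_status_start_time")
        end = m.get("kube_job_status_completion_time")
        finished = start is not None and end is not None
        summary[key] = {
            "failed": m.get("kube_job_status_failed", 0) > 0,
            "duration_seconds": end - start if finished else None,
        }
    return summary


if __name__ == "__main__":
    for job, info in summarize(SAMPLE).items():
        print(job, info)
```

In practice Prometheus would scrape these metrics and the same questions would be expressed as queries and alert rules; the point here is only which series answer which question.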

But here again, you have to customize your monitoring tooling to display and analyze the right metrics. Few existing Kubernetes monitoring or observability platforms are built with Jobs and CronJobs in mind, so you can't simply turn them on and expect to stay on top of the most relevant metrics automatically.

Toward a better future for Jobs monitoring

While tracking Jobs and CronJobs in Kubernetes may not be as simple today as many admins would like, there's reason to hope it will improve going forward as teams make wider use of monitoring tools that are truly Kubernetes-native.

Kubernetes-native monitoring tools, like groundcover, make it possible to collect relevant data about Jobs and CronJobs (as well as any other type of Kubernetes object) using services, like kube-state-metrics, that are native to Kubernetes. This approach avoids the complex setup and management effort required to push metrics using custom code. It also typically leads to more efficient data collection, because collecting metrics in a K8s-native way generally consumes fewer resources.

Kubernetes-native monitoring of Jobs and CronJobs is already possible, as we showed above. What's needed in order for organizations to take full advantage of the process, however, is broader recognition of the importance of making metrics related to Jobs and CronJobs first-class citizens within Kubernetes observability. Deployments, StatefulSets and other workload-centric objects are critical to monitor, too, but they're not the only thing that matters within your Kubernetes cluster. If you use Jobs and CronJobs, you need continuous visibility into their operations as well.

November 23, 2022
