K8s logging at scale: from kubectl logs to the PLG stack

Find out how viewing log output in real time with ad-hoc tools like the kubectl logs --tail command gives you an easy way to see what's going on in your system during development, and what you need to consider when choosing a solution for real production-scale Kubernetes environments.

Regardless of how long you've been working with Kubernetes-based systems, there's one troubleshooting technique we can guarantee is in your toolkit: log messages.

During the development process, viewing log output in real time using ad-hoc tools like the kubectl logs --tail command gives you an easy way to see what's going on in your system.
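
For example, to follow the last hundred lines of output from a single pod (the pod and namespace names below are just placeholders):

kubectl logs --tail=100 -f my-app-pod -n my-namespace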

When you deploy your containers into a production environment, however, maintaining that level of observability can become a burden as you try to manage and analyze large volumes of messages from multiple containerized applications executing in multiple pods. In modern cloud-native environments, you must collect, understand, and investigate millions of logs from different sources to understand what's happening at runtime.

Legacy logging solutions simply can't keep up with the complex, distributed nature of modern production environments, so finding a simple, performant solution to manage this complexity is key to your ongoing Kubernetes logging efforts.

Enter Grafana Loki 

You've almost certainly heard of Grafana, the company that has made its mark with open-source software that enables easy visualization of data from many different sources. In the Kubernetes world, Grafana may be best known as the visualization component of the Prometheus-based cluster metrics solution.

But things are changing: more recently, Grafana has been evolving into a full-blown observability vendor in its own right, with new projects such as Loki, Tempo, and Mimir addressing the key observability requirements for logging, tracing, and metrics.

The Loki project in particular is squarely focused on the challenge of managing distributed, high-volume, high-velocity log data with a cloud-native architecture inspired by Prometheus (and in fact Loki touts itself as "like Prometheus, but for logs").  

Loki brings a number of advantages that make it a great fit for the challenges of modern environments:

 • It's simple to set up and easy to operate.

 • It indexes only metadata rather than full log messages, keeping it lightweight.

 • It works well with other cloud-native tools such as Kubernetes.

 • It uses common object storage solutions like Amazon S3.

Available as either a self-managed open source version or a fully-managed service provided by Grafana Cloud, Loki forms the foundation of what is known as the "PLG" stack: Promtail for log stream acquisition, Loki for aggregation, storage and querying, and Grafana for visualization.

Bye ELK, Hey "PLG" Stack

Looking at the "PLG" stack it's easy to see how the system was influenced by the design of Prometheus.

Promtail is an agent - provided as part of the Loki product - that is responsible for discovering and retrieving log data streams. It functions in a role similar to Prometheus' own "scraper", and its configuration files are syntactically identical to those used by Prometheus. It essentially "tails" the node and pod log files on your cluster and forwards them on to the core Loki system. (It's important to note that Loki supports many different agents provided by both Grafana and its developer community, which can make migration to a PLG-based solution much easier for users of daemons such as Fluentd or Logstash.)
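
As a rough sketch, a Promtail scrape job for Kubernetes pods (abridged from the pattern used in Promtail's reference configuration) combines Prometheus-style service discovery with relabeling rules that turn pod metadata into labels and set the special __path__ label that tells Promtail which files to tail:

scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod                  # discover pods through the Kubernetes API
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace    # attach the namespace as a label
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod          # attach the pod name as a label
      - source_labels: [__meta_kubernetes_pod_uid, __meta_kubernetes_pod_container_name]
        separator: /
        replacement: /var/log/pods/*$1/*.log
        target_label: __path__     # the files Promtail should tail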

Loki, of course, is the heart of the PLG stack and is specifically designed for handling log data. Loki's unique characteristics - which we'll talk about in much more detail in a bit - make it highly efficient and cost-effective at both ingesting and querying log data.

The Grafana dashboard and visualization tool rounds out the "PLG" suite, providing powerful features to enable analysis of application, pod and cluster logs.

How does Loki work under the hood?

Architecture and deployment models

Architecturally, Loki is composed of five components:

 • The distributor is a stateless component responsible for acquiring log data and forwarding it to the ingester. Distributors pre-process the data, check its validity, and ensure that it originates from a configured tenant, which helps the system scale and protects it from potential denial-of-service attacks. Grafana's documentation provides a great explanation of how Promtail - the recommended agent - processes data.

 • The ingester is the key component in the Loki architecture. Data received from distributors is written by the ingester to a cloud-native long-term storage service. Ingesters also collaborate with queriers to return in-memory data in response to read requests.

 • Queriers are responsible for interpreting LogQL query requests and fetching the data either from ingesters or from long-term storage.

 • The query frontend - an optional component - provides API endpoints that can be used to accelerate read processing. This component optimizes read processing by queuing read requests, splitting large requests into multiple smaller ones, and caching data.

 • Like Prometheus, Loki supports alerting and recording features. These are implemented in the ruler component, which continually evaluates a set of queries and takes a defined action based on the results, such as sending an alert or pre-computing metrics; see the sketch after this list.
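
For instance, a ruler rule file looks much like a Prometheus alerting rule, except that the expression is written in LogQL (the stream selector and threshold below are purely illustrative):

groups:
  - name: example-rules
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate({app="my-app"} |= "error" [5m])) > 10
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Error log lines arriving faster than 10 per second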

For scalability, all of these components can be distributed across systems as needed.

Loki can be deployed locally in one of two modes:

  • A monolithic mode (the default), which runs all of Loki's components in a single process or Docker container. This is a good starting point for learning more about the product.

  • A microservices deployment mode, which allows the Loki components to be distributed across multiple systems and provides high scalability.

An additional local deployment mode, called the "simple scalable" mode, is a good intermediate step when your requirements exceed the monolithic mode capabilities but do not warrant a large-scale microservices deployment.  Of course, if you don't want to manage Loki at all then Grafana Cloud might be the option for you.
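
In the self-managed modes, the same Loki binary is simply pointed at a different role with the -target flag, for example:

loki -config.file=loki.yaml -target=all        # monolithic mode (the default)
loki -config.file=loki.yaml -target=write      # simple scalable: the write path
loki -config.file=loki.yaml -target=read       # simple scalable: the read path
loki -config.file=loki.yaml -target=ingester   # microservices: one component per process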

Key features

Loki implements some amazing features that are specifically designed to distribute the load, protect the system from attack, and make use of efficient storage mechanisms.  

Labels

Unlike many log processing systems, Loki does not perform full-text indexing on log data. Instead, it leverages a concept borrowed from Prometheus - labels - to extract and tag information from the log data, and then indexes only the labels themselves. This dramatically improves performance on both the write and read path, and - equally valuable in our mind - enables a consistent label taxonomy regardless of input source. 

Since this is such a critical benefit of Loki, let's dig into an example based on Loki's documentation. Say you have a Promtail "scrape configuration" like the one below (the file path and label values are illustrative):
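
scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: syslog                # the label attached to every record from this file
          __path__: /var/log/syslog  # the log file Promtail should tail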

The labels section of this configuration is particularly important: the __path__ variable defines the log file to be read, and the job keyword defines a label, with the value syslog, to attach to every record read from that file. Using this configuration, Promtail will "tail" the log file, attach the job=syslog label to each record, and forward the result to Loki as a "stream" identified by that label set. The index entries for the job label and the data chunks containing the raw records are then written to persistent storage.

This stream of records can be queried using a simple LogQL query:

{job="syslog"}

When processing the query, the Loki querier component will find the indexes that point to records with a job label of syslog, and then retrieve the records.

Cloud-native backend storage

Because the raw log data itself is not indexed, Loki can improve the system's cost effectiveness by leveraging inexpensive cloud-native storage as the backend data repository: object storage services such as Amazon S3 hold the log content, while stores such as Amazon DynamoDB or Cassandra can hold the index. To improve query processing, Loki stores the data as "chunks" (the raw log data) and "indexes" (the normalized and indexed labels extracted from log records). Queriers use the much smaller indexes to find the requested chunked log data.
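
As a rough sketch, the storage section of a Loki configuration backed by S3 might look like the following (field names vary across Loki versions, so treat this as illustrative):

storage_config:
  aws:
    s3: s3://us-east-1/my-loki-chunks       # object storage bucket for chunk data
  boltdb_shipper:
    active_index_directory: /loki/index     # local staging area for index files
    shared_store: s3                        # ship the index to the same object store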

LogQL

Loki implements a log query language called LogQL that borrows heavily from Prometheus' PromQL language.  LogQL can be used both directly and via a Grafana front-end dashboard. Having a consistent query language for both logs and metrics flattens the learning curve and facilitates dynamic filtering and transformation.
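
A couple of illustrative queries show the flavor (the label names here are hypothetical). The first filters a stream down to error lines and parses them as JSON; the second turns a log stream into a metric, exactly as you would in PromQL:

{app="my-app", namespace="prod"} |= "error" | json

sum by (pod) (rate({app="my-app"} |= "error" [5m]))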

Installing the "PLG" stack on your Kubernetes cluster

Loki has several installation mechanisms: Tanka (which is used in Grafana's own Cloud deployments), Helm charts for both "simple scalable" and microservices deployments, a mechanism using Docker / Docker Compose, and downloadable binaries. If desired, you can also download the Loki source code from the GitHub repository and build the system locally. Grafana's documentation provides instructions for each of these installation methods.
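
For example, the Helm route takes only a couple of commands (chart names and values evolve, so confirm them against the current charts):

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install loki grafana/loki-stack --set grafana.enabled=true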

Loki: A better Kubernetes log management solution

Live tailing Kubernetes application, pod and cluster log files is an extremely helpful technique for tracking what's going on with your containerized applications in near real time. Grafana's Loki product takes that to the next level with capabilities inspired by the popular Prometheus metrics system, easy scalability for managing even highly complex environments, and some serious enhancements to make dealing with log files simpler than ever.  If you're looking for a better Kubernetes log management solution, Loki is definitely worth a try.
