K8s logging at scale: from kubectl logs to the PLG stack
Find out how viewing log output in real time with ad-hoc tools like the kubectl logs --tail command gives you an easy way to see what's going on in your system during development, and what you need to consider when choosing a logging solution for production-scale Kubernetes environments.
Regardless of how long you've been working with Kubernetes-based systems, there's one problem determination technique that we can guarantee is in your tool kit: log messages.
During the development process, viewing log output in real time using ad-hoc tools like the kubectl logs --tail command gives you an easy way to see what's going on in your system.
When you deploy your containers into a production environment, however, maintaining that level of observability can become a burden as you try to manage and analyze large volumes of messages from multiple containerized applications running in multiple pods. In modern cloud-native environments, you must collect and investigate millions of log lines from different sources to understand what's happening in an application at runtime.
Legacy logging solutions simply can't keep up with the complex, distributed nature of modern production environments, so finding a simple, performant way to manage this complexity is key to your ongoing Kubernetes logging efforts. This is where Loki by Grafana comes in.
Enter Grafana Loki
You almost certainly have heard of Grafana, the company that has made its mark with open-source software that enables easy visualization of data from many different sources. In the Kubernetes world, Grafana may be best known for the metrics visualization component of the Prometheus-based cluster metrics solution.
But things are changing: more recently, Grafana has been evolving into a full-blown observability vendor in its own right, with newer projects such as Loki, Mimir, and Tempo addressing the key observability requirements of logging, metrics, and tracing, respectively.
The Loki project in particular is squarely focused on the challenge of managing distributed, high-volume, high-velocity log data with a cloud-native architecture inspired by Prometheus (and in fact Loki touts itself as "like Prometheus, but for logs").
Loki has several advantages that make it a good fit for the challenges of modern environments.
It's simple to set up and easy to operate; it indexes only metadata rather than full log messages, which keeps it lightweight; it works well with other cloud-native tools such as Kubernetes; and it uses common object storage solutions like Amazon S3.
Available as either a self-managed open source version or a fully-managed service provided by Grafana Cloud, Loki forms the foundation of what is known as the "PLG" stack: Promtail for log stream acquisition, Loki for aggregation, storage and querying, and Grafana for visualization.
Bye ELK, Hey "PLG" Stack
Looking at the "PLG" stack it's easy to see how the system was influenced by the design of Prometheus.
Promtail is an agent - provided as part of the Loki project - that is responsible for discovering and retrieving log data streams. It plays a role similar to Prometheus' own "scraper", and its scrape configuration uses the same service discovery and relabeling syntax as Prometheus. It essentially "tails" the Kubernetes control plane and pod log files and forwards them on to the core Loki system. (It's important to note that Loki supports many different agents provided by both Grafana and its developer community, which can make migration to a PLG-based solution much easier for users of daemons such as fluentd or logstash.)
Loki, of course, is the heart of the PLG stack and is specifically designed for handling log data. Loki's unique characteristics - which we'll talk about in much more detail in a bit - make it both highly efficient and cost effective at both ingesting and querying log data.
The Grafana dashboard and visualization tool rounds out the "PLG" suite, providing powerful features to enable analysis of application, pod and cluster logs.
How does Loki work under the hood?
Architecture and deployment models
Architecturally, Loki is comprised of five different components:
• The distributor is a stateless component responsible for receiving log data and forwarding it to the ingester. Distributors pre-process the data, check its validity, and ensure that it originates from a configured tenant, which helps the system scale and protects it from potential denial-of-service attacks. Grafana's documentation explains in detail how Promtail - the recommended agent for sending logs to the distributor - processes data.
• The ingester is the key component in the Loki architecture. Data received from distributors is written by the ingester to a cloud-native long-term storage service. Ingesters also collaborate with queriers to return in-memory data in response to read requests.
• Queriers are responsible for interpreting LogQL query requests and fetching the data either from ingesters or from long-term storage.
• The query frontend - an optional component - provides API endpoints that can be used to accelerate read processing. This component optimizes read processing by queuing read requests, splitting large requests into multiple smaller ones, and caching data.
• Like Prometheus, Loki supports alerting and recording features. These features are implemented in the ruler component, which continually evaluates a set of queries and takes a defined action based on the results, such as sending an alert or pre-computing metrics.
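The division of labor between the distributor, ingester, and querier can be sketched in a few lines of Python. This is a toy model for illustration only, not Loki's implementation: the class names, the `receive`, `push`, `flush`, and `query` functions, and the in-memory lists standing in for object storage are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Ingester:
    """Buffers recent entries in memory and flushes them to long-term storage."""
    memory: list = field(default_factory=list)
    store: list = field(default_factory=list)  # stand-in for object storage

    def push(self, entry):
        self.memory.append(entry)

    def flush(self):
        # Move everything buffered so far into the long-term store.
        self.store.extend(self.memory)
        self.memory.clear()


class Distributor:
    """Stateless front door: validates entries, then forwards them to an ingester."""

    def __init__(self, ingester, tenants):
        self.ingester = ingester
        self.tenants = set(tenants)

    def receive(self, tenant, labels, line):
        # Reject data that does not come from a configured tenant.
        if tenant not in self.tenants:
            raise PermissionError(f"unknown tenant: {tenant}")
        # Basic validity check before anything reaches the ingester.
        if not labels or not line:
            raise ValueError("entry needs labels and a non-empty line")
        self.ingester.push({"tenant": tenant, "labels": labels, "line": line})


def query(ingester, tenant, selector):
    """Querier: reads from long-term storage and from the ingester's memory."""
    def matches(entry):
        return entry["tenant"] == tenant and all(
            entry["labels"].get(k) == v for k, v in selector.items()
        )
    return [e["line"] for e in ingester.store + ingester.memory if matches(e)]
```

In this sketch a query sees both flushed and not-yet-flushed entries, which mirrors the way queriers consult ingesters for recent data and long-term storage for older data.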
For scalability, all of these components can be distributed across systems as needed.
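To make the query frontend's request splitting more concrete, here is a minimal Python sketch of the general idea: a query over a long time range is cut into shorter, consecutive sub-ranges that can be queued and executed independently. The function name `split_range` and the fixed `step` parameter are hypothetical; Loki's actual splitting logic and configuration are not shown here.

```python
from datetime import datetime, timedelta


def split_range(start, end, step):
    """Split [start, end) into consecutive sub-ranges no longer than `step`."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    ranges = []
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)  # the last piece may be shorter
        ranges.append((cursor, upper))
        cursor = upper
    return ranges


# A 90-minute query split into 30-minute pieces yields three sub-queries.
start = datetime(2024, 1, 1, 0, 0)
pieces = split_range(start, start + timedelta(minutes=90), timedelta(minutes=30))
```

Because each piece covers a fixed, repeatable window, results for pieces that were already computed can be cached and reused by later queries that overlap the same window.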
Loki can be deployed locally in one of two modes:
• A monolithic mode (the default), which runs all of Loki's components in a single process or Docker container. This is a good starting point for learning more about the product.
• A microservices deployment mode, which allows the Loki components to be distributed across multiple systems and provides high scalability.
An additional local deployment mode, called the "simple scalable" mode, is a good intermediate step when your requirements exceed the monolithic mode capabilities but do not warrant a large-scale microservices deployment. Of course, if you don't want to manage Loki at all then Grafana Cloud might be the option for you.
Loki implements several features that are specifically designed to distribute the load, protect the system from attack, and make efficient use of storage.
Unlike many log processing systems, Loki does not perform full-text indexing on log data. Instead, it leverages a concept borrowed from Prometheus - labels - to extract and tag information from the log data, and then indexes only the labels themselves. This dramatically improves performance on both the write and read path, and - equally valuable in our mind - enables a consistent label taxonomy regardless of input source.
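The trade-off behind label-only indexing can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not Loki's storage format: the `LabelIndex` class and its `append` and `select` methods are hypothetical names. Log lines are grouped into streams keyed by their label set, only the label sets are looked up, and the text of matching streams is scanned by brute force.

```python
from collections import defaultdict


class LabelIndex:
    """Toy log store that indexes label sets only, never the log text."""

    def __init__(self):
        # label set -> list of raw log lines (the lines are not indexed)
        self.streams = defaultdict(list)

    def append(self, labels, line):
        # Writing is cheap: one key lookup, no tokenizing of the message.
        key = frozenset(labels.items())
        self.streams[key].append(line)

    def select(self, selector, contains=""):
        """Pick streams by label, then filter their lines by substring."""
        wanted = set(selector.items())
        hits = []
        for key, lines in self.streams.items():
            if wanted <= key:  # label match against the index only
                hits.extend(l for l in lines if contains in l)  # plain scan
        return hits


idx = LabelIndex()
idx.append({"job": "nginx", "env": "prod"}, "GET / 200")
idx.append({"job": "nginx", "env": "prod"}, "GET /x 500")
idx.append({"job": "api", "env": "prod"}, "timeout 500")
errors = idx.select({"job": "nginx"}, contains="500")
```

The call to `select` is roughly analogous to a LogQL query such as `{job="nginx"} |= "500"`: the label selector narrows the search to a few streams, and only those streams are scanned for the text.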
Since this is such a critical benefit of Loki, let's dig into an example from Loki's documentation. Let's say you have a Loki "scrape configuration" like the one below: