By storing data in memory, Redis key-value stores can read and write data much faster than databases that depend on conventional storage. That's part of the reason why many of the world's largest tech companies – such as Twitter, Snapchat and Craigslist – depend on Redis servers, and clouds like GCP and Azure offer hosted data stores that speak the Redis protocol.

Unfortunately, though, Redis key-value stores don't always work the way they should. You may run into issues like slow performance due to low hit rates and poorly sharded data. Problems like these must be identified and fixed – otherwise, what's the point of paying for an in-memory key-value store that isn't living up to its full potential?

Monitoring Redis databases in order to troubleshoot performance and other problems isn't always as straightforward as you might like. But it's possible – especially with the help of tools like eBPF, which lets you gather Redis monitoring insights in ways that simply aren't possible with traditional approaches.

Redis 101

Redis is a single-threaded, high-throughput, low-latency, in-memory key-value store. That's a long (and overly hyphenated) way of saying that Redis uses in-memory data storage to deliver performance that’s hard to achieve using conventional databases.

If you're into databases and data structures, you might also enjoy knowing that Redis supports multiple types of data structures – including hashmaps, lists, sets, counters and more. That makes it a very flexible key-value store that is suited for many use cases, such as caching, pub/sub patterns, text search, graphs, atomic operations and rate limits, just to name a few.

On top of all of this, Redis also supports atomic operations, either through the use of transactions or by evaluating custom Lua scripts on requests. 

To enhance performance and resiliency, Redis supports sharding, although that comes at the cost of losing atomicity (since single commands can't run across shards). 

Alternatively, you can operate using a master/replica model that allows you to run multiple Redis nodes (each with a full copy of the data), achieving the benefits of more compute and memory resources without losing atomicity.

Master/Replicas vs. Sharding

Oh, and in case you're wondering what happens to your in-memory data if your nodes shut down unexpectedly, Redis has a solution for that: you can configure persistence, which syncs data to disk – either as point-in-time RDB snapshot files or through an append-only file (AOF) whose fsync policy you control.
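As a sketch, persistence could be enabled with a few redis.conf directives (the directive names are real; the values here are purely illustrative, not recommendations):

```
appendonly yes        # enable the append-only file (AOF)
appendfsync everysec  # fsync AOF data to disk once per second
save 900 1            # also snapshot (RDB) if at least 1 key changed in 900 s
```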

Common Redis issues

Although Redis can do lots of cool things, it can also run into a lot of problems – just like any database.

Low hit rate

One common issue is what's known as a low hit rate, meaning that a large share of reads miss the cache – often because keys expired or were evicted before they were read again. You can check the hit rate using the Redis CLI INFO command, whose stats section exposes the keyspace_hits and keyspace_misses counters.


With this data, you can calculate a hit rate for all the keys on your Redis server.
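For instance, here's a minimal Python sketch that parses the two relevant counters out of INFO-style output and computes the ratio. The counter values below are made up for illustration:

```python
# Hypothetical excerpt of `redis-cli INFO stats` output (values are invented):
info_stats = "keyspace_hits:9500\nkeyspace_misses:500"

# Parse "name:value" lines into a dict, then compute hits / (hits + misses).
stats = dict(line.split(":", 1) for line in info_stats.splitlines())
hits = int(stats["keyspace_hits"])
misses = int(stats["keyspace_misses"])
hit_rate = hits / (hits + misses)
print(f"hit rate: {hit_rate:.1%}")  # → hit rate: 95.0%
```

In production you'd feed the real INFO output into this calculation (or let your monitoring stack do it for you).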

Large values

Large values in sorted sets, lists and hashes can trigger problems like incorrect cleanup logic or missing TTLs. To find the keys with the biggest values in your Redis store, run redis-cli with the --bigkeys option, which scans the keyspace and reports the biggest key of each type.

Large JSON keys

Using large JSON keys instead of Redis hashes is another common Redis issue. It happens when you use a single key to hold a JSON value as a string, causing lookups in your apps to be very inefficient.

A simple solution is to hold the data in a Redis hash, so you can fetch a single field in O(1) without fetching and parsing the whole value.
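To see why, here's a pure-Python illustration of the two layouts (no live Redis; the profile data is invented) – a JSON string that must be parsed in full to read one field, versus a hash whose fields are individually addressable, as with HGET:

```python
import json

profile = {"name": "Ada", "email": "ada@example.com", "visits": 42}

# Layout 1: one key holding the whole profile as a JSON string
# (what SET user:1 '<json>' would store).
blob = json.dumps(profile)
email_via_json = json.loads(blob)["email"]  # must fetch + parse everything

# Layout 2: a Redis hash (HSET user:1 name ... email ...); a dict stands in here.
as_hash = dict(profile)
email_via_hash = as_hash["email"]           # HGET user:1 email → one field

assert email_via_json == email_via_hash
```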

Using lists instead of sets

It's easy to use lists in Redis via push/pop commands, but lists happily store duplicate values. If you spot unexpectedly large values in a list, consider whether a set – which stores each member only once – is the better fit.
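A quick pure-Python sketch of the semantic difference (the list mirrors RPUSH, the set mirrors SADD; the event names are invented):

```python
events = ["login", "login", "logout"]

# RPUSH-like: a list keeps every pushed value, duplicates included.
as_list = []
for e in events:
    as_list.append(e)

# SADD-like: a set stores each member exactly once.
as_set = set()
for e in events:
    as_set.add(e)

print(len(as_list), len(as_set))  # → 3 2
```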

Redis cluster issues

Beyond the common Redis issues outlined above, you may also run into more complex Redis cluster issues.

Poorly sharded data

Redis clusters spread their data across many nodes. When you store data in one general-purpose hash key instead of spreading it across multiple keys, your cluster can suffer a performance hit. This happens because the key is stored on a single node, and in a high-scale environment the pressure will fall on that node instead of being distributed between all of the nodes in the cluster. The result is that the node becomes a performance bottleneck.

As a real-world example, consider a cluster that stores user data in a hash, where the key is the user ID. An authentication server that performs a lot of lookups on the user ID will place heavy pressure on the node that stores the key. A solution would be to spread the hashed data to multiple keys across nodes, letting Redis's sharding algorithm distribute the pressure.

MOVED errors

When performing multi-key operations in a single command – such as MGET, pipelines and Lua script EVALs – it's very easy to forget that Redis hashes every key to decide which shard in the cluster holds its value. This behavior raises the possibility of a MOVED error: a response returned by one of the nodes telling you that the data is stored on a different node, and that it's the client's responsibility to go to the relevant node and ask for the data.

For example, consider a pipeline that sends GET requests for three individual keys. The requests execute on a single node, and if one of the keys doesn't live on that node, the commands whose keys are on that node will return a response while the others will return the MOVED error.

A quick fix is to use hashtags in the key structure: wrapping the part of the key that we want to hash by in curly brackets makes the sharding algorithm direct all keys that share the tag to the same node.
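Here's a pure-Python sketch of the slot mapping Redis Cluster uses (CRC16/XMODEM modulo 16384, hashing only the {tag} when one is present), showing that keys sharing a hashtag land in the same slot. The key names are invented:

```python
def crc16(data: bytes) -> int:
    """CRC16/XMODEM, the checksum Redis Cluster uses for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else crc << 1
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of 16384 slots, honoring a non-empty {hashtag}."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:           # non-empty tag between { and }
            key = key[start + 1:end]  # hash only the tag
    return crc16(key.encode()) % 16384

# Both keys share the {user:1} tag, so they map to the same slot (and node):
print(key_slot("{user:1}:name") == key_slot("{user:1}:email"))  # → True
```

With this key design, a pipeline over {user:1}:name and {user:1}:email is served by a single node, so no MOVED errors occur.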

Multiple set/get operations

Since Redis executes commands on a single thread, it provides atomicity when executing a command – or at least, it should. But imagine a more complex scenario, where you use the EXISTS command to check if a key is set and, if it does exist, increment a counter.

In this instance, we lose atomicity. The requests are separate, and by the time we get to the second command, the key that we checked might not exist anymore.
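The race is easy to reproduce with a pure-Python stand-in (a dict plays the part of Redis; exists and incr mimic the EXISTS and INCR commands, and the key name is invented):

```python
store = {"visits:today": 5}

def exists(key: str) -> bool:           # stands in for: EXISTS visits:today
    return key in store

def incr(key: str) -> int:              # stands in for: INCR visits:today
    store[key] = store.get(key, 0) + 1  # INCR creates the key at 0 if it's gone
    return store[key]

if exists("visits:today"):
    # Between the two commands, another client (or a TTL expiry) can delete
    # the key – simulated here:
    del store["visits:today"]
    print(incr("visits:today"))  # → 1, not 6: the EXISTS check no longer holds
```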

We can solve that with an ad-hoc Lua script, which ensures atomicity since the whole script is evaluated as a single request.
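A sketch of such a script, shown here as a string (running it requires a live Redis, e.g. EVAL &lt;script&gt; 1 visits:today – the key name is invented):

```python
# The check and the increment run server-side in one request, so no other
# client can delete the key between the EXISTS and the INCR.
exists_then_incr = """
if redis.call('EXISTS', KEYS[1]) == 1 then
  return redis.call('INCR', KEYS[1])
end
return nil
"""
```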

We could also store our Lua script for future use with Redis's SCRIPT LOAD command, which stores the script on the Redis node and lets you trigger it by its SHA1 hash with the EVALSHA command.
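SCRIPT LOAD replies with the SHA1 digest of the script body, which is exactly the handle EVALSHA takes – so the digest can even be computed locally. A sketch (the script text is invented):

```python
import hashlib

script = "return redis.call('INCR', KEYS[1])"
sha = hashlib.sha1(script.encode()).hexdigest()  # what SCRIPT LOAD returns
print(sha)  # pass this to: EVALSHA <sha> 1 <key>
```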

How to monitor Redis performance

Clearly, Redis issues come in many shapes and sizes and their solutions are equally varied. That's why monitoring your Redis clusters in order to detect performance issues can be critical to your success.

Monitoring best practices

Redis provides a few built-in commands for extracting Redis metrics:

  • INFO / CLUSTER INFO: Shows you raw information/counters about your Redis server/cluster, such as memory, CPU, shards and hits. The output of this command is designed to be easy to parse programmatically, so you can export it to Prometheus or your preferred monitoring tool.
  • MONITOR: Shows all the commands that are being executed on the server at the time the command runs. It is a great tool to find out what’s going on, but in a real production environment it's very difficult to correlate the command with a particular issue.
  • SLOWLOG: Shows the slowest commands on the server, which can help you find commands that should be optimized in order to improve performance.

If you want to track the output of these commands in real time without having to run them manually, you can configure the Grafana Redis data source with the Redis dashboard, which displays data based on these commands.

Hosted Redis solutions
If you are using RedisLabs Cloud, Amazon ElastiCache, Google MemoryStore, Microsoft Azure Cache for Redis or any other hosted Redis solution, you can use the cloud provider’s exposed metrics to get cache hit ratio, CPU, memory, evictions and more.

Filling in the blanks: Getting more from Redis monitoring with eBPF

By running an eBPF program on a container/server/client we can detect network data such as requests, responses, sources, destinations, DNS resolutions, request/response sizes and a whole lot more. We can also detect process data such as stack traces or exceptions.

Let's emulate a Redis request and response according to the Redis protocol spec (RESP).

Redis request response example:
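Here's a pure-Python sketch of RESP framing (the key and value are invented): a request travels as an array of bulk strings, and a bulk-string reply carries the value back:

```python
def encode_command(*args: str) -> bytes:
    """Frame a Redis command as a RESP array of bulk strings."""
    resp = f"*{len(args)}\r\n"               # array header: element count
    for arg in args:
        resp += f"${len(arg)}\r\n{arg}\r\n"  # bulk string: $<len>, then payload
    return resp.encode()

request = encode_command("GET", "user:1")
print(request)  # b'*2\r\n$3\r\nGET\r\n$6\r\nuser:1\r\n'
# A bulk-string reply carrying the value "Ada" would arrive as: b'$3\r\nAda\r\n'
```

Because the protocol is this regular, an eBPF probe watching the socket traffic can reconstruct the command, its arguments and the response status for every request it sees.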

We can then aggregate that data using the sources/destinations/response statuses/commands, and calculate success/error ratios, latency percentiles and so much more.

eBPF helps you get more context – it allows you to run code in the Linux kernel on your Redis nodes to get low-level data about node performance and activity. Monitoring Redis clients using eBPF gives you the ability to transform every request into contextual data – and, with the right tools, into throughput and hit rate metrics – without compromising performance and without any code changes.

That means that you can, for example, track client requests to get the time, success/error, callers, commands, arguments and beyond:

Screen capture source: app.groundcover.com 

We can then correlate the caller application’s data and create a span (an event that occurred during a timespan):

Once we have the span, we can enrich our span with the caller’s stack trace and pinpoint the exact call:

With eBPF-based tools, you can get so much more than what basic monitoring can give you. You can build and extract meaningful details and contextualized data about your Redis nodes, track down any and all performance issues, and get an inside look into what’s happening in your cluster, all without writing a single line of code. Instead of working hard to add full tracing and monitoring to every part of your system, you can sit back, relax, and enjoy the benefits of Redis’s high-performance in-memory store without worrying about missing a single issue.

Have any questions I didn't cover here? Reach out on groundcover's redis channel and ask away!
