Packet Loss Troubleshooting: Causes, Fixes & Best Practices
In a world where virtually all digital services move data across networks, packet loss - which happens when data fails to reach its destination on a network - is one of the chief causes of performance issues. Unfortunately, packet loss can also be challenging to troubleshoot. There are many potential causes of packet loss, and resolving them is not always simple.
But with the right strategy and tools, it’s possible to get to the root of packet loss problems. Keep reading for guidance as we unpack the ins and outs of packet loss, including common causes of the issue, how to diagnose packet loss, and best practices for preventing packet loss in the first place.
What is packet loss?
Packet loss is a type of networking problem in which packets fail to reach their intended destination.
To understand this issue in more detail, it helps to step back and talk about how computer networks work. When computers or servers want to send data to each other over a network, they break the data into small units, called packets. Each packet carries a source address and a destination address, which identify where it originated and where it is supposed to go. Normally, packets travel across the network from their point of origin to their intended destination.
However, there is no guarantee that packets will reach their destination. For various reasons (which we explore in depth below), they can become lost along the way, resulting in packet loss.
Why is packet loss important?
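When applications use a reliable transport protocol such as TCP, lost packets are typically retransmitted automatically, so the data included in them doesn’t disappear forever. (Protocols such as UDP don’t retransmit on their own, so any recovery is left to the application.) This doesn’t mean, however, that packet loss is not a problem.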
On the contrary, packet loss can cause significant network performance issues. Retransmitting packets takes time, so packet loss can contribute to higher latency (meaning delays in the time it takes to send and process requests). Retransmission also increases the load placed on networks, because more data ends up flowing over them. This can further exacerbate latency, especially when the additional packets push network infrastructure into congestion.
On top of these issues, there is a chance that retransmission will continually fail, so in some cases, packet loss can cause complete communication breakdowns between two or more network endpoints.
Note, too, that packet loss is a problem for both internal networks (such as those that connect different containers within a Kubernetes cluster) and external networks (like those that allow a server hosted in a cloud to talk to an end-user’s PC located in another part of the world). All types of networks transmit data as packets, and they all experience performance problems when packets are lost in transit.
How to detect packet loss
Various tools are available for monitoring and detecting packet loss.
The simplest way to assess the overall rate of packet loss is to use a utility like ping. This tool sends ICMP echo requests to a given address and waits for replies. In addition to measuring how long each reply takes to arrive (which is helpful for assessing network latency), ping reports how many requests went unanswered. Traceroute is another useful command-line tool that can help track packet loss. It’s slightly more complex to use than ping, but it reports more granular routing information, showing you each hop that packets pass through on the way to their destination.
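If you want to feed ping results into a script, the loss figure can be pulled out of ping’s summary line. The following is a minimal sketch, assuming the summary format printed by common Linux and macOS versions of ping; the function name parse_ping_loss and the sample output are made up for illustration, and other ping implementations may format the line differently.

```python
import re

# Matches summary lines such as:
#   "5 packets transmitted, 4 received, 20% packet loss, time 4005ms"
#   "5 packets transmitted, 4 packets received, 20.0% packet loss"
# The exact wording is an assumption about the local ping implementation.
_SUMMARY = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received.*?([\d.]+)% packet loss"
)


def parse_ping_loss(output: str) -> float:
    """Return the packet loss percentage reported in ping's summary line."""
    match = _SUMMARY.search(output)
    if match is None:
        raise ValueError("no ping summary line found")
    return float(match.group(3))


# Illustrative sample output (not captured from a real host).
sample = """\
--- example.com ping statistics ---
5 packets transmitted, 4 received, 20% packet loss, time 4005ms
"""
print(parse_ping_loss(sample))
```

In practice you would pass in the captured output of a real ping run instead of the sample string.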
A limitation of tools like ping and traceroute is that they track packet loss only for the packets they directly send and receive, so they won’t tell you how many packets are lost by applications. To gain that visibility, you’d typically use what’s known as a packet sniffer. This is a type of tool that collects packets as they flow to and from an endpoint, allowing you to track the traffic for all applications. On Linux, you can use the CLI tool tcpdump for this purpose. For a packet sniffer with a graphical interface, consider Wireshark.
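A full packet capture is not always necessary for a first look at application traffic. As a lighter-weight alternative (a different technique from packet sniffing, not a replacement for it), the Linux kernel exposes cumulative TCP counters in /proc/net/snmp, including how many segments were sent and how many were retransmitted. The sketch below, with the hypothetical function name tcp_retransmit_ratio and made-up sample counters, computes the share of retransmitted segments as a rough, indirect indicator of loss.

```python
def tcp_retransmit_ratio(snmp_text: str) -> float:
    """Return RetransSegs / OutSegs from the contents of /proc/net/snmp."""
    # The file holds pairs of lines per protocol: a header line with counter
    # names, followed by a line with the matching values.
    tcp_lines = [
        line.split() for line in snmp_text.splitlines() if line.startswith("Tcp:")
    ]
    if len(tcp_lines) < 2:
        raise ValueError("expected a Tcp: header line and a Tcp: value line")
    names, values = tcp_lines[0][1:], tcp_lines[1][1:]
    counters = dict(zip(names, (int(v) for v in values)))
    if counters["OutSegs"] == 0:
        return 0.0
    return counters["RetransSegs"] / counters["OutSegs"]


# Illustrative sample; on a Linux host you would read the real file instead:
#   with open("/proc/net/snmp") as f:
#       ratio = tcp_retransmit_ratio(f.read())
sample_snmp = """\
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts InCsumErrors
Tcp: 1 200 120000 -1 100 50 2 3 10 50000 40000 400 0 5 0
"""
print(f"{tcp_retransmit_ratio(sample_snmp):.2%} of TCP segments were retransmitted")
```

Because the counters are cumulative since boot, comparing two readings taken some time apart gives a more useful picture than a single snapshot.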
Finally, various observability platforms (including groundcover) can also measure packet loss across all applications and services. This is the best solution if you want to monitor packet loss as part of a broader observability strategy.
Common causes of packet loss
Network congestion or overload
If a network is trying to handle more traffic than its available bandwidth allows, routers can become overloaded and start dropping packets. (A dropped packet is one that is discarded before it reaches its destination.)
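To see why overload turns into loss, it can help to model a router as a fixed-size buffer that forwards a limited number of packets per time step. The sketch below is a deliberately simplified simulation (a "drop-tail" buffer, where packets arriving at a full buffer are discarded); the function name and all numbers are assumptions chosen for illustration, and real routers use more sophisticated queue management.

```python
def simulate_drop_tail(arrivals_per_tick, service_per_tick, queue_capacity):
    """Simulate a buffer that drops packets arriving while it is full.

    Returns a (delivered, dropped) tuple of packet counts.
    """
    queued = delivered = dropped = 0
    for arrivals in arrivals_per_tick:
        # Accept as many arriving packets as the buffer has room for.
        accepted = min(arrivals, queue_capacity - queued)
        queued += accepted
        dropped += arrivals - accepted
        # Forward up to service_per_tick packets from the buffer.
        forwarded = min(service_per_tick, queued)
        queued -= forwarded
        delivered += forwarded
    return delivered, dropped


# Offered load (5 packets per tick) exceeds capacity (3 per tick), so the
# buffer fills and the excess is dropped.
print(simulate_drop_tail([5, 5, 5, 5], service_per_tick=3, queue_capacity=4))
# With lighter load, nothing is dropped.
print(simulate_drop_tail([2, 2, 2], service_per_tick=3, queue_capacity=4))
```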
Endpoint overload
The destinations to which packets are being transmitted can also become overloaded, causing them to drop or ignore incoming packets.
Hardware failure
In some cases, faulty hardware devices (like a malfunctioning Network Interface Card, or NIC) may not receive all packets properly. Hardware failures within network routers can also cause packets to be lost while they are en route between their point of origin and destination.
Malformed packets
Sometimes, packets are malformed at the time of their creation. This can happen as a result of bugs in networking software, as well as I/O issues that cause faulty data to be written into a packet. Malformed packets may fail to reach their destinations because they are missing important data, like the destination’s address.
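One concrete form of malformation is a corrupted IPv4 header, which receivers detect by verifying the header checksum and discarding packets that fail. The sketch below shows that check in isolation, assuming you already have the raw header bytes; the function name is made up, and this covers only one of many ways a packet can be malformed.

```python
import struct


def ipv4_header_checksum_ok(header: bytes) -> bool:
    """Return True if the IPv4 header's one's-complement checksum verifies."""
    if len(header) < 20 or len(header) % 2:
        return False  # shorter than a minimal header, or not whole 16-bit words
    # Sum the header as big-endian 16-bit words, checksum field included.
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    # Fold any carry bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    # A valid header sums to all ones.
    return total == 0xFFFF


# A well-formed 20-byte example header (checksum field is b861).
good = bytes.fromhex("4500 0073 0000 4000 4011 b861 c0a8 0001 c0a8 00c7")
# The same header with one byte (the TTL) altered in transit.
bad = good[:8] + bytes([good[8] ^ 0x01]) + good[9:]

print(ipv4_header_checksum_ok(good), ipv4_header_checksum_ok(bad))
```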
Bugs in software-defined networking tools
Many networks today don’t transfer packets directly through hardware devices. Instead, they use software-defined networks, which abstract network traffic from the underlying hardware. Software-defined networks provide more flexibility, but they can cause packet loss if bugs exist in the software responsible for managing traffic, such as virtual routers.
Firewalls
Firewalls (software that restricts the flow of network traffic) may cause permanent packet loss by blocking packets from reaching an intended destination. Normally, such blocking is deliberate and is designed to help protect against security risks. But circumstances can arise in which misconfigured or buggy firewalls block packets that should be allowed to flow.
Wireless signal problems
On wireless networks, problems with the radio signals that transmit information may lead to packet loss. Interference by other wireless networks can cause some packets to be lost. Weak signal strength may also cause packet loss.
DDoS attacks
Malicious actors may carry out Distributed Denial-of-Service (DDoS) attacks, in which they deliberately flood a network with illegitimate traffic. This often results in packet loss because legitimate traffic can’t get through.
Step-by-step packet loss troubleshooting guide
Now that we’ve covered the essentials of what packet loss entails and what causes it, let’s look at how to respond to packet loss issues after you’ve detected them.
1. Assess the scope of packet loss
For starters, determine the extent of your packet loss problem by assessing:
- How many applications or services are experiencing packet loss: Is the issue associated with certain apps or types of traffic, or is it a general problem affecting all of your network-connected resources?
- How severe the packet loss is: How many packets are being lost, and how long is it taking to retransmit them? Are any packets experiencing permanent failure (meaning retransmission doesn’t work)?
2. Run active tests and path tracing
To gain additional context about the issue, it’s helpful to run packet tests that trace the network paths of packets. Using tools like traceroute, determine exactly which routes your packets are taking. You can also experiment with different routing table configurations. Routing tables tell the operating system how to route traffic. On Linux, you can modify routes using the ip route command.
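When reviewing traceroute results, probes that received no reply show up as asterisks. The sketch below scans traceroute output for hops with unanswered probes. It assumes the line layout printed by the common Linux traceroute (hop number first, then hosts and timings); the function name, addresses, and sample output are invented for illustration.

```python
def hops_with_timeouts(traceroute_output: str) -> dict[int, int]:
    """Map hop number -> number of probes that timed out (shown as '*')."""
    result = {}
    for line in traceroute_output.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip the banner line and anything that is not a hop
        timeouts = fields.count("*")
        if timeouts:
            result[int(fields[0])] = timeouts
    return result


# Illustrative output, not captured from a real network.
sample_trace = """\
traceroute to example.com (192.0.2.10), 30 hops max, 60 byte packets
 1  192.168.1.1 (192.168.1.1)  1.123 ms  0.987 ms  1.045 ms
 2  10.0.0.1 (10.0.0.1)  8.410 ms * 9.022 ms
 3  * * *
 4  192.0.2.10 (192.0.2.10)  12.301 ms  12.118 ms  12.467 ms
"""
print(hops_with_timeouts(sample_trace))
```

Treat the result as a hint rather than proof: some routers deprioritize or rate-limit replies to traceroute probes, so a hop that shows timeouts while later hops respond normally is not necessarily dropping your traffic.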
3. Capture and analyze packets
You can gain further insight by capturing and analyzing packets using a packet sniffer. This will show you the actual contents of the packets, which can clue you in to potential causes of packet loss. For example, in Wireshark, you can set a filter showing only malformed packets to determine how many packets are malformed and whether that is causing packet loss.
4. Correlate packet data with other metrics
In addition to looking at packets themselves, assess other performance metrics, such as the CPU and memory usage of your applications and servers. Spikes in resource consumption may cause packet loss if endpoints become overloaded and can’t transmit traffic normally.
5. Correlate problems and identify root causes
After considering packet loss rates, routing information, packet contents, and other performance metrics, you will likely have enough insight to draw conclusions about the root cause of the problem. For example, if you notice that packet loss surged at the same time that one of your endpoints reached 100 percent CPU utilization, you can assume that the endpoint is most likely dropping packets because it is starved of resources. Or, if packet loss occurs at high rates under a certain routing configuration, it’s a sign that particular routes are causing the problem.
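One simple way to check whether two metrics move together is to compute the Pearson correlation coefficient over samples taken at the same times. The sketch below does this for CPU utilization and packet loss; the function name and the sample readings are made up for illustration, and a high coefficient only suggests a relationship worth investigating, not proof of cause.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 samples")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical readings sampled at the same six points in time.
cpu_percent = [35, 40, 62, 88, 100, 97]
loss_percent = [0.0, 0.1, 0.3, 1.9, 4.2, 3.8]

print(f"correlation: {pearson(cpu_percent, loss_percent):.2f}")
```

A value close to 1 means loss tends to rise when CPU usage rises, which supports (but does not confirm) the endpoint-overload hypothesis.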
6. Fix the root cause of packet loss
Once you’ve identified the root cause of packet loss, you can remediate it. For instance, you could move applications from an overloaded endpoint to free up resources, or modify routing tables to improve packet reliability.
Advanced packet loss troubleshooting techniques
We’ve just covered the basics of packet loss troubleshooting. In more complex scenarios, it may also be helpful to consider the following steps:
- Examine logs: Examine available logs that record data about network events and performance, such as syslog on Linux and logs generated by network switches. These may illuminate errors such as hardware problems or network overload.
- Modify MTU settings: Maximum Transmission Unit (MTU) configurations determine how large packets can be. Packets that exceed the MTU of a link along the path must be fragmented or may be dropped, so modifying MTU settings can sometimes mitigate packet loss issues. On Linux, you can specify an MTU value using the ip link set dev <interface> mtu <value> command.
- Modify your network connection: To rule out issues like hardware failure, consider switching to a different NIC, or using a wired connection (which is typically simpler and has fewer potential points of failure) instead of a wireless one.
- Remove or simplify SDN layers: Similarly, removing or simplifying software-defined networking (SDN) settings may help to diagnose packet loss and identify the root cause by reducing the complexity of your overall network configuration.
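To get a feel for what an MTU change means in practice, it helps to work out how many packets a transfer needs at different MTU values. The sketch below does that arithmetic for TCP over IPv4, assuming 20-byte IP and TCP headers with no options; the function name and the 1,000,000-byte payload are arbitrary choices for illustration.

```python
IPV4_HEADER_BYTES = 20  # minimal IPv4 header, no options (assumption)
TCP_HEADER_BYTES = 20   # minimal TCP header, no options (assumption)


def tcp_segments_needed(payload_bytes: int, mtu: int) -> int:
    """Return how many TCP segments are needed to carry payload_bytes."""
    # Maximum payload per packet once the headers are subtracted from the MTU.
    max_segment = mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES
    if max_segment <= 0:
        raise ValueError("MTU is too small to carry any payload")
    # Ceiling division without floating point.
    return -(-payload_bytes // max_segment)


for mtu in (1400, 1500, 9000):
    print(mtu, tcp_segments_needed(1_000_000, mtu))
```

A smaller MTU means more packets for the same data, which adds overhead, so lowering it is a trade-off to test rather than a default fix.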
Best practices for managing and preventing packet loss
Troubleshooting packet loss is good. Even better, though, is preventing it in the first place. The following best practices can help with that goal.
Maintain hardware health
Since hardware failure is one of the common causes of packet loss, choosing robust hardware and managing it properly helps to prevent the issue. Make sure your endpoints and networking devices are not prone to problems like overheating. Be sure, too, to keep firmware up to date, since buggy firmware on hardware devices can also contribute to hardware problems.
Implement QoS
Quality of Service (QoS) is a feature available in many networking devices that allows admins to prioritize certain types of traffic. For example, they could tell routers to give priority to packets from mission-critical applications, while treating traffic from a test/dev environment as less critical.
QoS won’t necessarily prevent overall packet loss, but it can reduce the risk that high-stakes packets will be dropped. With QoS enabled, any packet loss that occurs is more likely to be limited to lower-priority traffic.
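The effect of prioritization can be illustrated with a toy model: a bounded buffer that, when full, discards its lowest-priority packet to make room for a higher-priority arrival. This is only a sketch of the idea, not how any particular device implements QoS; the class name, priority values, and packet names are all invented for illustration.

```python
import heapq


class PriorityBuffer:
    """Bounded buffer that sheds the lowest-priority packet when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._heap = []    # min-heap: the lowest-priority entry sits at index 0
        self._seq = 0      # tie-breaker so equal priorities evict oldest first
        self.dropped = []  # names of packets that were discarded

    def enqueue(self, priority: int, name: str) -> None:
        entry = (priority, self._seq, name)
        self._seq += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif priority > self._heap[0][0]:
            # Buffer full: evict the lowest-priority packet to admit this one.
            evicted = heapq.heappushpop(self._heap, entry)
            self.dropped.append(evicted[2])
        else:
            # Buffer full and the arrival is not more important: drop it.
            self.dropped.append(name)

    def queued(self) -> list:
        """Return queued packet names, highest priority first."""
        return [name for _, _, name in sorted(self._heap, reverse=True)]


buf = PriorityBuffer(capacity=2)
buf.enqueue(0, "test-1")
buf.enqueue(0, "test-2")
buf.enqueue(7, "prod-1")  # higher priority: pushes out a test packet
buf.enqueue(0, "test-3")  # no room and not more important: dropped
print("queued:", buf.queued(), "dropped:", buf.dropped)
```

Loss still occurs in this model, but it falls on the lower-priority traffic.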
Keep software and firmware up to date
Software and firmware updates may fix bugs that could cause routing problems and packet loss. And while organizations are typically good at updating software, it can be easier to overlook firmware updates, which are often more challenging to apply because it’s not always possible to push out firmware updates automatically or remotely. Still, keeping both software and firmware up-to-date is critical if you want to minimize packet loss.
Continuously monitor the network
You can’t fix what you can’t see, which is why continuous visibility into network performance is an essential element of packet loss mitigation. Using observability software, teams should track the rate of packet loss on an ongoing basis. They should also correlate packet loss metrics with other data, such as resource consumption metrics from endpoints, to help provide context into the cause of packet loss problems.
How groundcover speeds up packet loss troubleshooting
As a comprehensive observability solution, groundcover can help you identify and resolve packet loss challenges fast.
That’s particularly true because - unlike most observability suites - groundcover doesn’t just monitor network performance using software agents that operate on endpoints. Instead, groundcover leverages eBPF, a Linux kernel framework, to inspect every single packet in real time as it flows across network interfaces. This approach provides deep visibility into packet transmission rates. It also offers the advantage of being extremely resource-efficient, because eBPF programs place very low load on endpoints.
Don’t be at a loss when combating packet loss
Minimizing packet loss is one of the single most effective steps admins can take to optimize the performance of virtually any type of workload. The more efficiently endpoints and applications can move data over the network, the faster they’ll operate, and the fewer resources they’ll consume. Hence the value of investing in practices and tools that provide visibility into packet loss events and help to resolve them quickly.
FAQ
How do I tell packet loss troubleshooting apart from a pure latency or jitter issue in production?
While packet loss is one potential cause of network latency (meaning delays in the delivery of data over a network) or jitter (or fluctuations in transmit times for network data), these can also result from problems like network congestion or malformed packets.
To determine whether packet loss is the root cause, examine whether packets are actually being dropped and retransmitted. You can do this using tools like ping and traceroute to measure overall rates of packet loss. You can also use software such as Wireshark to inspect individual packets as they flow over the network to see if they are lost en route.
What packet loss thresholds make sense for SLAs, and how should I monitor them continuously?
As a rule of thumb, most Service Level Agreements (SLAs) aim to keep packet loss rates below 1 percent. However, for services where packet loss is especially disruptive (like real-time video streaming), SLAs might aim for thresholds of 0.1 percent or lower. To monitor whether you’re complying with packet loss thresholds specified in SLAs, you need to monitor packet loss rates in real time and on a continuous basis using network observability software.
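As a rough illustration of continuous threshold checking, the sketch below keeps a rolling window of recent measurements and reports whether the aggregate loss rate exceeds a configured limit. The class name, window size, and sample numbers are assumptions for illustration; in practice the measurements would come from your monitoring or observability tooling.

```python
from collections import deque


class LossRateMonitor:
    """Track packet loss over the most recent measurement intervals."""

    def __init__(self, threshold_percent: float, window: int):
        self.threshold_percent = threshold_percent
        # Each sample is (packets_sent, packets_lost); old samples fall off.
        self._samples = deque(maxlen=window)

    def record(self, sent: int, lost: int) -> None:
        self._samples.append((sent, lost))

    def loss_percent(self) -> float:
        sent = sum(s for s, _ in self._samples)
        lost = sum(l for _, l in self._samples)
        return 100.0 * lost / sent if sent else 0.0

    def breached(self) -> bool:
        return self.loss_percent() > self.threshold_percent


# 1 percent threshold, evaluated over the last three intervals.
monitor = LossRateMonitor(threshold_percent=1.0, window=3)
for sent, lost in [(1000, 2), (1000, 5), (1000, 40)]:
    monitor.record(sent, lost)
print(f"{monitor.loss_percent():.2f}% loss, breached: {monitor.breached()}")
```

Aggregating over a window rather than alerting on a single interval helps avoid reacting to one brief spike, at the cost of reacting a little later to sustained loss.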
How does groundcover’s eBPF-based visibility plug into my existing monitoring to accelerate packet loss troubleshooting?
Using eBPF, groundcover inspects packets as they flow over the network interfaces of endpoints. This makes it possible to capture every packet in real time and determine whether packets are lost between their points of origin and destination. You can combine this information with data from other tools, like packet analysis software, to gain additional context on packet contents and potential causes of packet loss.





