Datadog is one of the most powerful observability platforms available. It is also one of the most expensive, and one of the hardest to forecast costs for. If you have submitted a budget forecast and seen the actual invoice come in 30–40% higher, you are not misconfiguring anything. The pricing model is genuinely structured in a way that makes total cost of ownership difficult to calculate upfront.
This guide breaks down how Datadog charges, which combinations drive the biggest surprises, and why the bill tends to grow faster than your infrastructure does.
The structure: 20+ products, each billed separately
Datadog offers more than 20 separately priced products. Infrastructure monitoring, APM, log management, database monitoring, LLM observability, security, real user monitoring, CI visibility, AI SRE investigations. Each requires a separate purchase and introduces its own usage-based charges.
Within each product, you pay based on usage volume: per host, per GB ingested, per million log events indexed, per million LLM requests monitored. The bill is the sum of all these dimensions, and they interact with each other in ways that are difficult to model before you have real production data.
According to Datadog's own investor presentations, roughly 50% of customers use fewer than a third of available products. That is not a product quality problem. It is a pricing consequence. When activating a new product means a new budget line and a procurement conversation, teams rationally default to using only what they have already paid for.
"Our Datadog bill is a threat, an ever-growing line item that threatens to consume what remains of our cloud spend budget."
From a real engineering team's internal job description for a 'Datadog Whisperer', a role created specifically to reduce Datadog spend.
The five billing dimensions that compound
Understanding the bill means understanding how each major product category charges, and how those charges stack on top of each other.
1. Infrastructure: the base that everything else requires
Infrastructure monitoring charges per host per month: $15 on Pro, $23 on Enterprise. For a 500-node Kubernetes cluster on Enterprise, that is $11,500/month before any logs or traces are sent. Every other Datadog product (APM, database monitoring, network monitoring) requires a compatible infrastructure tier on those same hosts, so the infrastructure bill is a multiplier, not just a line item.
Custom metrics are allotted per host (100 on Pro, 200 on Enterprise). Beyond the allotment, the charge is $1 per 100 additional custom metrics per month, averaged across your entire account. A single service emitting high-cardinality metrics (tagged by user ID, request ID, or pod name) can push your whole account over the allotment without any deliberate change to instrumentation.
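To make the arithmetic concrete, here is a back-of-the-envelope sketch of the infrastructure dimension. The rates are the Enterprise list prices quoted above; the host count and metric volume are hypothetical:

```python
# Illustrative estimate of Datadog infrastructure costs.
# Rates are the Enterprise list prices quoted above;
# host and metric counts are hypothetical.
ENTERPRISE_HOST_RATE = 23        # $/host/month
METRICS_PER_HOST = 200           # custom metrics included per Enterprise host
METRIC_OVERAGE_RATE = 1.00       # $ per 100 metrics above the allotment

hosts = 500
custom_metrics = 150_000         # e.g. one service tagging by pod name

host_cost = hosts * ENTERPRISE_HOST_RATE
allotment = hosts * METRICS_PER_HOST               # 100,000 metrics included
overage = max(0, custom_metrics - allotment)       # 50,000 over
metric_cost = overage / 100 * METRIC_OVERAGE_RATE

print(f"hosts:   ${host_cost:,.0f}/mo")    # hosts:   $11,500/mo
print(f"metrics: ${metric_cost:,.0f}/mo")  # metrics: $500/mo
```

Note that the metric overage alone adds $500/month here, even though the account's average is what gets billed, a single noisy service can produce that average on its own.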
2. APM: per host, on top of infrastructure, with ingestion limits
APM costs $31–$40 per host per month in addition to infrastructure (or $36–$47 standalone). Each APM host includes an allotment of 150 GB of ingested spans and 1 million indexed spans per month. Overages are charged at $0.10/GB for additional span ingestion and $1.27–$2.50 per million additional indexed spans, depending on the retention period.
In practice, most teams running Datadog APM enable trace sampling at 25–50% to stay within their ingestion allotment. The consequence is that during an incident, when you need complete trace coverage most, you are working from an incomplete picture.
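The same style of sketch applies to APM. The rates are the list prices above; the fleet size and span volumes are hypothetical:

```python
# Illustrative APM cost estimate. Rates are the list prices quoted
# above; the fleet size and span volumes are hypothetical.
APM_HOST_RATE = 40        # $/host/month, on top of infrastructure
GB_INCLUDED = 150         # GB of ingested spans included per host
SPANS_INCLUDED = 1        # million indexed spans included per host
GB_OVERAGE = 0.10         # $/GB beyond the ingestion allotment
INDEXED_OVERAGE = 2.50    # $/million indexed spans (longest retention)

hosts = 100
ingested_gb = 20_000      # spans produced, before any sampling
indexed_millions = 300

base = hosts * APM_HOST_RATE
gb_over = max(0, ingested_gb - hosts * GB_INCLUDED)
idx_over = max(0, indexed_millions - hosts * SPANS_INCLUDED)
total = base + gb_over * GB_OVERAGE + idx_over * INDEXED_OVERAGE

print(f"APM total: ${total:,.0f}/mo")  # APM total: $5,000/mo
```

In this sketch, overages add 25% on top of the per-host base, which is exactly the pressure that pushes teams toward the 25–50% sampling rates described above.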
3. Logs: ingestion, indexing, and retention are three separate charges
Log billing is where the largest surprises occur, because it layers three distinct cost dimensions:
- Ingestion: $0.10/GB of uncompressed data received by Datadog, regardless of whether those logs are indexed or immediately discarded. If 85% of your logs are filtered out after ingestion, you still pay for 100% of the volume that arrived.
- Standard Indexing: $1.70 per million log events, for logs that are searchable and can trigger monitors. Default retention is 15 days. Extending to 30 days requires a plan change; anything longer means a sales conversation.
- Flex Logs: A cheaper storage tier at $0.05/million events with retention up to 15 months. But Flex Logs removes support for monitors and Watchdog Insights. Teams that route logs to Flex to save money lose the ability to alert on those logs in real time, creating a forced choice between cost and coverage.
There is also a rehydration cost that catches teams off guard: when archived logs are pulled back into Datadog for analysis, the charge is $0.10 per compressed GB scanned, not per GB retrieved. If you need to find a specific event in a large archive, you pay for the full scan even if you retrieve only a few lines. This cost arrives at exactly the moment of an incident, when you least want a surprise.
4. The on-demand surcharge
Every Datadog product carries an on-demand rate approximately 50% higher than the annual committed price. Any usage above your committed volume (an incident-driven log spike, a traffic surge, a deployment that briefly inflates host count) is billed at the on-demand rate automatically. The annual committed rate is a floor, not a ceiling.
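A quick sketch shows how the surcharge amplifies a spike. The ~50% multiplier is the figure above; the committed volume, rate, and spike size are hypothetical:

```python
# How the ~50% on-demand surcharge turns a usage spike into a
# disproportionate bill. Committed volume, rate, and spike size
# are hypothetical; the 1.5x multiplier is the figure quoted above.
committed_gb = 100_000                    # log GB/month under contract
committed_rate = 0.10                     # $/GB at the committed rate
on_demand_rate = committed_rate * 1.5     # ~50% surcharge

actual_gb = 130_000                       # a 30% incident-driven spike
bill = (committed_gb * committed_rate
        + (actual_gb - committed_gb) * on_demand_rate)

print(f"committed cost: ${committed_gb * committed_rate:,.0f}")
print(f"actual bill:    ${bill:,.0f}")   # 30% more volume, 45% more cost
```

A 30% volume overrun produces a 45% cost overrun, because every excess GB is billed at the inflated rate.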
5. Each new product adds a new unpredictable variable
The product catalog keeps expanding. LLM Observability is now priced at $8 per 10,000 monitored LLM requests per month. The newest AI product, Bits AI SRE Investigations, runs $500/month per 20 investigations, a standalone charge with no bundling into existing contracts. Every new capability that teams want to adopt requires a separate evaluation of what it will cost at their scale.
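At the rates above, a modest AI footprint already carries a meaningful price tag. The request and investigation volumes here are hypothetical, and the sketch assumes Bits AI is billed in blocks of 20 investigations:

```python
# Illustrative monthly cost of the AI products at the rates quoted
# above. Request and investigation volumes are hypothetical; Bits AI
# is assumed to be billed in blocks of 20 investigations.
import math

LLM_RATE = 8 / 10_000       # $ per monitored LLM request
BITS_BLOCK = 500            # $ per block of 20 investigations

llm_requests = 2_000_000    # monitored LLM requests per month
investigations = 60

llm_cost = llm_requests * LLM_RATE
bits_cost = math.ceil(investigations / 20) * BITS_BLOCK
total = llm_cost + bits_cost

print(f"AI products: ${total:,.0f}/mo")  # AI products: $3,100/mo
```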
What the log bill looks like in practice
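As a hedged sketch, here is how the three charges above might combine for a hypothetical 1 TB/day workload. The rates are the list prices quoted earlier; the volumes, event size, and indexing split are assumptions:

```python
# Hypothetical monthly log bill combining the three dimensions above.
# Rates are the list prices quoted earlier; volumes, event size, and
# the indexed/Flex split are assumptions.
INGEST_RATE = 0.10       # $/GB ingested (uncompressed)
INDEX_RATE = 1.70        # $/million indexed events (15-day retention)
FLEX_RATE = 0.05         # $/million events on Flex
REHYDRATE_RATE = 0.10    # $/compressed GB scanned on rehydration

daily_gb = 1_000                # ~1 TB/day ingested
events_per_gb = 2_000_000       # assumes ~500-byte average events
indexed_share = 0.15            # 15% indexed, the rest routed to Flex

monthly_gb = daily_gb * 30
monthly_events_m = monthly_gb * events_per_gb / 1_000_000

ingest = monthly_gb * INGEST_RATE                          # pay for 100%
index = monthly_events_m * indexed_share * INDEX_RATE
flex = monthly_events_m * (1 - indexed_share) * FLEX_RATE
rehydrate = 5_000 * REHYDRATE_RATE   # one incident scans a 5 TB archive

total = ingest + index + flex + rehydrate
print(f"total log bill: ${total:,.0f}/mo")  # total log bill: $21,350/mo
```

Even with 85% of events routed to the cheap Flex tier, indexing the remaining 15% dominates the bill, and the rehydration line appears only in months with an incident, which is precisely what makes the total hard to forecast.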
Why the total is hard to predict
None of the individual rates are secret. They are listed on the pricing page. The unpredictability comes from the fact that the inputs driving each dimension change independently, and not always in ways that are visible before the invoice arrives.
The variables that make Datadog bills hard to forecast:
→ Host count fluctuates daily with Kubernetes auto-scaling
→ Log volume spikes during incidents, precisely when you need it most
→ Custom metric cardinality grows silently when developers add tags
→ APM span volume scales with traffic; overages hit at $0.10/GB
→ LLM request volume grows with every new AI feature shipped
→ On-demand surcharge (~50%) applies automatically to any excess
The compounding effect is the core problem. Teams size their contracts based on current usage. Six months later, infrastructure has grown, a high-cardinality service has been added, an AI feature has shipped, and the bill reflects the product of all those changes multiplied across every dimension simultaneously.
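The effect can be sketched as a sum of line items, each growing independently. Every dollar figure and growth rate below is hypothetical:

```python
# Rough model of six months of independent growth per line item.
# All dollar figures and growth rates are hypothetical.
line_items = [
    # (name, $/month today, growth factor over six months)
    ("infrastructure", 11_500, 1.20),   # 20% more hosts
    ("APM",             5_000, 1.20),   # scales with hosts
    ("logs",           21_000, 1.35),   # traffic plus incident spikes
    ("custom metrics",    500, 3.00),   # one high-cardinality service
]
NEW_AI_FEATURE = 1_600                  # LLM observability, newly adopted

before = sum(cost for _, cost, _ in line_items)
after = sum(cost * g for _, cost, g in line_items) + NEW_AI_FEATURE

print(f"bill: ${before:,.0f} -> ${after:,.0f}"
      f" ({after / before - 1:.0%} growth)")
```

In this sketch the bill grows roughly 35% while headline host count grew only 20%: no single dimension looks alarming, but the sum outpaces the infrastructure it monitors.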
What teams typically do about it
There are three patterns that emerge when teams try to control a growing Datadog bill, and each one trades observability coverage for cost.
The first is filtering and sampling: drop logs below a severity threshold before they reach Datadog, sample traces to 25–50%, and enforce cardinality limits on metric tags. This works, but it means you may not have the log line or trace that explains the next incident.
The second is selective activation: only use the Datadog products already under contract and resist expanding to others. Route lower-priority logs to Flex to reduce indexing costs, accepting that you can no longer alert on them in real time. This is now the most common approach among mature Datadog customers, and it means deliberately using a fraction of the platform you are paying for.
The third is rearchitecting around the data storage model entirely. The root cause of the bill (and the reason the two approaches above are necessary) is that Datadog stores your data in their cloud and charges you a margin on every GB that flows through. Teams evaluating alternatives are increasingly looking at architectures where observability data stays in their own cloud account, eliminating the per-data-unit charge at the source.
A different pricing model
The structural reason Datadog's bill is hard to control is that the business model is built on data volume. The more telemetry you send, the more Datadog charges. That creates a direct misalignment: the platform is most valuable when you send everything, but the pricing penalizes you for doing so.
groundcover is built on a Bring Your Own Cloud (BYOC) architecture that inverts this. Because observability data lives in the customer's own S3, not in groundcover's cloud, there is no per-GB, per-event, or per-request charge. groundcover charges a flat per-node rate that covers the full platform: APM, logs, metrics, distributed traces, LLM observability, and every feature released going forward.
What changes when there is no per-byte charge:
→ 100% of customers use APM (vs. ~25% on Datadog)
→ Log retention defaults to 60–365 days, no sales conversation required
→ All logs are always queryable (no Flex trade-offs, no rehydration costs)
→ Trace sampling is off by default, full production coverage
→ LLM observability and AI features are included, no per-request billing
→ Teams send 5–10x more telemetry data than they did on Datadog
The pricing model determines how much visibility a team actually has into their systems, not just what they could have in theory. A model that charges per data unit creates a structural incentive to send less data. One that charges per node creates an incentive to instrument everything.
Side-by-side: where the cost structure differs
The table below covers the dimensions most likely to produce a surprise on a Datadog invoice. For a full feature and pricing comparison, see the groundcover vs. Datadog guide.
To put this in concrete terms: one team running ~700 Kubernetes nodes with 5 TB/day of logs and 500K custom metrics was paying $2.54M/year on Datadog. The same setup on groundcover cost $297K/year, an 87% reduction, with full APM enabled and no trace sampling.
Next steps
If you want to benchmark your current Datadog spend, the groundcover vs. Datadog comparison guide covers a full feature-by-feature and cost breakdown. If you are actively evaluating a switch, the migration guide walks through data parity, automated dashboard migration, and running both platforms in parallel before committing.
groundcover offers a free trial with unlimited seats, all features, and full BYOC, so you can run it against production data alongside your existing setup.





