AWS calls it a Global Strategic Partnership. I call it a racket.

Sam Workman

When you turn on a 3rd-party monitoring service like Datadog, your AWS bill goes up - likely by about 10%. I would expect monitoring to add some overhead, but I was surprised to learn the 10% number holds whether or not telemetry agents are even installed. Smells pretty fishy to me, but before we get out the pitchforks, let’s take a closer look at where the costs are coming from, figure out what we can do about them, and then adjust our outrage accordingly.

Metrics with benefits

Along with logs and traces, metrics are one of the primary means of monitoring cloud infrastructure. Unlike logs or traces, they provide a particularly useful numeric signal that can trigger automated alarms and be easily graphed for manual inspection. The process of collecting metrics is called telemetry, and in AWS, it’s not optional. Metrics are automatically emitted and stored for most AWS services. And rightfully so - they’re the literal vital signs of your workloads. By treating metrics as essential 1st-party features, AWS is able to optimize the cost of telemetry, bake these costs into their services, and build features like load balancing and auto-scaling on top of them. As an added benefit, manual monitoring can be provided at no additional cost via the CloudWatch UI, making services much more usable.

Win-win.

What about automated monitoring? Well, AWS CloudWatch does offer a suite of paid tooling, but you’re welcome to use a 3rd-party solution or even roll your own. You just need the metric data. It's worth mentioning that AWS controls access to the only source of metrics for the vast majority of AWS services (compute being the exception - more on that later).

Hmm, that kinda sounds like a conflict of interest, but I doubt AWS would take advantage of their customers. Even if they were tempted, there are laws to prevent anti-competitive behavior.

Metric monopoly

Let's run a little thought experiment. You work for AWS. You've been asked to exploit your monopoly on cloud usage data, punish the competition, and prop up the price of your own monitoring solutions. You'd have to be careful. It would be hard to defend a special pricing model for metrics. You'd much prefer to use an existing precedent. So what are your options?

There's the data transfer out cost - that's a start. You quickly realize, however, that metrics only generate a modest amount of data. Each datapoint is just a timestamp and a number, and data is rolled up by the minute before it's emitted. Even using uncompressed JSON, you're looking at about 1.5 MB per metric per month. With an existing data transfer rate of $0.09 per GB, you'd need about 7,400 metrics to make a dollar. You'll have to do better.
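
Here's that back-of-the-envelope math as a quick shell calculation, plugging in the 1.5 MB and $0.09 per GB figures from above:

awk 'BEGIN {
  mb_per_metric  = 1.5              # ~43,800 per-minute datapoints as uncompressed JSON
  dollars_per_gb = 0.09             # data transfer out rate
  per_metric     = mb_per_metric / 1000 * dollars_per_gb
  printf "metrics needed to bill one dollar: %.0f\n", 1 / per_metric
}'
# -> about 7,400 metrics per dollar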

Maybe you could send the metrics to an S3 bucket and charge for storage? But then users would own the data, and again the low data volumes are not going to move the needle.

There's some precedent for charging data retrieval queries by the amount of data scanned, but no, dammit, that runs into the same issue as data transfer and storage.

That really only leaves API requests. AWS has an established rate of $0.01 per thousand requests. It doesn't sound promising, but you run the numbers anyway. Since monitoring tools need metric data in real time, they typically ask for 10 minutes' worth of data every 10 minutes. If your data model were indexed and/or partitioned by time, a single cheap query could return all metrics for a particular time interval, with optional filtering to select metrics of interest. Users could make requests like, "Show me all blocked requests for my web application firewall for the last 3 minutes" and get back results like:

{
  "Timestamps": [
    "2025-11-14T22:28:00+00:00",
    "2025-11-14T22:27:00+00:00",
    "2025-11-14T22:26:00+00:00"
  ],
  "Metrics": [
    {
      "Namespace": "AWS/WAFV2",
      "MetricName": "BlockedRequests",
      "DimensionNames": ["WebACL","Region","Country"],
      "TimeSeries": [
        {
          "DimensionValues": ["Prod","us-west-2","NL"],
          "Stat": "Sum",
          "Values": [1.0,2.0,2.0]
        },
        {
          "DimensionValues": ["Prod","us-west-2","US"],
          "Stat": "Sum",
          "Values": ["-",2.0,"-"]
        },
        {
          "DimensionValues": ["Prod","us-west-2","JP"],
          "Stat": "Sum",
          "Values": [1.0,1.0,"-"]
        }
      ]
    }
  ]
}

To be consistent with pagination in other parts of the API, you’d need to serve around 500 sets of metric data per request. A batch of paginated requests every 10 minutes works out to about 4,400 batches a month. 10,000 metrics would result in 20 requests per batch (88,000 requests per month) and cost $0.88. Another dead end.

But wait a minute! No one said you had to build the right API for the job. What if your API required a request per metric, regardless of the size of the time interval? And not just a request per metric name, but a request per unique set of metric dimensions? That could probably be justified. It’s not the main use case, but a user might want the entire history of a single metric to run some custom analysis. Effectively, you’ve just added a 500x multiplier to the number of requests needed for monitoring use cases, taking the cost of 10,000 metrics from $0.88 to $440.00 per month.
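
Here's that arithmetic as a quick shell calculation, using the poll cadence and request rate from the last two paragraphs:

awk 'BEGIN {
  metrics = 10000
  polls   = 6 * 24 * 30.5                  # one poll every 10 minutes, ~4,400/month
  rate    = 0.01 / 1000                    # $0.01 per thousand requests
  batched    = (metrics / 500) * polls     # 500 metrics per paginated request
  per_metric = metrics * polls             # one request per metric per poll
  printf "batched API:    %.0f requests/month -> $%.2f\n", batched, batched * rate
  printf "per-metric API: %.0f requests/month -> $%.2f\n", per_metric, per_metric * rate
}'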

Fictional you deserves a whale of a bonus.

Aggressive monetization

The real people of AWS seem to have had a similar realization, because the single-metric API is the one they built. It’s called GetMetricStatistics. Also, it’s deprecated. Once the baseline cost of metric data had been anchored in existing precedent, AWS was free to optimize.

All those requests were creating some unnecessary overhead. The newer GetMetricData API takes up to 500 metric definitions in a single request. You pay the same rate, but now per metric instead of per request, so the cost to you stays the same. Actually, the cost went up. The special treatment allowed AWS to remove the one million request free tier that applies to the rest of the API. They also made a slight change to the definition of a metric, counting each statistic as its own metric (up to a 5x multiplier, depending on the metric).
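
For reference, a GetMetricData call from the CLI looks roughly like this, reusing the hypothetical firewall metric from the earlier example. The part that matters for billing: each element of the query list - and each statistic you ask for - is counted as its own metric, no matter how many datapoints come back.

aws cloudwatch get-metric-data \
  --start-time 2025-11-14T22:20:00Z \
  --end-time 2025-11-14T22:30:00Z \
  --metric-data-queries '[
    {
      "Id": "blocked_nl_sum",
      "MetricStat": {
        "Metric": {
          "Namespace": "AWS/WAFV2",
          "MetricName": "BlockedRequests",
          "Dimensions": [
            {"Name": "WebACL", "Value": "Prod"},
            {"Name": "Region", "Value": "us-west-2"},
            {"Name": "Country", "Value": "NL"}
          ]
        },
        "Period": 60,
        "Stat": "Sum"
      }
    }
  ]'
# Want the Average too? That's a second query entry, billed as a second metric.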

AWS provides a second option for metric data. Metric streams push data to destinations via Amazon Kinesis Data Firehose instead of being pulled through GetMetricData. Since the data has already been collected and stored in CloudWatch, you might expect a pricing model that mirrors Kinesis pricing. Then no one would use the API, silly. So instead, we get pricing anchored on the metrics API. Streams are billed at $0.003 per 1,000 metric updates, and are often touted by 3rd-party monitoring services as cheaper than the API. While technically correct per update, most metrics you care about emit every minute, so the same 10,000 metrics sent over a stream cost $1,320/month. The tradeoff is that you get data every minute instead of every 10 minutes, with a latency of only 2-3 minutes.
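
And the stream math for the same 10,000 metrics:

awk 'BEGIN {
  metrics = 10000
  updates = metrics * 60 * 24 * 30.5       # one update per metric per minute
  printf "stream: %.0f updates/month -> $%.2f\n", updates, updates * 0.003 / 1000
}'
# -> ~439 million updates, about $1,320/month, vs ~$440 for 10-minute polling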

An abundance of metrics

So how many metrics does AWS collect for you? It varies depending on the specifics of your architecture, but a good rule of thumb seems to be about three metrics per dollar of your monthly bill.

Let's say you turn on a web application firewall. You might be surprised to learn that the number of metrics collected depends almost entirely on your app's traffic, and could easily be in the thousands. For instance, BlockedRequests and AllowedRequests are collected for every unique country code that hits the firewall. Anything hosted on the internet will have 100+ metrics from these two types alone.
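
You can check this for yourself. Assuming you have a WAFv2 web ACL taking traffic, something like this counts the per-country BlockedRequests series that CloudWatch is currently tracking:

aws cloudwatch list-metrics \
  --namespace AWS/WAFV2 \
  --metric-name BlockedRequests \
  --recently-active PT3H \
  --query 'length(Metrics)'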

Also, many of the best practices that AWS self-publishes add a multiplier to your metric count. Maybe you split out your development, staging, and production environments into their own accounts. Maybe you use an organizational unit for each of your teams. Maybe you host in multiple regions. Maybe you use 3 availability zones per region. Maybe you have 10 microservices. You've just created a combinatorial explosion of metrics.

In general, these metrics are useful. High cardinality can decrease incident response time and generate genuine insights. But given a blank check, do we really want AWS judging how many thousands of metrics to mint?

Datadog's role

If the AWS pricing model is as anti-competitive as I claim, why isn't there more pushback from the major 3rd-party monitoring providers? And why don't they do more to keep your AWS costs down?

Well, first of all, AWS has these providers performing a balancing act. Neither side wants to draw attention to the costs, which are currently buried in an obscure CW:GMD-Metrics line item deep within your AWS bill. Any effort to filter metrics not only calls out these costs, but also creates onboarding friction and potentially omits some key signal. Datadog's default integration walks the line by excluding three particularly metric-heavy namespaces, presumably because there's a limit to how much your bill can go up before you notice.

That doesn't explain the lack of effort behind the scenes though. Datadog could support more comprehensive filtering for users that want it. They could skip metrics that can be calculated from other metrics. They could add options to poll less-important metrics at longer intervals, and even be opinionated about this in the default setup.

The game becomes clearer when you realize that every one of the major monitoring providers is an AWS partner. Datadog, New Relic, Dynatrace, Splunk, Elastic, Sumo Logic - all of them receive customer referrals from AWS. Which one of them gets the most business? I don't know, maybe the one that makes AWS the most money. The Global Strategic Partnership between AWS and Datadog allows enterprise customers to count their Datadog bill towards their AWS volume discount commitments. What else has been inked in these deals?

So what can you do?

First, figure out how much you’re spending. In your Cost Explorer console filters, choose CloudWatch for Service, and under Usage type, search for Metric. API charges will end in GMD-Metrics, while streaming charges will end in MetricStreamUsage.
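
If you'd rather skip the console, the same numbers are available from the Cost Explorer API. A rough sketch (adjust the dates to your billing period; CloudWatch shows up under the service name AmazonCloudWatch):

aws ce get-cost-and-usage \
  --time-period Start=2025-10-01,End=2025-11-01 \
  --granularity MONTHLY \
  --metrics UnblendedCost \
  --filter '{"Dimensions": {"Key": "SERVICE", "Values": ["AmazonCloudWatch"]}}' \
  --group-by Type=DIMENSION,Key=USAGE_TYPE
# Look for usage types ending in GMD-Metrics (API) or MetricStreamUsage (streams).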

Next, figure out your largest sources of metrics. Datadog uses list-metrics to decide what to pull. Run the following command to get a count of your metrics by namespace (be sure to include the --recently-active flag):


aws cloudwatch list-metrics \
  --query 'Metrics[].Namespace' \
  --output text \
  --recently-active PT3H \
| tr '\t' '\n' \
| sort \
| uniq -c \
| sort -nr

These numbers will vary throughout the day due to usage patterns, so run this at a few different times. Cross-check the largest namespaces against the CloudWatch console to see what's actually being collected, then decide if the costs are warranted for that particular service. Remember, 10k metrics cost $440 per month.

Unfortunately, Datadog doesn't allow filtering by metric name, so for most namespaces, it's all or nothing per region. However, for ApplicationELB, RDS, SQS, and States (Step Functions), Datadog implements its own tag filtering for any metric that has an ID in its dimensions. Consider turning this on if you collect from these namespaces.

For a few compute namespaces, you can rely on Datadog agents. EC2, ECS, EKS, Lambda, and Batch all have some form of Datadog telemetry, and you'll be charged per host/container/function whether you collect metrics via CloudWatch or directly via Datadog. By excluding these namespaces from CloudWatch collection and only installing Datadog agents/sidecars/layers for the instances you care about, you can reduce your Datadog bill as well, since this is your billing meter on the Datadog side.

For any other namespace that you can't exclude entirely, it might make sense to switch to metric streams. AWS lets you filter streams by metric name, so even though they cost about 3x, you'll break even as long as the metrics you actually care about make up no more than a third of the total.
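
Here's a rough sketch of the stream setup from the CLI, assuming you've already created the Kinesis Data Firehose delivery stream that forwards to Datadog and an IAM role that CloudWatch can assume to write to it. The ARNs, namespace, and metric names below are placeholders:

aws cloudwatch put-metric-stream \
  --name datadog-filtered \
  --firehose-arn arn:aws:firehose:us-west-2:123456654321:deliverystream/datadog-metrics \
  --role-arn arn:aws:iam::123456654321:role/MetricStreamsToFirehose \
  --output-format opentelemetry1.0 \
  --include-filters '[
    {
      "Namespace": "AWS/ElastiCache",
      "MetricNames": ["CPUUtilization", "FreeableMemory"]
    }
  ]'
# Match --output-format to whatever your Datadog stream destination expects.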

Finally, if you've set up metric streams for critical services (or if for some reason you don't care about latency for any of your monitoring), you can open a service ticket with Datadog and ask them to extend your polling interval to 30 minutes. As far as I know, that's the longest interval they support, and it will apply to your entire org.

Here's a Terraform example that implements these recs to save you some ClickOps:


resource "datadog_integration_aws_account" "prod" {
  aws_account_id = "123456654321"
  aws_partition  = "aws"

  aws_regions {
    include_only = [
      "us-west-2",
    ]
  }

  auth_config {
    aws_auth_config_role {
      role_name = "DatadogIntegrationRole"
    }
  }

  metrics_config {
    namespace_filters {
      include_only = [
        "AWS/RDS",
        "AWS/ApplicationELB",
      ]
    }
    tag_filters {
      namespace = "AWS/RDS"
      tags      = ["datadog:true"]
    }
    tag_filters {
      namespace = "AWS/ApplicationELB"
      tags      = ["datadog:true"]
    }
  }

  resources_config {}

  traces_config {
    xray_services {}
  }

  logs_config {
    lambda_forwarder {}
  }
}


If you do all that, you'll stop adding a 10% tax to your AWS bill. Of course, you’ll still be paying 500x more than you should be, and there's always a risk that you excluded something important.

Torches and pitchforks it is.

Don't have time to fight AWS?

Put Stackadilly in the ring. We help teams save money and get the most out of their AWS infrastructure.

Talk to an AWS expert today
