Every observability bill review goes the same way. The number went up again, and everyone argues about cutting retention or cutting cardinality. Nobody wants to say the obvious thing: we're paying to collect a pile of data nobody ever looks at.
And the numbers are kind of wild. Around 70% of the logs you store never get queried. You're paying to keep telemetry that no human and no alert is ever going to read. For a lot of teams observability is now 11% to 20% of cloud spend, and it's growing faster than the infra it's supposed to be watching.
Then AI made it worse. An LLM or agent workload puts out up to 50x the telemetry of a normal service, and the teams running them are watching their bills jump 40% to 200%. So this isn't a problem that fixes itself. It gets worse over time, and it gets worse fastest for the teams doing the most interesting work.
The cheapest place to cut your bill is also the place most tools steer you away from. That's what this post is about.
The default answer: drop it at the backend
Most platforms already sell you a way to deal with this on their end. You send them everything, and they drop, sample, or trim it once it lands, before they store it. Some do it for metrics, some for logs, some by tail-sampling your traces. It works. It'll bring your bill down.
But notice where the drop happens: on ingest, inside the platform, after the data has already crossed the network. This is the problem.
Why the backend is the wrong layer
By the time the backend drops your telemetry, you've already paid for most of it. The hosts generated it, you paid egress to push it out of your network, and if your collector and backend sit in different regions or clouds you paid for that hop on top. The bytes made the full trip, landed, and then got deleted. Dropping them there saves you indexing and storage. The transport cost is already spent.
Trace sampling is the clearest case. Decide at the backend and you're paying full price to move 100% of your traces just to keep maybe 10% of them.
There's a second cost too, one that doesn't show up on the invoice: lock-in. When your drop logic lives inside one vendor's platform, those rules don't go anywhere. The day you want to add a second backend, or split telemetry across two vendors, or move a tier in-house, you're rebuilding your whole cost-control setup in someone else's format. You cut your bill by tying yourself tighter to the one vendor whose bill you were trying to cut in the first place.
The cheaper layer, and the one that sticks around, is the one you already run: the Collector.
Emit broadly, trim in the pipeline, store on purpose
The Collector sits in front of every byte of egress and ingest you get charged for. Anything you drop there is something you never pay to move, ingest, index, or store, no matter which backend you're on. Filtering at the edge has been measured cutting egress by as much as 90%.
The two knobs at the edge
You've got two ways to cut volume in the Collector, and they do different jobs. The first deletes the data you already know is worthless. The second keeps a smart sample of everything else. Most teams want both.
Knob one: the filterprocessor
Some telemetry is pure noise, and you already know it: health checks firing every second, debug spans from a service nobody watches, the 200 OK on your busiest endpoint that fires ten thousand times a minute and tells you nothing. You don't want a sample of that. You want it gone.
That's the filterprocessor. You hand it a rule, and anything that matches gets dropped inside the Collector. The rules are written in OTTL (the OpenTelemetry Transformation Language), the small expression language built into the Collector for matching on telemetry. And you're not stuck matching on names. A rule can check any field: an attribute, a status code, even how long a span took. List several rules and a span is dropped as soon as any one of them matches.
Drop your health-check spans:
processors:
  filter/ottl:
    traces:
      span:
        - 'attributes["http.route"] == "/healthz"'
        - 'attributes["http.route"] == "/readyz"'
        - 'attributes["http.route"] == "/livez"'
Keep only the services you care about and drop everything else:
processors:
  filter/ottl:
    traces:
      span:
        - |
          resource.attributes["service.name"] != "checkout" and
          resource.attributes["service.name"] != "payments" and
          resource.attributes["service.name"] != "cart"
Drop spans that finished fast and succeeded, usually your single biggest line item and your least useful data:
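processors:
  filter/ottl:
    traces:
      span:
        # Most instrumentation leaves a successful span's status Unset rather
        # than Ok, so match "not an error" instead of "== STATUS_CODE_OK".
        - 'name == "/api/v1/users" and status.code != STATUS_CODE_ERROR'
        # Under 10ms (timestamps are in nanoseconds) and not an error.
        - '(end_time_unix_nano - start_time_unix_nano) < 10000000 and status.code != STATUS_CODE_ERROR'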
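None of these rules do anything until the processor is listed in a pipeline. Here's a minimal sketch of that wiring, assuming an OTLP receiver and an OTLP exporter. The names `filter/health` and `filter/fast` and the `backend.example.com` endpoint are placeholders made up for illustration. The snippets above all reuse the name `filter/ottl`, so if you combine them in one config, give each its own name or merge the rules into a single list.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  filter/health:
    # Log and skip a rule that fails to evaluate instead of failing the batch.
    error_mode: ignore
    traces:
      span:
        - 'attributes["http.route"] == "/healthz"'
  filter/fast:
    error_mode: ignore
    traces:
      span:
        - '(end_time_unix_nano - start_time_unix_nano) < 10000000 and status.code != STATUS_CODE_ERROR'
  batch:

exporters:
  otlp:
    # Placeholder backend address.
    endpoint: backend.example.com:4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      # Processors run in the order listed, so filter before batching.
      processors: [filter/health, filter/fast, batch]
      exporters: [otlp]
```

One thing to watch: the filter works span by span, so dropping a span in the middle of a trace can leave its children without a parent. Rules that target leaf spans or whole services tend to be the safest.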
Knob two: tailsamplingprocessor
Once the junk is gone, you're left with real traffic, and you still don't need all of it. A thousand identical successful checkouts tell you about as much as ten. The trick is picking which ones to keep. Drop them at random and you'll throw away the one trace that errored right next to the boring ones, and that errored one is exactly what you'll come looking for at 2am.
Tail sampling fixes that by waiting. Simple sampling (head sampling) decides at the very start of a trace, before it knows whether anything went wrong. The tailsamplingprocessor instead holds each trace for a few seconds until it finishes, looks at the whole thing, and then decides. So your rules can match what you actually care about: keep every trace that errored, keep anything slower than your latency bar, and keep only a few percent of the fast, successful ones. You hold on to the traces worth looking at and let the rest go.
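Those three rules map fairly directly onto tail-sampling policies. A minimal sketch, where the 10 second wait, the 500ms latency bar, and the 5% rate are illustrative numbers picked for the example, not recommendations:

```yaml
processors:
  tail_sampling:
    # How long to hold a trace before deciding (illustrative value).
    decision_wait: 10s
    policies:
      # Keep every trace that contains a span with an error status.
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      # Keep any trace slower than the latency bar (illustrative threshold).
      - name: keep-slow
        type: latency
        latency:
          threshold_ms: 500
      # Keep a small slice of everything else (illustrative rate).
      - name: sample-the-rest
        type: probabilistic
        probabilistic:
          sampling_percentage: 5
```

A trace is kept if any one policy says keep, so the errored and slow traces survive no matter how the probabilistic policy rolls.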
One thing to know going in: tail sampling has to see all of a trace's spans together to judge it, so it runs on a central collector that your traces pass through, not on every host. A little more to set up, and in return you get a single dial for trace volume. Turn it down to save money, or up when you need to see everything.
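One common way to get every span of a trace onto the same sampling collector is the `loadbalancing` exporter, running on the collectors closest to your hosts and routing by trace ID. A sketch, assuming the sampling tier sits behind a DNS name; `otel-sampling.example.internal` is a made-up hostname, and the insecure TLS setting is only there to keep the example short:

```yaml
exporters:
  loadbalancing:
    # Send all spans with the same trace ID to the same downstream collector.
    routing_key: traceID
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      dns:
        # Hypothetical DNS name that resolves to the sampling collectors.
        hostname: otel-sampling.example.internal
        port: 4317
```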
Drop the noise, sample the rest. Between them, that's the whole pitch for cutting cost at the edge, and it's a good one. Leaner pipelines, smaller bills, faster queries, nothing locking you in.
Turning on the firehose
You run lean most of the time and you get a bill you're finally okay with. Then something breaks at 2am, and lean is the last thing you want. You want everything, right now!
Now you might be asking: how do I do that on the fly, across a whole fleet of collectors, without hand-editing YAML and redeploying in the middle of an incident? This is where OpAMP comes in, and it's the key piece.
OpAMP, the Open Agent Management Protocol, is the OpenTelemetry standard for driving a fleet of collectors from one place: a central server pushes config updates to every collector and gets status back. The server and supervisor tooling is real and usable today, and it's been coming together fast.
That means you can adjust both of the knobs we were talking about with the press of a button. Update your YAML, hit apply, and boom! Hundreds, thousands, or hundreds of thousands of collectors pick up the change on either knob:
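On each host, the piece that talks to the OpAMP server is typically the OpAMP Supervisor, which runs the Collector as a child process and restarts it with whatever config the server sends. A rough sketch of a supervisor config, assuming the contrib supervisor's layout; the server URL and file paths are placeholders, and field names can shift between releases, so check them against the version you run:

```yaml
server:
  # Hypothetical OpAMP server endpoint.
  endpoint: wss://opamp.example.internal/v1/opamp

capabilities:
  # Let the server push Collector config to this agent.
  accepts_remote_config: true
  # Report back the config the Collector is actually running.
  reports_effective_config: true
  reports_health: true

agent:
  # Placeholder path to the Collector binary the supervisor manages.
  executable: /usr/local/bin/otelcol-contrib

storage:
  # Placeholder directory for the supervisor's persisted state.
  directory: /var/lib/otelcol/supervisor
```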
- The filter knob. Push a config that relaxes or removes the filter rules for the affected services. The 200 OKs and the sub-10ms spans you were throwing away start flowing again, for those services, not the whole fleet.
- The tail-sampling knob. Push a policy that sets the affected services to keep 100%. The sample rate jumps from a few percent to everything, at once.
One action, scoped to the services that matter, applied across every collector at the same time. You get full-fidelity telemetry for the rest of the incident, and incidents are rarely over in a minute, so "the rest of it" is usually most of it.
Remember to put the flip on a timer though, or wire it to the alert, so it reverts on its own. A firehose nobody remembers to close is just going to hike up your bill again.
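As a sketch of what the pushed tail-sampling change could look like, here is the usual set of policies with one extra policy added for the incident. It assumes the affected service is `checkout` (borrowed from the earlier filter example), and the policy names and thresholds are made up for illustration:

```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      # Incident override: keep 100% of traces that touch checkout.
      # Remove this policy to go back to lean mode.
      - name: firehose-checkout
        type: and
        and:
          and_sub_policy:
            - name: is-checkout
              type: string_attribute
              string_attribute:
                key: service.name
                values: [checkout]
            - name: keep-all
              type: always_sample
      # The everyday policies stay in place for every other service.
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: sample-the-rest
        type: probabilistic
        probabilistic:
          sampling_percentage: 5
```

Because a trace is kept when any policy matches, adding the override is enough to open the firehose for that one service, and reverting is a matter of pushing the config without it.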
Why now
A few things lined up this year.
On May 21, 2026, the CNCF announced that OpenTelemetry graduated. The number that actually matters there isn't the milestone itself. It's that OTel is now the second most active project in the entire CNCF, behind only Kubernetes, with more than 12,000 contributors from over 2,800 companies. Graduating is the CNCF saying this is stable and you can build a company on it. Which means the argument for putting your cost-control layer inside one vendor's format is a lot weaker than it used to be.
The management layer is catching up fast. A year ago, driving a fleet of collectors from one place pretty much meant buying a vendor's product. Now OpAMP does it on open parts, and it's quickly becoming the default way teams think about running a fleet at all.
And those same AI workloads blowing up everyone's telemetry are going to get instrumented with OpenTelemetry anyway. So the volume problem and the standard that handles it are showing up at the same time. Cutting cost at the edge is both urgent, because the bill is going through the roof, and a safe bet, because you're building on the layer that won.
Everything above is just OpenTelemetry, which also means it's yours to build. You can stand up the collectors, write and maintain the OTTL, run tail-sampling, operate an OpAMP server, and build the tooling for use cases like the incident firehose above. All of it is doable with open components, but it's also a real amount of work to build and keep running.
That's what we're building Telflo to be: the management plane for this kind of setup, so you don't have to assemble it yourself. Telflo is fully OTel-native and it's built on OpAMP to give you a completely vendor-neutral way to manage your fleet of collectors. Now in beta!