Orderly API Evolution: How to Break APIs Without Breaking Trust

I spent my baby bonding leave this summer with my newest kid — which was the point. But between feedings and naps, I also found myself trying to deeply understand coding agents and the new developer landscape. I wanted to see where this was all heading.
Building MCP servers was one of the experiments that surfaced something I hadn’t fully appreciated: MCP servers have wildly different audiences with wildly different expectations.
Consider the use cases:
- An end user uses a chatbot hooked up to an MCP server to accomplish some task
- A developer uses their coding agent hooked up to an MCP server to help them build something
- A developer builds an agent that uses an MCP server to accomplish something useful, potentially entirely under the hood
- A developer uses an MCP server’s tools directly as an API, bypassing the agent layer entirely
Not all of these may be good ideas, but Hyrum’s Law applies: with enough users, all observable behaviors will be depended on by somebody. That’s an overloaded interface. And when you start thinking about versioning an overloaded interface, suddenly questions like “what constitutes a breaking change?” get a lot messier.
At Firebase, I built and ran our API Council — reviewing proposals for changes across thousands of APIs spanning SDKs, REST endpoints, gRPC services, CLI commands, and database schemas. Over 5.5 years, I shepherded about 850 API proposals through the process, looking at ease of use, consistency, voice, and especially how changes would ripple through our developer ecosystem. We developed frameworks for thinking about graceful evolution at scale.
Agentic coding is forcing me to expand and re-evaluate those frameworks.
But here’s what I do know: APIs need to evolve. The question isn’t whether to break things, but how and when to break them.
I’ve started calling this “Orderly API Evolution” — and I think we need better vocabulary and frameworks for it, because most teams are stuck arguing about SemVer when the real problem is much messier and more human.
The SemVer Trap
Here’s what trips up most platform builders: treating semantic versioning like a technical solution when it’s fundamentally about communicating something to your developer customers and their tools.
You can write tests to ensure your API surface doesn’t change in breaking ways per SemVer. Many teams do. It feels rigorous. It feels safe.
But SemVer has parts that are clear and unambiguous, and parts that are totally fuzzy. Consider:
- What happens if your dependency gets a new major version? Does that make your update a major version change? I’ve generally landed on “it is if the dependency is actually part of your API, otherwise not” — but that requires judgment about what constitutes your API surface.
- What if part of your API is technically public, but documented as internal-only? Many of the Sun Java runtime packages fell into this category. Is a change there a break?
- If your debug logs change or their structure changes, is that a break? What if someone’s parsing them?
- If an error message changes, is that a break? What if someone’s code depends on the exact wording?
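The split between the clear and the fuzzy parts of SemVer can be made concrete. A toy sketch (all names hypothetical) of the kind of surface diff a compatibility test suite can catch mechanically, and what it can't:

```python
def breaking_removals(old_surface, new_surface):
    """Public symbols present in the old release but missing from the new one.

    This is the easy, codifiable part of SemVer: removing a public symbol
    is unambiguously a breaking change.
    """
    return sorted(set(old_surface) - set(new_surface))

old = {"create_user", "delete_user", "list_users"}
new = {"create_user", "delete_user", "list_users", "get_user"}  # additive only

print(breaking_removals(old, new))  # → []
# What this check can't see: reworded error messages, restructured debug
# logs, or a dependency's major-version bump. Those are the judgment calls
# in the list above.
```

A check like this feels rigorous, but it only covers the unambiguous cases; the fuzzy ones still require knowing how developers actually use the API.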
None of these have easily systematized answers. You can codify rules, but the edge cases require understanding how developers actually use your API — not just how you intended them to use it.
At Firebase, we had a web of dependencies that sometimes included internal interfaces between our libraries. By separating the internal interfaces from the external ones, we were able to version them separately and increase the set of compatible versions for our customers. The internal interfaces could evolve more freely because they weren’t part of the public contract — but only because we were explicit about the boundary. Meanwhile, our tools still understood what versions were compatible with each other via transitive dependencies.
There’s a tradeoff here: more flexibility in versioning can increase complexity and make it harder for developers to understand which versions work together. We took that part of the developer experience seriously. Just because we could make things more flexible didn’t mean we should make them arbitrarily complex. The tooling handled the compatibility resolution, but we still designed the boundaries to be comprehensible.
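One minimal way to make such a boundary explicit (a sketch with invented, Firebase-flavored names, not Firebase's actual mechanism): declare the public surface as an allowlist, so tooling can classify removals as contract breaks versus internal-only churn.

```python
# Hypothetical split: symbols in PUBLIC_SURFACE are the SemVer contract;
# everything else is internal and may version independently.
PUBLIC_SURFACE = {"initialize_app", "get_auth", "get_firestore"}

def classify_removals(removed_symbols):
    """Partition removed symbols into public-contract breaks and internal churn."""
    breaks = sorted(s for s in removed_symbols if s in PUBLIC_SURFACE)
    internal = sorted(s for s in removed_symbols if s not in PUBLIC_SURFACE)
    return breaks, internal

breaks, internal = classify_removals({"get_auth", "_internal_transport"})
print(breaks)    # → ['get_auth']: a real break, requires a major version bump
print(internal)  # → ['_internal_transport']: free to evolve
```

The code is trivial; the hard part is the upfront decision about where the boundary sits, and keeping it comprehensible to developers.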
This is orderly API evolution in practice: designing your system so that the parts that need to be stable are stable, while the parts that need to evolve can do so without forcing unnecessary upgrades on developers — but always keeping the cognitive load on developers in mind.
Orderly API evolution is about sequencing changes to minimize disruption based on how developers are actually using your APIs, and also communicating and setting expectations. It’s about empathy for your developer customers. It’s about strategy, not just versioning numbers.
I’ve seen developers perform Olympic-level mental gymnastics to justify why something “technically isn’t a breaking change” — usually because they don’t want to bump the major version, or they’re trying to avoid the deprecation ceremony, or they’ve convinced themselves the change is small enough not to matter.
They’re not thinking about the developer on the other end who just deployed to production.
But I’ve also seen the opposite: teams so terrified of breaking changes that they accumulate cognitive and technical debt like barnacles on a ship. Every API decision becomes permanent. Innovation slows to a crawl. The platform calcifies around the needs of developers from three years ago, making it progressively worse for developers arriving today.
Both extremes miss the point: orderly API evolution is about preserving trust through change, not preventing change.
What Orderly API Evolution Actually Looks Like
At Parse and Firebase, we learned this the hard way. Here are practices that worked — none of which appear in any versioning spec:
1. Make the future genuinely better.
Provide carrots, not just sticks. New features only in the new API. Performance improvements. Better developer experience, greater ease of use, more flexibility. Make migration feel like an upgrade, not a chore you’re forcing on people.
2. Set expectations publicly, then be flexible privately.
In other words, your public policy is an SLA, but you may have a more ambitious SLO. Publish deprecation policies and treat them as minimums, not absolutes. The policy creates structure and sets a baseline commitment to developers. But internally, look at real usage patterns and be willing to extend timelines if too many developers still depend on something. We didn’t plough through breaking changes just because they’d been announced if doing so would hurt customers.
3. Coordinate breaking changes at portfolio scale.
It’s not always feasible, but think about how developers will experience it when they use multiple parts of your portfolio together. At Firebase, we had a suite of 15-20 products, each with their own SDKs. We set a standard: you may only break customers once a year, synchronized across all products. We used Google I/O as the venue for that — it became our primary breaking change release. Throughout the year, deprecations were allowed and new APIs were introduced beside old ones. Then at our annual breaking change release, we’d clean up after ourselves. This way customers saw one predictable breaking change per year instead of 15-20 breaks at various points.
4. Monitor deprecation adoption in real-time.
Watch who is still on the old API as the turndown approaches, and weigh the business impact for that group. Breaking developers who haven’t had time or resources to migrate destroys trust faster than you can rebuild it.
5. Sequence backend and client deprecations carefully.
Backend API versioning ≠ client API versioning (because clients compile into apps). When you deprecate a backend endpoint, deprecate it in client SDKs simultaneously. That way, the day a developer picks up your latest library, anything non-deprecated will work for at least the stated deprecation window.
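On the client side, this usually means the SDK wrapper is marked deprecated the moment the backend endpoint is, so the warning reaches developers through the library itself. A generic sketch of that pattern (not any specific SDK's API; names are hypothetical):

```python
import functools
import warnings

def deprecated(replacement, removal_window="12 months"):
    """Mark a client SDK method as deprecated, pointing at its successor."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            warnings.warn(
                f"{fn.__name__} calls a deprecated endpoint and will stop "
                f"working in {removal_window}; use {replacement} instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            return fn(*args, **kwargs)
        return inner
    return wrap

@deprecated(replacement="fetch_profile_v2")
def fetch_profile(user_id):
    # Would call the old backend endpoint; stubbed here.
    return {"id": user_id}
```

Deprecating both layers on the same day is what makes the stated window trustworthy: a developer who adopts the latest SDK sees every non-deprecated call survive at least that long.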
6. Preview the pain before it’s real.
Issue temporary blackouts. Give developers a short window where the old API returns errors, then turn it back on. Surprisingly effective at accelerating migrations without actually breaking anyone.
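These scheduled preview outages are sometimes called "brownouts." A server-side sketch of how a window might be enforced (hypothetical dates and error shape):

```python
from datetime import datetime, timezone

# Hypothetical one-hour preview outages, announced well ahead of the
# real turndown date.
BLACKOUT_WINDOWS = [
    (datetime(2025, 3, 1, 9, tzinfo=timezone.utc),
     datetime(2025, 3, 1, 10, tzinfo=timezone.utc)),
]

def legacy_api_response(now):
    """Return an error during a scheduled blackout, normal service otherwise."""
    for start, end in BLACKOUT_WINDOWS:
        if start <= now < end:
            return {"status": 410,
                    "error": "This API is being retired. See the migration guide."}
    return {"status": 200, "data": "normal response"}
```

The error message during the window is the migration nudge: it surfaces in logs and dashboards exactly where the eventual permanent break would, but the old API comes back an hour later.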
The Firebase API Council also spent a lot of time ensuring that capricious deprecations didn’t happen. If something was going to break, there had to be a good reason, and we scrutinized every such change — often helping teams find ways around breaking things for customers, and failing that, helping them think through how to manage the transitions.
None of this is automatable. It’s all judgment calls informed by data, empathy, and a clear vision of where you’re taking the platform.
MCP: When Orderly Evolution Gets Fuzzy
MCP is surfacing new dimensions to these challenges. MCP may not be the end state for AI tool integration, but it’s an interesting exemplar of how these questions are changing. What does “breaking” even mean when an AI agent sits between the API and the human?
Is it about the tool signatures? If I rename a parameter but the agent can still accomplish the same task, did I break anything?
Is it about the fuzzy capabilities enabled by the server? If the end result is the same but the path through the tools changed, is that a break?
Is it about the mental model? If a developer was directly calling MCP tools (overloading the interface, as some frameworks allow), changing tool names is definitely a break. But if they’re only exposed through an agent, maybe it’s not?
What these parties care about differs:
- End users (via chatbot) care about user experience
- Developers with coding agents care about reliable completion of specific classes of tasks around coding
- Developers building agents care about the reliability of their agents consistently calling these tools — but they’re also more exposed to the side effects of the tools
- Developers using MCP as an API care about tool signatures and strict interfaces
When you version an MCP server, which audience are you versioning for? What does “orderly” look like when different stakeholders have different definitions of disruption?
And here’s the kicker: coexistence of old and new tool versions can confuse an agent. If I mark a tool as deprecated but leave it available during migration, does the agent keep using it? Do I need to flag deprecation in a way the agent understands? Do we need multiple versioning schemes — one for capability evolution, one for interface stability?
I don’t know. I’m figuring this out like everyone else.
But MCP is exposing something deeper: our versioning assumptions were built for deterministic, human-readable interfaces. When you add probabilistic agents that interpret fuzzy capabilities, the principles of orderly API evolution need to stretch in new directions.
What If We Built Tools for Orderly Evolution?
The new world of tools may also change what parts of migration and evolution are painful or onerous. Expectations about what developers can reasonably do may shift, and the set of tools we give to developers to help them navigate API evolution may expand.
We’ve been treating API migration as something developers do manually — reading deprecation notices, updating code, testing, deploying. Orderly API evolution has meant giving developers time and warnings to do that work.
But if AI agents are increasingly sitting between APIs and humans, what if they could handle more of the migration work?
Imagine:
- An MCP server that announces deprecation with migration instructions the agent can execute. Not just “this tool is deprecated,” but “here’s the equivalent capability in the new version, here’s how to map the old parameters to new ones.”
- Agents that detect when they’re using deprecated tools and automatically try the new version, falling back only if it fails. Make migration invisible to the end user.
- Development tools that watch for breaking changes in your dependencies and propose the fixes. Not just “this broke,” but “here’s the diff to make it work again” — with AI doing the tedious translation work.
- Platform APIs that expose their evolution roadmap in machine-readable format so agents can plan ahead, not just react to breaks.
This isn’t science fiction. The primitives are here:
- Agents can already reason about code and make changes
- Platforms already track usage patterns and deprecation timelines
- The gap is giving agents the context they need to migrate intelligently
Maybe the future of orderly API evolution includes building the infrastructure for agents to be good migration partners.
That changes the calculus. If some migrations can be automatic (or near-automatic), it shifts how we think about the costs and benefits of breaking changes. The developers who haven’t arrived yet might get better APIs sooner. The developers already here might not pay as much of the migration tax. But breaking changes remain disruptive — we’d just be moving where that disruption lands.
I’m not saying this solves everything — there will always be migrations that require human judgment, especially when domain logic changes. But what percentage of breaking changes are straightforward enough for an agent to handle? 30%? 50%? More?
The Uncomfortable Truth About Evolving Platforms
Here’s what I keep coming back to, though, whether we’re talking about MCP or REST APIs or SDKs:
Unless you’re on an extremely mature platform or one that expects no growth, the developers who will build for your platform far outnumber the developers who are already building for your platform.
Most of the developers who will ever use your platform haven’t even considered it yet. They’ll arrive six months from now, a year from now, five years from now. And when they do, they’ll judge you based on the API as it exists then, not the API you shipped two years ago.
This isn’t permission to break things willy-nilly. It’s about recognizing that being permanently stuck with old decisions in the name of avoiding breakages is itself a choice — and often the wrong one.
If you never make breaking changes, you’re optimizing the experience for the developers who are already here at the expense of everyone who’s coming. You’re choosing current convenience over future clarity. You’re accumulating debt that every new developer will have to pay interest on.
But if you break things recklessly, you teach the developers who are here that you can’t be trusted. They’ll leave. And they’ll tell others not to come.
Orderly API evolution means honoring the investment developers have already made while still nurturing a better tomorrow for the developers who haven’t arrived yet.
There’s no formula for this. It requires judgment. It requires looking at usage data and knowing what you’re building toward. It requires the confidence to say “this will be temporarily painful but ultimately better” — and the humility to delay when you’re wrong.
Open Questions
Here are some questions I’m sitting with:
On versioning in an agent-mediated world:
- How do you version capabilities vs. interfaces? When the value is in what can be accomplished rather than how it’s accomplished, does SemVer even apply?
- What’s the right deprecation UX for AI agents? Should agents be able to detect and warn about deprecated tools? Should we hide deprecated tools entirely to avoid confusion?
- How do you balance multiple audiences? If direct API consumers need strict stability but agent users want flexible capabilities, do you need parallel versioning schemes?
On building migration infrastructure:
- What metadata do platforms need to expose for agents to migrate automatically?
- Should deprecation warnings include executable migration instructions?
- Can we create standards for machine-readable API evolution roadmaps?
- What’s the right division of labor between agent automation and human judgment in migrations?
On the fundamentals:
- What’s the minimum viable breaking change? Not “can we avoid breaking” but “what’s the smallest break that gets us to where we need to be?”
- How do you measure trust impact? Usage draw-down is one signal, but it’s lagging. What are leading indicators that developers are losing faith in your evolution strategy?
These questions are worth sitting with. The answers aren’t obvious, and the landscape keeps shifting. What orderly API evolution looks like in an agent-mediated world is still being discovered.
What Orderly API Evolution Means
If there’s one thing I hope you take away from this, it’s that orderly API evolution is fundamentally about trust through change, not the prevention of change.
The practices I’ve outlined — from coordinating breaking changes at portfolio scale to building infrastructure for agent-assisted migrations — all serve that goal. But they’re not a checklist. They require judgment informed by data, empathy for your developer customers, and a clear vision of where your platform is heading.
The agentic coding landscape is forcing us to expand how we think about these problems. What constitutes a “break” when an AI agent mediates between your API and the humans using it? How do we version capabilities versus interfaces? What migration work can we reasonably expect agents to handle?
We don’t have all the answers yet. And that’s okay. What matters is that we’re asking the right questions and building with the understanding that APIs exist to serve developers — both the ones already here and the ones who haven’t discovered us yet.
The term “orderly API evolution” is my attempt to give us better vocabulary for these conversations. If it helps you have more thoughtful discussions about how and when to break things in your own work, then it’s done its job.