Control Plane Agentic Coding: When AI Manages the Infrastructure
Maya and David explore the emerging world of AI agents that write and modify control plane code, the critical infrastructure management layer of modern systems. Drawing from David's hands-on experience with agents rewriting Kubernetes operators and Maya's broader perspective on AI in software development, they investigate whether agents might actually be better suited than humans for this high-stakes work, and what new forms of governance and collaboration we need when AI systems are managing our infrastructure.
Topic: Control Plane Agentic Coding: Approach, Mastery, and Trade-offs
Participants
- Maya (host)
- David (guest)
Transcript
Before we dive in, I need to mention that this entire episode is AI-generated, including the voices you're hearing. Today's show is sponsored by CodeFlow AI, a fictional automated debugging assistant that claims to fix your production issues while you sleep. And as always, some information might be hallucinated, so please double-check anything important.
I'm Maya, and today I'm talking with David about control plane agentic coding. This is fascinating territory where AI agents are starting to manage the infrastructure that manages our infrastructure.
Thanks for having me, Maya. I come at this from fifteen years in distributed systems, watching control planes evolve from simple schedulers to these complex orchestration layers.
And I've been tracking AI agents in software development, particularly how they're moving beyond just writing code to actually operating systems. What drew you to this intersection?
Honestly, it was watching a Kubernetes operator I'd written get completely rewritten by an AI agent in about six hours. The agent didn't just translate my logic; it found optimization patterns I'd missed after months of tuning.
That's exactly what I'm seeing too. But I'm curious about your reaction to that experience. Were you impressed or concerned?
Both, really. The code was cleaner, more performant. But it also made decisions about resource allocation that I couldn't fully trace back through the reasoning chain.
That traceability issue keeps coming up. When we talk about control plane agentic coding, we're essentially talking about AI agents that write and modify the code that controls critical infrastructure, right?
Exactly. These aren't just developer tools anymore. They're writing the controllers that decide how to distribute workloads, when to scale up clusters, how to handle failover scenarios.
So the stakes are fundamentally different. If an AI writes buggy application code, maybe a feature breaks. If it writes buggy control plane code...
The entire system could cascade into failure. But here's what's interesting to me: that same risk profile might actually make AI agents better suited for this work, not worse.
Wait, how do you figure that? I would have thought the high stakes would make human oversight more critical, not less.
Think about it this way. Control plane code has very specific, measurable success criteria. Latency targets, resource utilization bounds, failure recovery times. It's much easier to validate than, say, user interface code where success is subjective.
That's a good point. The feedback loops are tighter and more objective. But I'm still hung up on the complexity issue. Control planes are dealing with distributed systems, network partitions, race conditions.
Which is exactly where I think AI agents might excel. Humans are notoriously bad at reasoning about distributed systems. We miss edge cases, we don't consider all the failure modes.
Let me push back on that. Sure, distributed systems are complex, but they're complex in very specific ways that experienced engineers learn to navigate. There's institutional knowledge, battle-tested patterns.
But that institutional knowledge also becomes technical debt. I've seen teams stick with suboptimal approaches because that's what they know works.
Fair enough. But when you say AI agents might be better at this, are we talking about agents that are generating net-new control plane logic, or agents that are optimizing existing patterns?
I think the distinction matters less than you might expect. In my experience, the most effective agents I've worked with do both. They start with proven patterns but then adapt them based on the specific characteristics of the system they're managing.
Can you give me a concrete example of what that adaptation looks like?
Sure. I worked with an agent that was managing auto-scaling policies for a batch processing system. It started with standard CPU and memory thresholds, but then noticed that queue depth was actually a better predictor of when to scale.
The agent rewrote the scaling controller to incorporate queue metrics, but it also added circuit breakers based on historical failure patterns it had observed. None of that was in the original specification.
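A minimal sketch of the kind of controller logic David describes, assuming a worker pool sized from queue depth and a simple failure-count circuit breaker. The class names, thresholds, and the jobs-per-worker target are hypothetical illustrations, not details from the episode or from any real agent's output.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # All values below are illustrative assumptions.
    jobs_per_worker: int = 50    # backlog each worker is expected to absorb
    min_workers: int = 1
    max_workers: int = 20
    failure_threshold: int = 3   # consecutive failed scale operations before tripping


class QueueDepthScaler:
    """Picks a worker count from queue depth, guarded by a circuit breaker."""

    def __init__(self, policy: ScalingPolicy) -> None:
        self.policy = policy
        self.consecutive_failures = 0

    @property
    def breaker_open(self) -> bool:
        return self.consecutive_failures >= self.policy.failure_threshold

    def record_scale_result(self, succeeded: bool) -> None:
        # A success resets the count; failures accumulate until the breaker trips.
        if succeeded:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1

    def desired_workers(self, queue_depth: int, current_workers: int) -> int:
        if self.breaker_open:
            # Hold steady instead of retrying a scale operation that keeps failing.
            return current_workers
        wanted = math.ceil(queue_depth / self.policy.jobs_per_worker)
        return max(self.policy.min_workers, min(self.policy.max_workers, wanted))
```

With the default policy, a backlog of 500 jobs asks for 10 workers. After three consecutive failed scale operations the scaler keeps the current count until a success is recorded.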
That's actually pretty impressive. But it also raises a question about governance. How do you maintain control over systems that are essentially modifying themselves?
That's the million-dollar question. And honestly, I think a lot of organizations are going to get this wrong before they get it right.
What do you mean by getting it wrong?
I think there's going to be a tendency to either lock these agents down so much that you lose their adaptive capabilities, or to give them too much freedom and lose oversight entirely.
So you're talking about finding the right level of constraint. But how do you even define that? The whole appeal of agentic systems is that they can respond to scenarios you didn't anticipate.
Right, which is why I think the traditional approval-based governance models don't work here. You can't have an agent submit a pull request every time it wants to adjust a resource limit.
But you also can't have it making arbitrary changes to production systems. There has to be some middle ground.
I've been experimenting with what I call constraint-based autonomy. Instead of specifying what the agent should do, you specify what it absolutely cannot do, and then let it operate within those bounds.
That's interesting. So rather than saying 'scale up when CPU hits 80%,' you might say 'never allocate more than X resources' or 'never modify these critical components.'
Exactly. And you can make those constraints dynamic too. During low-traffic periods, maybe the agent has more freedom to experiment with optimization strategies.
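One way to picture constraint-based autonomy is as a list of deny rules that every proposed action must pass, with limits that loosen during quiet periods. The sketch below is only an illustration under assumed names: `ProposedAction`, the protected component set, and the core limits are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ProposedAction:
    component: str      # part of the system the agent wants to change
    cpu_cores: int      # total cores allocated if the change is applied
    low_traffic: bool   # whether the system is currently in a quiet period


# A constraint returns a reason string when the action is forbidden, else None.
Constraint = Callable[[ProposedAction], Optional[str]]

# Hypothetical set of components the agent must never modify.
PROTECTED_COMPONENTS = {"etcd", "api-server"}


def never_touch_protected(action: ProposedAction) -> Optional[str]:
    if action.component in PROTECTED_COMPONENTS:
        return f"{action.component} is a protected component"
    return None


def cpu_ceiling(action: ProposedAction) -> Optional[str]:
    # Dynamic constraint: the ceiling is higher during low-traffic periods.
    limit = 128 if action.low_traffic else 64
    if action.cpu_cores > limit:
        return f"{action.cpu_cores} cores exceeds the limit of {limit}"
    return None


CONSTRAINTS: List[Constraint] = [never_touch_protected, cpu_ceiling]


def violations(action: ProposedAction, constraints: List[Constraint]) -> List[str]:
    """Collects every reason the action is forbidden; empty means allowed."""
    reasons = []
    for check in constraints:
        reason = check(action)
        if reason is not None:
            reasons.append(reason)
    return reasons
```

The agent is free to choose any action for which `violations` returns an empty list. Nothing here tells it what to do, only what it cannot do.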
But doesn't that still require humans to anticipate what the dangerous actions might be? And if we're bad at reasoning about distributed systems, how do we know what constraints to set?
That's a really good point. I think you need the constraints to emerge from observation, not just upfront design. Watch what the agent does, identify patterns that make you nervous, then codify those as constraints.
So it's more of an iterative trust-building process. But that seems like it could take a long time, and it requires the agent to make some potentially risky moves during that learning phase.
True, but I'd argue that's not fundamentally different from how we onboard human engineers. You don't give someone production access on day one. You start them with limited scope and expand their authority as they demonstrate competence.
Okay, but humans can explain their reasoning. They can walk you through their mental model. Can AI agents do that effectively when it comes to complex control plane decisions?
The explainability question is tricky. Some agents are better at this than others. But I've also worked with human engineers who couldn't clearly articulate why they made certain architectural decisions.
That's true, but there's still a qualitative difference. I can ask follow-up questions, probe their assumptions, understand their experience base. With an AI agent, I'm often looking at a black box.
Although, is that always true? Some of the newer agentic systems can actually provide pretty detailed reasoning chains. They'll show you the metrics they considered, the trade-offs they evaluated.
But those reasoning chains are generated post-hoc, right? They're essentially the agent's attempt to justify decisions it's already made through processes we don't fully understand.
That's fair. But I wonder if that distinction matters as much in practice. If the reasoning is sound and the outcomes are good, does it matter whether the explanation came before or after the decision?
I think it does matter, especially when things go wrong. Post-hoc explanations might miss the actual factors that led to a bad decision. They might rationalize rather than illuminate.
You're right about failure scenarios. That's probably where human oversight remains most critical. Not in day-to-day operations, but in incident response and post-mortem analysis.
That makes sense. The agent can handle the routine optimization and scaling decisions, but when something breaks, you need human judgment to understand what really happened.
Exactly. And I think that's actually a pretty natural division of labor. Agents are great at monitoring thousands of metrics simultaneously and making rapid adjustments. Humans are better at understanding context and learning from failure.
But there's an interesting dependency question there. If agents are handling most of the routine decisions, do human engineers maintain the expertise needed to debug complex failures?
That's a scary thought. It's like the airline pilot problem, where automation handles everything until it doesn't, and then you need skills that have atrophied from disuse.
Right, and in aviation, they've had decades to figure out how to maintain pilot skills despite increasing automation. In control plane agentic coding, we're still in the early stages.
Which suggests we probably need to be more intentional about preserving human involvement, even when the agent is capable of handling things independently.
What would that look like in practice? Regular human audits of agent decisions? Mandatory human involvement in certain types of changes?
I've been thinking about rotation systems, where engineers periodically take back direct control of different subsystems. Not because the agent is failing, but to maintain familiarity with the underlying systems.
That's interesting. It treats human expertise as something that needs to be actively maintained, not just something you can call upon when needed.
Exactly. And it also gives you opportunities to observe the agent's decision-making up close, which helps with that trust-building process we talked about earlier.
I'm starting to think the mastery question isn't just about the AI agents achieving mastery of control plane coding. It's also about humans achieving mastery of working with agentic systems.
That's a great way to put it. We need to develop new skills around agent supervision, constraint design, failure pattern recognition in hybrid human-AI systems.
And those are skills that don't really exist yet. We're having to invent them as we go.
Which brings us back to your earlier point about this being early days. We're essentially running a massive experiment with production infrastructure.
But it's not really optional, is it? The complexity and scale of modern systems already exceed what human teams can manage manually. We need some form of intelligent automation.
True, but that doesn't mean we have to rush toward full autonomy. There's probably value in staying at intermediate levels of automation for longer than our impatience might suggest.
What do you mean by intermediate levels?
Agents that can propose changes and explain their reasoning, but still require human approval for anything above a certain risk threshold. Or agents that can make changes but with automatic rollback triggers if key metrics degrade.
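The two intermediate levels mentioned here, approval above a risk threshold and automatic rollback on metric degradation, could be sketched roughly as follows. The risk score, the 0.5 threshold, the 20% degradation bound, and the metric names are assumptions made for illustration. The rollback check also assumes metrics where higher is worse, such as latency or error rate.

```python
from enum import Enum
from typing import Dict


class Decision(Enum):
    APPLY = "apply"
    NEEDS_APPROVAL = "needs_approval"


def route_change(risk_score: float, approval_threshold: float = 0.5) -> Decision:
    """Sends changes above the risk threshold to a human; applies the rest."""
    if risk_score > approval_threshold:
        return Decision.NEEDS_APPROVAL
    return Decision.APPLY


def should_roll_back(
    baseline: Dict[str, float],
    current: Dict[str, float],
    max_degradation: float = 0.2,
) -> bool:
    """True if any metric worsened by more than max_degradation (a fraction).

    Assumes higher values are worse. Metrics missing from `current` are
    treated as unchanged.
    """
    for name, base in baseline.items():
        now = current.get(name, base)
        if base > 0 and (now - base) / base > max_degradation:
            return True
    return False
```

How the risk score is computed is the hard part and is deliberately left out. This sketch only shows where the threshold and the rollback trigger would sit.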
Those safety nets are appealing, but they also constrain the agent's ability to respond quickly to emerging issues. There's always going to be this tension between safety and agility.
And I don't think there's a universal answer to that tension. Different organizations, different systems, different risk tolerances are going to land in different places.
Which maybe is the most important insight here. Control plane agentic coding isn't a single technology or approach. It's a spectrum of possibilities with different trade-offs.
Right, and success is going to depend on matching the right level of agent autonomy to the specific context. High-reliability systems might stay more conservative, while fast-moving startups might push toward higher autonomy.
But even within a single organization, you might want different approaches for different systems. Core infrastructure might have tighter constraints than experimental services.
Absolutely. And that granularity requires pretty sophisticated governance frameworks. You need to be able to define and enforce different autonomy policies across different domains.
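As a rough illustration of per-domain autonomy policies, the sketch below maps each domain to an autonomy level and falls back to the most conservative level for anything unlisted. The level names and domain labels are hypothetical. A real governance framework would need far more than a lookup table.

```python
from enum import IntEnum
from typing import Dict


class Autonomy(IntEnum):
    PROPOSE_ONLY = 0          # agent suggests, a human applies
    APPLY_WITH_ROLLBACK = 1   # agent applies, guarded by rollback triggers
    FULL = 2                  # agent applies without additional gates


# Hypothetical policy table: tighter for core systems, looser for experiments.
POLICIES: Dict[str, Autonomy] = {
    "core-infrastructure": Autonomy.PROPOSE_ONLY,
    "batch-processing": Autonomy.APPLY_WITH_ROLLBACK,
    "experimental-services": Autonomy.FULL,
}


def autonomy_for(domain: str) -> Autonomy:
    # Unknown domains get the most conservative level by default.
    return POLICIES.get(domain, Autonomy.PROPOSE_ONLY)


def may_apply_unattended(domain: str) -> bool:
    return autonomy_for(domain) >= Autonomy.APPLY_WITH_ROLLBACK
```

Defaulting unknown domains to `PROPOSE_ONLY` is a design choice in this sketch: a missing policy entry then fails toward more human involvement rather than less.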
We keep coming back to governance, which makes me think it might be the real challenge here. The AI agents themselves are becoming quite capable, but our management frameworks are lagging.
I think that's right. And it's not just technical governance. There are organizational questions too. Who's responsible when an agent makes a decision that causes an outage?
That's a really important question. Traditional accountability models assume human decision-makers. When agents are making autonomous choices, the accountability chains get murky.
Which is another argument for maintaining meaningful human involvement, even if it's not in every individual decision.
So as we wrap up, it sounds like you've moved away from your initial enthusiasm about agents potentially being better than humans at this work?
Not exactly. I still think agents have advantages in terms of scale and consistency. But I'm more convinced now that the optimal approach involves intentional human-agent collaboration, not replacement.
And I've moved somewhat in the opposite direction. I came in more skeptical, but your examples of constraint-based autonomy and adaptive optimization are compelling.
It's interesting how the conversation kept returning to the need for new frameworks rather than just better technology. That suggests we're still in the infrastructure-building phase of this transition.
Right, and that infrastructure isn't just technical. It's conceptual, organizational, regulatory. We need new ways of thinking about responsibility and control in hybrid systems.
Which probably means the early adopters are going to have to figure out a lot of this through trial and error. That's both exciting and terrifying.
So the question I'm left with is whether we can develop those frameworks quickly enough to keep pace with the technology. Or whether we're going to see some spectacular failures first.
Given how fast things are moving, I suspect we'll see both. Some organizations will get it right, others will learn the hard way. The challenge is making sure the lessons get shared broadly.
And that the failures don't undermine confidence in approaches that could actually work with better implementation. The stakes feel pretty high.
They are high. But so is the potential upside. If we can figure this out, we might be looking at infrastructure that's more reliable and efficient than anything we could achieve manually.
David, thanks for helping me think through this. I think the biggest insight for me is that mastery in this domain isn't just about the agents getting better at coding. It's about us getting better at working with them.
Thanks, Maya. And for me, it's the realization that the technical capabilities are advancing faster than our governance frameworks. That gap is where the real work needs to happen next.