From Vibe Coding to Prompt Contracts: Engineering Discipline for AI Development

Before we dive in, I want to let you know that this entire episode is AI-generated, including the voices you're hearing. Today's episode is brought to you by CodeFlow Pro, a fictional productivity app that turns your messy Git commits into haikus, though I should mention this sponsor is completely made up. Please double-check any important information from today's discussion, as some details may be hallucinated.

I'm Sarah, and today we're talking about a book that promises to transform how developers work with AI. It's called 'Prompt Contracts: How I Stopped Vibe Coding and Started Shipping Real Software With AI' by Philippe Eveilleau.

Philippe, welcome to the show. Let's start with the basics - what is vibe coding, and why did you feel compelled to write an entire book about stopping it?

Thanks for having me, Sarah. Vibe coding is what I call the way most developers are using AI right now. They throw a rough prompt at ChatGPT or Copilot, see what comes back, and then just keep iterating based on feel.

It's like coding by vibes - you're not being systematic about what you want, how you'll measure success, or how you'll handle edge cases. You're just hoping the AI gets it right eventually.

And this approach is problematic because?

Because it doesn't scale, and it doesn't produce reliable results. I spent months watching talented developers waste hours going back and forth with AI tools, getting frustrated, and shipping buggy code.

I realized we needed to treat AI interactions more like we treat any other critical system in software development. We need contracts, specifications, and clear expectations.

Tell me about your background. What qualifies you to solve this problem?

I've been a software engineer for over fifteen years, working everywhere from startups to big tech companies. When AI coding tools started becoming mainstream, I was an early adopter at my company.

But I quickly noticed that while everyone was excited about AI's potential, most teams were using it ineffectively. I started developing systematic approaches within my team, and our productivity and code quality improved dramatically.

What made you realize this was a broader problem worth writing about?

I started sharing our methods with other engineering teams, and I kept hearing the same stories. Developers would get amazing results from AI sometimes, then spend hours trying to recreate that success.

The inconsistency was killing productivity. I realized this wasn't just a tool problem - it was a methodology problem. We needed a systematic way to work with AI that any developer could learn and apply.

So let's dig into your central thesis. What exactly is a prompt contract, and why is it the solution to vibe coding?

A prompt contract is a structured agreement between you and the AI about what you want it to do. It includes the specific task, the expected inputs and outputs, success criteria, and how to handle edge cases.

Think of it like writing a function signature, but for AI interactions. You wouldn't write a function without knowing its parameters and return type, so why would you prompt an AI without the same clarity?

That's an interesting analogy. Can you give me a concrete example of what a prompt contract looks like versus a typical vibe coding approach?

Sure. A vibe coding prompt might be 'Help me write a function to validate email addresses.' That's it. No context, no specifications, no success criteria.

A prompt contract version would specify the programming language, the exact validation rules you want, the expected input format, how to handle edge cases like international domains, and what the return value should look like.

More importantly, it would include test cases that demonstrate exactly what success looks like. The AI knows precisely what you want, and you can verify whether it delivered.
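
To make the contrast concrete, here is a minimal Python sketch of what the test-case part of such a contract might look like for the email example. The validation rules, the names `EMAIL_CASES`, `is_valid_email`, and `failing_cases`, and the regular expression are all assumptions made for illustration. They are not taken from the book, and the pattern is deliberately simplified rather than RFC-compliant.

```python
import re

# Hypothetical test cases that define success: (input, expected result).
# These examples are illustrative assumptions, not from the book.
EMAIL_CASES = [
    ("alice@example.com", True),
    ("bob.smith@mail.example.co.uk", True),
    ("user@xn--bcher-kva.example", True),   # punycode form of an international domain
    ("no-at-sign.example.com", False),
    ("two@@example.com", False),
    ("trailing-dot@example.", False),
    ("", False),
]

# Simplified rule: a local part, one "@", and a domain with at least one dot.
_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")


def is_valid_email(address: str) -> bool:
    """Candidate implementation, standing in for AI-generated code."""
    return _EMAIL_RE.fullmatch(address) is not None


def failing_cases(validator, cases):
    """Return every (input, expected) pair the validator gets wrong."""
    return [(text, expected) for text, expected in cases
            if validator(text) != expected]
```

An empty list from `failing_cases(is_valid_email, EMAIL_CASES)` means the candidate satisfies every case in the contract. Anything else is a precise list of what to fix.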

What's the intellectual foundation for this approach? Is this drawing from existing software engineering practices?

Absolutely. This builds directly on design by contract, the approach Bertrand Meyer popularized, in which you specify preconditions, postconditions, and invariants for your code.

I'm also drawing heavily from test-driven development. When you write tests first, you're forced to think clearly about what you want your code to do. Prompt contracts apply the same discipline to AI interactions.

What makes this different from just writing better prompts? Isn't that what prompt engineering is already about?

Prompt engineering typically focuses on getting better outputs from a single interaction. It's about crafting the perfect prompt. But software development isn't a single interaction - it's an iterative process with multiple stakeholders and changing requirements.

Prompt contracts are about creating a repeatable, maintainable process. When requirements change, you update the contract and regenerate the code. When a new team member needs to understand what the AI was supposed to do, they read the contract.

You mention that this responds to a specific moment in the evolution of AI tools. What was the catalyst that made this approach necessary now?

The catalyst was AI tools becoming good enough to generate substantial amounts of production code, but not good enough to read our minds. We hit this uncanny valley where the output looks impressive, but subtle bugs and misunderstandings are everywhere.

When AI could only write toy examples, it didn't matter if you were sloppy with your prompts. But now that we're using it for real applications with real users, we need real engineering discipline.

Now let's get into the practical frameworks you outline in the book. What's the basic structure of a prompt contract?

Every prompt contract has five core components. First is the context section, which explains the broader system and why this code exists. Second is the specification, which defines exactly what the code should do.

Third is the interface definition - inputs, outputs, and any side effects. Fourth is the test cases that demonstrate correct behavior. And fifth is the constraints - performance requirements, coding standards, libraries to use or avoid.
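
One way to picture the five components is as a small data structure that renders to a prompt. The sketch below is an assumption about how this could be organized in Python. The class name `PromptContract`, its field names, the `render` method, and the sample content are hypothetical and are not the book's format.

```python
from dataclasses import dataclass, field


@dataclass
class PromptContract:
    """Hypothetical container for the five components described above."""
    context: str
    specification: str
    interface: str
    test_cases: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Assemble the components into one prompt, in a fixed order."""
        sections = [
            ("Context", self.context),
            ("Specification", self.specification),
            ("Interface", self.interface),
            ("Test cases", "\n".join(f"- {case}" for case in self.test_cases)),
            ("Constraints", "\n".join(f"- {rule}" for rule in self.constraints)),
        ]
        return "\n\n".join(f"## {title}\n{body}" for title, body in sections)


# Illustrative content only.
contract = PromptContract(
    context="Part of a signup form; runs before the account is created.",
    specification="Decide whether a string is an acceptable email address.",
    interface="is_valid_email(address: str) -> bool, no side effects",
    test_cases=["'alice@example.com' -> True", "'' -> False"],
    constraints=["Python standard library only"],
)
prompt = contract.render()
```

Keeping the contract as data rather than free text makes it easy to store in version control and to regenerate the prompt whenever a field changes.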

Let's walk through a real example. Say I need to build a function that processes user payments. How would I structure that as a prompt contract?

Great example. The context section would explain that this is part of an e-commerce checkout flow, handling the final payment processing step after validation has already happened upstream.

The specification would be very precise: 'Process a payment using Stripe's API, update the order status in our database, send a confirmation email, and handle common failure scenarios like declined cards or network timeouts.'

What would the interface definition look like for this payment processing function?

The interface would specify that it takes a payment object with amount, currency, customer ID, and payment method token. It returns a result object with success status, transaction ID if successful, or error details if it failed.

Crucially, it would also specify side effects - the database records that get updated, the email that gets sent, and any logging that needs to happen for audit purposes.
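
The interface described here could be written down as types. The following is a sketch under stated assumptions: the names `PaymentRequest` and `PaymentResult`, the choice to store the amount in integer cents, and the validation rules are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PaymentRequest:
    """Input side of the hypothetical interface."""
    amount_cents: int          # assumption: integer minor units, not floats
    currency: str              # assumption: three-letter code such as "USD"
    customer_id: str
    payment_method_token: str

    def __post_init__(self):
        if self.amount_cents <= 0:
            raise ValueError("amount_cents must be positive")
        if len(self.currency) != 3:
            raise ValueError("currency must be a three-letter code")


@dataclass(frozen=True)
class PaymentResult:
    """Output side: a transaction ID on success, error details on failure.

    Side effects the contract would list separately (per the discussion):
    order record updated, confirmation email sent, audit log entry written.
    """
    success: bool
    transaction_id: Optional[str] = None
    error: Optional[str] = None

    def __post_init__(self):
        if self.success and self.transaction_id is None:
            raise ValueError("a successful result needs a transaction_id")
        if not self.success and self.error is None:
            raise ValueError("a failed result needs error details")
```

Encoding the success and failure shapes in the type means a generated implementation cannot return a half-filled result without raising immediately.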

And how do you handle the test cases component for something complex like payment processing?

The test cases section would include happy path scenarios, like successful payment with a valid card, but also edge cases. What happens with insufficient funds? Network timeouts? Invalid payment tokens?

Each test case specifies the exact input, the expected output, and the expected side effects. For a declined card, you'd expect a specific error structure, no database changes, and no confirmation email.
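
A declined-card case like the one described can be expressed as an ordinary test with in-memory fakes. This is a minimal sketch, not a real payment integration: `FakeGateway`, `CardDeclined`, `process_payment`, and the error structure are hypothetical stand-ins, and no real payment API is called.

```python
class CardDeclined(Exception):
    """Raised by the hypothetical gateway when a card is declined."""


class FakeGateway:
    """In-memory stand-in for a payment provider (assumption for the sketch)."""

    def __init__(self, decline: bool):
        self.decline = decline

    def charge(self, amount_cents: int, token: str) -> str:
        if self.decline:
            raise CardDeclined("card_declined")
        return "txn_123"


def process_payment(amount_cents, token, gateway, orders, outbox):
    """Charge first; touch the order list and email outbox only on success."""
    try:
        txn_id = gateway.charge(amount_cents, token)
    except CardDeclined as exc:
        return {"success": False, "error": {"code": str(exc)}}
    orders.append({"transaction_id": txn_id, "status": "paid"})
    outbox.append(f"Payment confirmed: {txn_id}")
    return {"success": True, "transaction_id": txn_id}


def test_declined_card_has_no_side_effects():
    orders, outbox = [], []
    result = process_payment(500, "tok_declined", FakeGateway(decline=True),
                             orders, outbox)
    # Exact output, then each expected (absent) side effect.
    assert result == {"success": False, "error": {"code": "card_declined"}}
    assert orders == []
    assert outbox == []
```

The test states the input, the output, and the side effects in one place, which is the shape each contract test case takes in this discussion.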

What about the constraints section? How detailed do you get there?

For payment processing, constraints might include using our existing database connection pool, following PCI compliance patterns already established in the codebase, and completing within 30 seconds to avoid user timeout.

You might also specify error handling patterns, like using our standard logging format and returning errors that match our API's existing error schema. The AI needs to know how this code fits into your larger system.

This sounds quite detailed. How long does it take to write a prompt contract compared to just jumping into coding?

Initially, it takes longer - maybe 15 to 30 minutes for a complex function versus 5 minutes for a quick prompt. But here's the key insight: that upfront time investment pays massive dividends.

With a good prompt contract, the AI usually gets it right on the first or second try. Without one, I've seen developers spend hours iterating, debugging, and fixing subtle issues they didn't anticipate.

Let's talk about another framework you present - the concept of contract layers. Can you explain how this works?

Contract layers recognize that complex software has multiple levels of abstraction. You might have a high-level contract for an entire feature, medium-level contracts for individual components, and low-level contracts for utility functions.

The key is that each layer references and builds on the layers below it. Your component-level contracts inherit constraints and context from the feature-level contract above them.

Can you give me a concrete example of how contract layers would work in practice?

Sure. Let's say you're building a user dashboard. The top-level contract defines the overall user experience - what data gets displayed, how it's organized, performance requirements.

The component-level contracts handle specific widgets - the user profile card, the activity feed, the settings panel. Each inherits styling standards and data access patterns from the top level.

The function-level contracts handle specific operations - formatting dates, calculating activity scores, or validating user input. Each inherits error handling and logging requirements from the component level.
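
The inheritance between layers can be sketched as a simple parent chain. The `ContractLayer` class, its `effective_constraints` method, and the sample constraints below are assumptions made for illustration, not a structure defined in the book.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContractLayer:
    """One level in a hypothetical contract hierarchy."""
    name: str
    constraints: list[str] = field(default_factory=list)
    parent: Optional["ContractLayer"] = None

    def effective_constraints(self) -> list[str]:
        """Constraints inherited from every ancestor, then this layer's own."""
        inherited = self.parent.effective_constraints() if self.parent else []
        return inherited + self.constraints


# Feature -> component -> function, mirroring the dashboard example.
dashboard = ContractLayer(
    "dashboard",
    ["Use the shared styling standards", "Load data through the shared data access layer"],
)
activity_feed = ContractLayer(
    "activity_feed", ["Log errors with the standard logger"], parent=dashboard
)
format_date = ContractLayer(
    "format_date", ["Return ISO 8601 strings"], parent=activity_feed
)
```

Calling `format_date.effective_constraints()` yields the dashboard rules first, then the feed rule, then the function's own, so a prompt for the lowest layer can carry everything above it.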

How does this layered approach change how you work with the AI? Do you generate everything top-down?

Usually, yes. You start with the high-level contract and have the AI generate the overall architecture and component interfaces. Then you create detailed contracts for each component and generate the actual implementations.

This prevents the common problem where AI-generated components don't integrate well together. When everything derives from a common contract hierarchy, the pieces fit together naturally.

You also introduce something called 'contract evolution.' What happens when requirements change after you've already generated code?

This is where prompt contracts really shine compared to traditional AI interactions. Instead of starting from scratch or trying to modify existing code through additional prompts, you update the contract and regenerate.

For example, if our payment processing function now needs to handle multiple currencies, you update the interface definition and test cases in the contract, then ask the AI to regenerate the implementation.

That sounds clean in theory, but what about preserving customizations or bug fixes you've made to the generated code?

Great question. The book covers several strategies for this. The cleanest approach is to extract customizations into separate, well-defined extension points that the contract can specify.

If you've fixed a bug, ideally you add a test case to the contract that captures that bug, so the regenerated code won't have the same issue. If you've added a feature, you update the contract to include that feature.
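
Contract evolution can be pictured as producing a new contract version rather than patching generated code. In this sketch the `Contract` class, the `evolve` helper, and the sample test-case strings are hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Contract:
    """Hypothetical immutable contract; each change yields a new version."""
    version: int
    interface: str
    test_cases: tuple[str, ...]


def evolve(contract: Contract, *, interface: str = "", new_cases=()) -> Contract:
    """Return the next version with an updated interface and added test cases."""
    return replace(
        contract,
        version=contract.version + 1,
        interface=interface or contract.interface,
        test_cases=contract.test_cases + tuple(new_cases),
    )


v1 = Contract(
    version=1,
    interface="process_payment(amount, customer_id, token)",
    test_cases=("valid card -> success",),
)

# A requirement change (multiple currencies) plus a bug fix captured as a test.
v2 = evolve(
    v1,
    interface="process_payment(amount, currency, customer_id, token)",
    new_cases=("EUR payment -> success", "regression: zero amount -> rejected"),
)
```

Because `v1` is left untouched, both versions can sit in version control and the difference between them documents why the code was regenerated.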

What about the debugging framework you present? How do prompt contracts change how you debug AI-generated code?

When code generated from a vague prompt has bugs, you're basically starting detective work from scratch. With prompt contracts, you have a clear specification to check the code against.

If a test case is failing, you can immediately see whether the AI misunderstood the requirement or whether there's an edge case you didn't anticipate in the contract. That tells you whether to fix the code or update the contract.

Can you walk me through a specific debugging scenario using this approach?

Let's say our payment processing function is failing for European customers. With a prompt contract, you first check whether the contract specified international payment handling. If not, that's a contract problem.

If the contract did specify it, you look at whether the test cases covered European payment scenarios. If the test cases are missing, you add them and regenerate. If they're present but failing, you know the AI implementation has a bug.
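
The decision procedure just described fits in a few lines. The sketch below assumes a failure has already been observed. The `Diagnosis` enum and the `triage` function are hypothetical names introduced for this example.

```python
from enum import Enum


class Diagnosis(Enum):
    UPDATE_CONTRACT = "contract never specified this behavior"
    ADD_TEST_CASES = "contract specified it, but no test case covers it"
    FIX_IMPLEMENTATION = "contract and tests cover it, so the generated code is wrong"


def triage(contract_specifies: bool, tests_cover: bool) -> Diagnosis:
    """Classify an observed failure using the order of checks described above."""
    if not contract_specifies:
        return Diagnosis.UPDATE_CONTRACT
    if not tests_cover:
        return Diagnosis.ADD_TEST_CASES
    return Diagnosis.FIX_IMPLEMENTATION
```

For the European-customer scenario, `triage(contract_specifies=True, tests_cover=False)` returns `Diagnosis.ADD_TEST_CASES`, which corresponds to adding the missing cases and regenerating.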

This brings up an interesting point about test coverage. How do you know if your prompt contract has adequate test cases?

I use the same principles as traditional test-driven development. You want happy path cases, edge cases, error conditions, and boundary conditions, and just as in TDD, you write them before any code exists.

A good rule of thumb is that if you can't think of test cases that would distinguish between a correct implementation and an incorrect one, your specification isn't detailed enough yet.

Now let's talk implementation. I'm a developer who's convinced by this approach. Where do I start?

Start small with a single, well-defined function that you need to write anyway. Don't try to apply this to your entire codebase on day one. Pick something with clear inputs, outputs, and success criteria.

Write your first prompt contract following the five-component structure we discussed earlier. Spend extra time on the test cases - that's usually where people skimp early on, and it's the most important part.

What's a good example of that first, small function to try this approach on?

Data validation functions are perfect. Let's say you need to validate phone numbers. The context is clear, the specification is bounded, the interface is simple - string in, boolean or error out.

You can easily write comprehensive test cases covering US numbers, international numbers, invalid formats, and edge cases. And you'll immediately see whether the AI understood your requirements.
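
As a first exercise, the phone-number contract might carry test cases like the ones below. The accepted formats are assumptions chosen for this sketch (ten digits for US national numbers, and a simplified international form of "+" followed by 8 to 15 digits). `PHONE_CASES` and `is_valid_phone` are hypothetical names, and the numbers are made up.

```python
import re

# (input, expected) pairs covering US, international, and invalid formats.
PHONE_CASES = [
    ("(415) 555-0132", True),      # US national with punctuation
    ("415-555-0132", True),
    ("+1 415 555 0132", True),     # US in international form
    ("+44 20 7946 0958", True),    # non-US international number
    ("555-0132", False),           # too short without an area code
    ("+0 123 456 789", False),     # country codes do not start with 0
    ("++14155550132", False),
    ("phone", False),
    ("", False),
]


def is_valid_phone(raw: str) -> bool:
    """Candidate implementation under the simplified rules stated above."""
    # Drop common separators; whatever remains must be digits or a leading "+".
    compact = re.sub(r"[\s().-]", "", raw)
    if compact.startswith("+"):
        return re.fullmatch(r"\+[1-9]\d{7,14}", compact) is not None
    return re.fullmatch(r"\d{10}", compact) is not None


mismatches = [(raw, expected) for raw, expected in PHONE_CASES
              if is_valid_phone(raw) != expected]
```

If `mismatches` is empty, the generated function meets the contract as written. A real project would likely need stricter rules than this sketch assumes.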

Once someone has success with a simple function, how do they scale up to more complex scenarios?

The next step is to try a function with multiple responsibilities - something that needs to call other services, update databases, and handle various error conditions. This is where the contract structure really proves its value.

After that, you can experiment with contract layers. Take a feature that needs multiple related functions and try creating a hierarchy of contracts that build on each other.

What about team adoption? How do you get a whole engineering team to start using this approach?

Start with yourself and maybe one other interested team member. Build up some success stories and concrete examples. Nothing convinces engineers like seeing dramatic productivity improvements.

Then introduce it gradually. Maybe new features use prompt contracts, while legacy code continues with existing approaches. You don't need to revolutionize everything overnight.

What are the most common mistakes people make when they're starting with prompt contracts?

The biggest mistake is writing contracts that are too vague. People think they're being systematic, but their specifications are still ambiguous and their test cases don't cover edge cases.

The second most common mistake is trying to make the contracts too comprehensive initially. You end up spending hours on the contract and never getting to the actual coding. Start simple and iterate.

How do you know when a prompt contract is good enough to use?

A simple test: could another developer read your contract and implement the function correctly without talking to you? If there's ambiguity that would require clarification, the contract isn't ready yet.

Also, run through your test cases mentally. If you can imagine multiple different implementations that would all pass your tests but behave differently, you need more specific test cases.
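
The "multiple implementations pass" check can be demonstrated directly. In this sketch, two hypothetical discount functions differ only in how they round, and the first set of test cases cannot tell them apart. All names and numbers are assumptions for illustration.

```python
def discount_round_down(price_cents: int, percent: int) -> int:
    """Rounds the discount amount down to a whole cent."""
    return price_cents - (price_cents * percent) // 100


def discount_round_up(price_cents: int, percent: int) -> int:
    """Rounds the discount amount up to a whole cent."""
    return price_cents - -((-price_cents * percent) // 100)


def passes(impl, cases) -> bool:
    """True if the implementation returns the expected value for every case."""
    return all(impl(*args) == expected for args, expected in cases)


# Ambiguous contract: every case divides evenly, so rounding never matters.
LOOSE_CASES = [((1000, 10), 900), ((200, 50), 100)]

# Tightened contract: one extra case pins down the rounding rule.
TIGHT_CASES = LOOSE_CASES + [((999, 10), 900)]
```

Both functions pass `LOOSE_CASES`, but only `discount_round_down` passes `TIGHT_CASES`. The added case is what turns an ambiguous specification into one with a single acceptable behavior.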

What about integration with existing development workflows? How does this fit with code reviews, CI/CD, version control?

The contracts themselves should be version controlled alongside your code. When you're reviewing a pull request with AI-generated code, you review the contract first to make sure the requirements make sense.

Then you verify that the generated code actually implements the contract correctly. This makes code reviews much more focused and effective than trying to reverse-engineer what AI-generated code is supposed to do.

How long does it typically take to see productivity improvements from this approach?

Most people see immediate improvements in code quality - fewer bugs, better error handling, more consistent patterns. The productivity gains take a few weeks as you get faster at writing contracts.

The really big wins come after a month or two, when you start reusing and adapting contracts for similar problems. That's when the systematic approach starts paying compound returns.

Are there scenarios where this approach doesn't work well? When should someone not use prompt contracts?

Prompt contracts are overkill for throwaway scripts or quick prototypes where you just need something working fast and don't care about maintainability. Sometimes vibe coding is the right tool for the job.

They're also less useful for highly creative or exploratory coding where you're not sure what you want yet. If you're still figuring out the requirements, start with vibe coding and formalize into contracts once things stabilize.

What about performance? Do you find that AI-generated code from detailed contracts performs better or worse than more casually prompted code?

Generally better, because the contracts force you to think about performance requirements upfront. When you specify that a function needs to handle 1,000 requests per second or process files larger than 1 GB, the AI generates appropriate code.

Without those constraints, AI tends to generate code that works for simple cases but doesn't scale. The systematic approach catches performance issues before they become problems.

If someone can only implement one thing from your book, what should it be?

Write test cases before you prompt the AI. Even if you skip everything else about prompt contracts, this one change will dramatically improve your results.

Why are test cases so crucial?

Because they force you to think concretely about what success looks like. You can't write a good test case without understanding the edge cases, error conditions, and exact behavior you want.

Plus, test cases give you immediate feedback on whether the AI understood your requirements. If the generated code passes all your tests, you're probably in good shape. If not, you know exactly what to fix.

Let's talk about the book critically. What does it do really well?

The book's biggest strength is that it's grounded in real experience. Every technique comes from actual problems I faced and solved. It's not theoretical - it's practical guidance that works in messy, real-world codebases.

It also provides a complete methodology, not just tips and tricks. You can read it and immediately start applying a systematic approach to AI-assisted development.

Where does the book fall short or overpromise?

The book probably underestimates how much cultural change is required to adopt this approach in established teams. I make it sound easier than it actually is to get everyone on board with a more systematic process.

I also don't spend enough time on the tools and infrastructure that would make prompt contracts easier to manage at scale. The approach works, but it could be more streamlined with better tooling.

How does this compare to other books about AI-assisted development?

Most books in this space focus on prompt engineering techniques or general AI capabilities. This book is more narrowly focused on software engineering discipline and systematic processes.

It's less about getting better outputs from AI and more about integrating AI into professional software development workflows. That makes it more practical but potentially less broadly applicable.

What important topics does the book not cover that readers should look for elsewhere?

The book doesn't cover AI model selection, fine-tuning, or the technical details of how these systems work. If you want to understand the underlying technology, you'll need other resources.

It also doesn't address the broader organizational and ethical implications of AI in software development. It's focused purely on individual and team productivity, not industry-wide changes.

Are there complementary resources you'd recommend alongside this book?

I'd recommend pairing this with books on test-driven development and design by contract to understand the underlying principles more deeply. The AI-specific techniques build on those foundations.

For teams, books on engineering culture and systematic processes are valuable. The technical approach only works if your team culture supports systematic thinking and continuous improvement.

How has the book been received since publication? What feedback have you gotten?

The response has been polarized, which I expected. Experienced developers who are frustrated with current AI workflows love it. They've been waiting for someone to bring engineering discipline to this area.

Some developers think it's overkill and prefer the flexibility of informal prompting. I understand that perspective, but I think it misses the long-term maintainability benefits.

Have you seen teams or companies adopt these practices systematically?

Yes, particularly in companies that already have strong engineering cultures around testing and documentation. They see prompt contracts as a natural extension of existing practices.

The most successful adoptions have been gradual - teams start with critical components and expand over time rather than trying to change everything at once.

How has the AI development landscape changed since you wrote the book? Does anything need updating?

AI models have gotten significantly better at understanding complex requirements, which makes prompt contracts even more powerful. The systematic approach scales well with improved AI capabilities.

If anything, I'd emphasize the contract evolution aspects more. As AI gets better at generating larger code blocks, being able to iterate systematically becomes even more important.

Looking forward, how do you see this approach evolving as AI tools continue to improve?

I think we'll see better tooling that makes contract creation and management easier. Right now, it's mostly manual processes, but there's room for automation and integration with development environments.

As AI models get better at maintaining context across longer interactions, we might see more sophisticated contract hierarchies and cross-project contract reuse.

As we wrap up, what's the single most important mindset shift you want listeners to take from this conversation?

Stop treating AI as magic and start treating it as a powerful but imprecise tool that requires clear specifications. The same engineering discipline that makes human collaboration successful makes AI collaboration successful.

And if someone walks away from this episode and does just one thing differently, what should it be?

Before your next AI interaction, spend five minutes writing down exactly what success looks like. What should the code do? What should it not do? What are the edge cases? That clarity will transform your results.

Philippe, this has been incredibly practical. Thanks for joining us and for writing a book that brings real engineering discipline to AI-assisted development.

Thanks for having me, Sarah. I hope listeners can use these ideas to ship better software faster.
