
The 3-Month Wall
If your vibe-coded app collapsed after 3 months, if you are dealing with drift where the AI overwrites existing logic, if you want to protect API contracts from code changes – Hurl and the ratchet are the solution.
You build a SaaS with vibe coding. At first it is fast. “Make login” – 30 seconds. “Add payments” – 2 minutes. An MVP ships in 3 weeks.
Three months later, strange things happen. The AI “tidies up” the payment logic and silently changes the discount calculation. Adding a new endpoint breaks existing authentication. You ask for a refactoring and the field names on the public API change, killing every client.
This is called logic drift – the AI unintentionally modifying existing business logic. Regression bugs exist in traditional development too, but logic drift is different: changes the developer did not intend happen across the entire codebase without the developer noticing, because every prompt starts in a fresh context window.
Drift in Numbers
This is not sentiment. There is data.
The price of speed is complexity. A Carnegie Mellon team compared 807 GitHub repositories before and after Cursor adoption (MSR 2026). In the first month, code additions increased 3-5x. Two months later the speed advantage vanished. What remained was a 30% increase in static analysis warnings and a permanent 41% increase in code complexity. This finding was independently replicated. Liu et al. (2026) analyzed 302,600 AI-authored commits across 6,299 GitHub repositories and found that the count of unresolved technical debt issues surged from a few hundred in early 2025 to over 110,000 by February 2026.
It didn’t get faster – it got slower. The nonprofit AI research organization METR ran a randomized controlled trial with 16 experienced open-source developers (2025). On projects they knew well, the group using AI tools took 19% longer to complete tasks. Yet the developers themselves perceived a 20% speed improvement. The gap between perception and reality was 39pp. Results may differ for new projects, but the assumption “AI = always faster” breaks.
Stability collapses at scale. According to Google’s DORA Report (2025), for every 25% increase in AI adoption, software delivery stability decreases by 7.2%.
It actually collapsed. Amazon mandated AI coding tools company-wide in 2025 and deployed 21,000 AI agents. During the same period, approximately 30,000 employees were laid off, drastically reducing review capacity. The combination of rapid AI code generation and reduced review capacity resulted in 4 Sev-1 incidents over 90 days. On March 5, 2026, a 6-hour outage caused an estimated loss of 6.3 million orders. An internal document stated: “GenAI’s rapid code generation is inadvertently exposing vulnerabilities, and current safety measures are wholly inadequate.”
“Do TDD” Is Not the Answer
The common advice for vibe coding drift is “write tests.” The direction is right, but how you provide tests determines the outcome.
The TDAD study (arxiv 2026) tested this precisely. They had Qwen3-Coder 30B solve 100 SWE-bench Verified instances.
| Condition | Regression rate |
|---|---|
| Baseline (no test instruction) | 6.08% |
| “Do TDD” procedural instruction | 9.94% (worse) |
| Providing affected test files as context | 1.82% (70% reduction) |
Telling it “do TDD” makes things worse. The agent strays from its main task trying to follow the procedural instruction. But providing the concrete context “these test files must pass” cuts regressions by 70%.
The difference is clear. Not “how to test” as an instruction, but “what must pass” as a contract.
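The relative changes quoted above follow directly from the table. A minimal sketch of the arithmetic, using only the three regression rates reported there (the variable and function names are illustrative):

```python
# Regression rates from the TDAD table above (percent of instances).
baseline = 6.08
tdd_instruction = 9.94
test_files_as_context = 1.82


def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


context_change = relative_change(baseline, test_files_as_context)
instruction_change = relative_change(baseline, tdd_instruction)

print(f"test files as context: {context_change:+.1f}%")   # about -70%
print(f"'Do TDD' instruction:  {instruction_change:+.1f}%")  # about +63%
```

So the procedural instruction raises the regression rate by roughly 63% relative to baseline, while supplying the affected test files lowers it by roughly 70%.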
Hurl: Contracts in Plain Text
The concept of “contracts” in software was formalized by Bertrand Meyer (1992) – the trinity of preconditions, postconditions, and invariants that specify obligations between modules. Hurl applies this principle to the HTTP boundary. It is a testing tool that declares HTTP requests and expected responses in plain text. Maintained by Orange (France Telecom), it is a Rust binary with zero runtime dependencies and 18.7k GitHub stars. Fast enough to run on every commit in CI.
```hurl
# Login success
POST http://localhost:8080/api/auth/login
{
  "email": "test@example.com",
  "password": "secret123"
}
HTTP 200
[Asserts]
jsonpath "$.token" exists
jsonpath "$.user.email" == "test@example.com"

# Unauthenticated access returns 401
GET http://localhost:8080/api/pages
HTTP 401
```
Two contracts. Login must return 200 with a token, and unauthenticated access must return 401.
When this file is committed to git and executed on every commit in CI – the moment the AI “tidies up” the auth logic and 401 becomes 200, the commit is rejected. Drift is caught before it reaches production.
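One way to wire this into CI is a small wrapper that collects every Hurl file and hands them to `hurl --test`, which exits non-zero when an assertion fails. The sketch below is an assumption about how such a wrapper could look, not part of Hurl itself: the function names and the injectable `runner` parameter are hypothetical, and it presumes the API under test is already running.

```python
import subprocess
from pathlib import Path


def build_hurl_command(files: list[Path]) -> list[str]:
    """Build the Hurl invocation; --test runs the files in test mode."""
    return ["hurl", "--test", *(str(f) for f in files)]


def run_contracts(contract_dir: Path, runner=subprocess.run) -> int:
    """Run every .hurl file under contract_dir and return an exit code."""
    files = sorted(contract_dir.rglob("*.hurl"))
    if not files:
        print(f"no .hurl files found under {contract_dir}")
        return 1  # an empty contract set should not count as a pass
    result = runner(build_hurl_command(files))
    return result.returncode
```

A CI step could then call `sys.exit(run_contracts(Path("contracts")))`, where `contracts` is a placeholder directory name, and mark the job as required for merging.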
Why Hurl
Unit tests can also catch drift – if you don’t give the AI permission to modify the test files. But unit tests verify internal functions of the code, so they are structurally coupled to the implementation. When function names change, tests break too, and you must update tests with every refactoring.
Hurl sits at the HTTP boundary. It declares only requests and responses. It knows nothing about the code’s internal structure. No matter how the AI changes the code, if the externally observable behavior stays the same it passes; if it differs it fails. It is naturally coupled to behavior, not to implementation.
| | Unit tests | Hurl |
|---|---|---|
| Verification target | Function internals | HTTP contract |
| On AI refactoring | Changed together | Unchanged |
| Drift detection | Conditional (if modification blocked) | Natural |
| Code structure dependency | High | None |
| Human readability | Code level | Plain text |
| LLM generation ease | Requires code structure understanding | Only needs HTTP knowledge |
What Hurl verifies is not code but behavior. The AI can freely change the code. The behavior must not change. This distinction is the key to catching drift.
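The distinction can be illustrated without any HTTP server. In the sketch below, all three handlers are hypothetical stand-ins for an endpoint, modeled as plain functions from a request dict to a `(status, body)` pair. The check looks only at the boundary, the way a Hurl file would, so it cannot tell the two equivalent implementations apart but does catch the drifted one.

```python
def list_pages_v1(request: dict) -> tuple[int, dict]:
    # Original implementation: auth check written inline.
    if not request.get("headers", {}).get("Authorization"):
        return 401, {"error": "unauthorized"}
    return 200, {"pages": []}


def is_authenticated(request: dict) -> bool:
    return bool(request.get("headers", {}).get("Authorization"))


def list_pages_v2(request: dict) -> tuple[int, dict]:
    # Refactored implementation: same behavior, different structure.
    if is_authenticated(request):
        return 200, {"pages": []}
    return 401, {"error": "unauthorized"}


def list_pages_drifted(request: dict) -> tuple[int, dict]:
    # A "tidied up" version that silently dropped the auth check.
    return 200, {"pages": []}


def satisfies_contract(handler) -> bool:
    """Boundary-level check: an unauthenticated request must get a 401."""
    status, _body = handler({"headers": {}})
    return status == 401


for handler in (list_pages_v1, list_pages_v2, list_pages_drifted):
    print(handler.__name__, satisfies_contract(handler))
```

A unit test written against `is_authenticated` would need updating as soon as that helper is renamed or inlined; the boundary check does not.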
Ratchet Lock
When a Hurl test passes, it locks. This is the ratchet. A locked Hurl test is ratchet code – deterministic code that makes a passed API contract irreversible.
1. Write Hurl tests for the current API (or auto-extract)
2. Run on every commit in CI
3. Passed tests cannot be deleted or modified
4. When adding new features, add new Hurl tests
5. All existing tests + all new tests must pass to merge
When you tell the agent “refactor the code,” it freely changes the code. But if Hurl tests break, the commit is rejected. The agent must refactor while preserving all existing behavior. Drift is still possible in edge cases Hurl doesn’t cover, but for covered behavior, drift is structurally suppressed.
This aligns exactly with the TDAD study’s finding. Not the procedural instruction “write tests,” but the concrete contract “these Hurl files must pass.” The agent can choose methods, but it cannot violate the contract.
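The rule that passed tests cannot be deleted or modified needs some enforcement mechanism. One possible sketch, assuming a hypothetical JSON lock file that maps each Hurl file to a SHA-256 digest, is shown below. The file layout and function names are illustrative and are not taken from Hurl or from any existing ratchet tool.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def lock(contract_dir: Path, lock_file: Path) -> None:
    """Record a digest for every .hurl file that is not locked yet."""
    locked = json.loads(lock_file.read_text()) if lock_file.exists() else {}
    for path in sorted(contract_dir.rglob("*.hurl")):
        key = path.relative_to(contract_dir).as_posix()
        locked.setdefault(key, file_digest(path))  # existing entries never change
    lock_file.write_text(json.dumps(locked, indent=2, sort_keys=True))


def violations(contract_dir: Path, lock_file: Path) -> list[str]:
    """List locked contracts that have been deleted or modified."""
    locked = json.loads(lock_file.read_text()) if lock_file.exists() else {}
    problems = []
    for key, digest in sorted(locked.items()):
        path = contract_dir / key
        if not path.exists():
            problems.append(f"deleted: {key}")
        elif file_digest(path) != digest:
            problems.append(f"modified: {key}")
    return problems
```

CI would call `lock` after a green run and fail the build whenever `violations` returns a non-empty list. The lock file itself must also be protected from the agent, for example by comparing it against the version on the main branch. Otherwise the ratchet can be released by editing one file.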
It Works on Legacy Too
Already running a vibe-coded app in production? No need to start from scratch.
Step 1: Capture current behavior in Hurl.
If you have API documentation, transcribe it to Hurl. If not, have the agent read the existing code and write Hurl tests. The goal is to declare “this is how every endpoint currently works” in plain text.
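If current behavior has already been observed in some structured form, such as documentation or recorded responses, the transcription step is mechanical. The sketch below renders one observation as a Hurl entry. The `ObservedEndpoint` record shape is a hypothetical choice for illustration, and request bodies and headers are left out to keep it short.

```python
from dataclasses import dataclass, field


@dataclass
class ObservedEndpoint:
    """One observed request/response pair (hypothetical record shape)."""
    description: str
    method: str
    url: str
    status: int
    required_fields: list[str] = field(default_factory=list)  # JSONPath expressions


def to_hurl(entry: ObservedEndpoint) -> str:
    """Render one observation as a Hurl entry with optional existence asserts."""
    lines = [
        f"# {entry.description}",
        f"{entry.method} {entry.url}",
        f"HTTP {entry.status}",
    ]
    if entry.required_fields:
        lines.append("[Asserts]")
        lines += [f'jsonpath "{path}" exists' for path in entry.required_fields]
    return "\n".join(lines) + "\n"


print(to_hurl(ObservedEndpoint(
    description="Unauthenticated access returns 401",
    method="GET",
    url="http://localhost:8080/api/pages",
    status=401,
)))
```

Generated entries are only a starting point. Each one should be run against the live API and reviewed by a human before it is treated as a contract.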
Step 2: Hook it into CI.
Confirm all Hurl tests pass and add them to merge requirements.
Step 3: You’re safe now.
Whether you ask the AI to refactor or add features, Hurl protects existing behavior. When drift occurs, CI catches it immediately.
It is not foundation work but seismic retrofitting. You reinforce the building without closing the store.
Not the End of Vibe Coding, but Its Evolution
Andrej Karpathy, who coined the term vibe coding, declared in February 2026 – exactly one year after coining it – “The vibe coding era is over.” The new paradigm is agentic engineering – humans don’t write code; they orchestrate agents that autonomously plan, implement, and test.
Thoughtworks Technology Radar (2025) placed Spec-Driven Development at “Assess” level. Martin Fowler’s team published an SDD tool analysis. The industry is converging in the same direction.
Storey (2026) theorized the root causes of drift as two new debt concepts: cognitive debt, the erosion of the team’s shared understanding, and intent debt, the failure to externalize why something was built a certain way. A Hurl file is precisely the externalization of intent. The declaration “this endpoint must behave this way” lives in git, separated from code.
Hurl tests are the smallest unit of this transition. You don’t need to write 10 specs. You don’t need to learn OpenAPI. One Hurl file is one contract. And that contract suppresses drift structurally without limiting the agent’s freedom.
Don’t change the model. Add a contract.
Related Posts
- yongol – The Keel of AI Coding SaaS – Enforces full-stack consistency with 10 SSOTs. Hurl is one of them.
- Ratchet Pattern – How to Make Agents Finish the Job – Theoretical background of deterministic verification and ratchet locking.
- Ratchet Code That Exploits IFEval – Feedback loops exploiting sycophancy bias and Reins.
References
- Cursino, D. et al. (2026). “Speed at the Cost of Quality? The Impact of AI Coding on Software.” MSR 2026. arxiv.org/abs/2511.04427
- METR (2025). “Measuring the Impact of Early AI on Experienced Open-source Developer Productivity.” arxiv.org/abs/2507.09089
- Google Cloud (2025). DORA Report 2025. cloud.google.com
- Wang, Z. et al. (2026). “TDAD: Test-Driven Agentic Development.” ACM AIWare 2026. arxiv.org/abs/2603.17973
- Autonoma (2026). “Amazon Vibe Coding Failures: 4 Sev-1s in 90 Days.” getautonoma.com
- CNBC (2026). “Amazon convenes ‘deep dive’ internal meeting to address AI-related outages.” cnbc.com
- Thoughtworks (2025). “Spec-Driven Development.” Technology Radar Vol.33. thoughtworks.com
- Karpathy, A. (2026). “From Vibe Coding to Agentic Engineering.” thenewstack.io
- Fowler, M. et al. (2025). “SDD Tools.” martinfowler.com
- Liu, Y. et al. (2026). “Debt Behind the AI Boom: A Large-Scale Empirical Study of AI-Generated Code in the Wild.” arxiv.org/abs/2603.28592
- Meyer, B. (1992). “Applying ‘Design by Contract’.” Computer, 25(10), pp. 40-51. doi.org/10.1109/2.161279
- Storey, M.-A. (2026). “From Technical Debt to Cognitive and Intent Debt: Rethinking Software Health in the Age of AI.” arxiv.org/abs/2603.22106
- Hurl. hurl.dev | github.com/Orange-OpenSource/hurl
Changelog
- 2026-05-22: Initial release