Hurl Stops Vibe Coding Drift

The 3-Month Wall

If your vibe-coded app collapsed after three months, if the AI keeps overwriting existing logic, or if you want to protect API contracts from code changes, Hurl and the ratchet are the solution.

You build a SaaS with vibe coding. At first it is fast. “Make login” – 30 seconds. “Add payments” – 2 minutes. An MVP ships in 3 weeks.

Three months later, strange things happen. The AI “tidies up” the payment logic and silently changes the discount calculation. Adding a new endpoint breaks existing authentication. You ask for a refactoring and the field names on the public API change, killing every client.

This is called logic drift: the AI unintentionally modifying existing business logic. Regression bugs exist in traditional development too, but logic drift is different. Changes the developer did not intend happen across the entire codebase without the developer noticing, because every prompt starts in a fresh context window.

Drift in Numbers

This is not sentiment. There is data.

The price of speed is complexity. A Carnegie Mellon team compared 807 GitHub repositories before and after Cursor adoption (MSR 2026). In the first month, code additions increased 3-5x. Two months later the speed advantage vanished. What remained was a 30% increase in static analysis warnings and a permanent 41% increase in code complexity. A separate large-scale study points in the same direction. Liu et al. (2026) analyzed 302,600 AI-authored commits across 6,299 GitHub repositories and found that the count of unresolved technical debt items surged from a few hundred in early 2025 to over 110,000 by February 2026.

It didn’t get faster – it got slower. The nonprofit AI research organization METR ran a randomized controlled trial with 16 experienced open-source developers (2025). On projects they knew well, tasks where AI tools were allowed took 19% longer to complete. Yet the developers themselves perceived a 20% speed improvement. The gap between perception and reality was 39 percentage points. Results may differ for new projects, but the assumption “AI = always faster” breaks.

Stability collapses at scale. According to Google’s DORA Report (2025), for every 25% increase in AI adoption, software delivery stability decreases by 7.2%.

It actually collapsed. Amazon mandated AI coding tools company-wide in 2025 and deployed 21,000 AI agents. During the same period, approximately 30,000 employees were laid off, drastically reducing review capacity. The combination of rapid AI code generation and reduced review capacity resulted in 4 Sev-1 incidents over 90 days. On March 5, 2026, a 6-hour outage caused an estimated loss of 6.3 million orders. An internal document stated: “GenAI’s rapid code generation is inadvertently exposing vulnerabilities, and current safety measures are wholly inadequate.”

“Do TDD” Is Not the Answer

The common advice for vibe coding drift is “write tests.” The direction is right, but how you provide tests determines the outcome.

The TDAD study (Wang et al., 2026) tested this precisely. The authors had Qwen3-Coder 30B solve 100 SWE-bench Verified instances under three conditions.

Condition	Regression rate
Baseline (no test instruction)	6.08%
“Do TDD” procedural instruction	9.94% (worse)
Providing affected test files as context	1.82% (70% reduction)

Telling it “do TDD” makes things worse. The agent strays from its main task trying to follow the procedural instruction. But providing the concrete context “these test files must pass” cuts regressions by 70%.

The difference is clear. Not “how to test” as an instruction, but “what must pass” as a contract.
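The percentages in the table can be checked with a few lines of arithmetic. This is only a worked calculation on the three regression rates quoted above; the helper name `relative_change` is made up for illustration.

```python
# Regression rates quoted in the table above (percent of instances).
BASELINE = 6.08
TDD_INSTRUCTION = 9.94
TEST_FILES_AS_CONTEXT = 1.82


def relative_change(baseline: float, value: float) -> float:
    """Return the change from baseline to value, as a percentage of baseline."""
    return (value - baseline) / baseline * 100.0


tdd_change = relative_change(BASELINE, TDD_INSTRUCTION)
context_change = relative_change(BASELINE, TEST_FILES_AS_CONTEXT)

print(f"'Do TDD' instruction:  {tdd_change:+.1f}% vs. baseline")      # +63.5%
print(f"Test files as context: {context_change:+.1f}% vs. baseline")  # -70.1%
```

So the procedural instruction raises the regression rate by roughly two thirds relative to the baseline, while handing over the affected test files removes about 70% of it.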

Hurl: Contracts in Plain Text

The concept of “contracts” in software was formalized by Bertrand Meyer (1992) as the trinity of preconditions, postconditions, and invariants that specify obligations between modules. Hurl applies this principle to the HTTP boundary: it is a testing tool that declares HTTP requests and expected responses in plain text. Maintained by Orange (France Telecom), it is a Rust binary with zero runtime dependencies and 18.7k GitHub stars, and it is fast enough to run on every commit in CI.

# Login success
POST http://localhost:8080/api/auth/login
{
  "email": "test@example.com",
  "password": "secret123"
}
HTTP 200
[Asserts]
jsonpath "$.token" exists
jsonpath "$.user.email" == "test@example.com"

# Unauthenticated access returns 401
GET http://localhost:8080/api/pages
HTTP 401

Two contracts. Login must return 200 with a token, and unauthenticated access must return 401.

Once this file is committed to git and run on every commit in CI, the moment the AI “tidies up” the auth logic and the 401 becomes a 200, the commit is rejected. Drift is caught before it reaches production.
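A minimal sketch of what the CI step could look like, assuming the `hurl` binary is on the PATH and the contracts live in one directory. The function names, the directory layout, and the choice to fail when no contract files are found are assumptions made for this example, not part of Hurl. The only Hurl feature used is the `--test` flag, which runs the files as tests and exits non-zero when an assertion fails.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Optional, Sequence

# Arbitrary non-zero code for this sketch: an empty contract set must not pass silently.
NO_CONTRACTS_EXIT_CODE = 1


def find_contract_files(test_dir: str) -> List[str]:
    """Collect every .hurl file under test_dir, sorted for stable output."""
    return sorted(str(p) for p in Path(test_dir).rglob("*.hurl"))


def run_contracts(
    test_dir: str,
    runner: Optional[Callable[[Sequence[str]], int]] = None,
) -> int:
    """Run all Hurl contracts and return the exit code CI should use.

    `runner` is injectable so the logic can be exercised without hurl installed.
    """
    files = find_contract_files(test_dir)
    if not files:
        return NO_CONTRACTS_EXIT_CODE
    command = ["hurl", "--test", *files]
    if runner is None:
        # Default: actually invoke the hurl binary.
        return subprocess.run(command).returncode
    return runner(command)
```

A CI job would end with something like `sys.exit(run_contracts("tests/contracts"))`, where `tests/contracts` is a placeholder path. Any non-zero result fails the job, which is what rejects the commit.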

Why Hurl

Unit tests can also catch drift – if you don’t give the AI permission to modify the test files. But unit tests verify internal functions of the code, so they are structurally coupled to the implementation. When function names change, tests break too, and you must update tests with every refactoring.

Hurl sits at the HTTP boundary. It declares only requests and responses. It knows nothing about the code’s internal structure. No matter how the AI changes the code, if the externally observable behavior stays the same it passes; if it differs it fails. It is naturally independent of the implementation while staying bound to the behavior.

	Unit tests	Hurl
Verification target	Function internals	HTTP contract
On AI refactoring	Changed together	Unchanged
Drift detection	Conditional (if modification blocked)	Natural
Code structure dependency	High	None
Human readability	Code level	Plain text
LLM generation ease	Requires code structure understanding	Only needs HTTP knowledge

What Hurl verifies is not code but behavior. The AI can freely change the code. The behavior must not change. This distinction is the key to catching drift.
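The distinction can be shown in miniature. The sketch below is a toy model, not Hurl itself: three hypothetical handlers stand in for the server, and `contract_holds` plays the role of the "unauthenticated access returns 401" entry from the Hurl file above. All names are invented for illustration.

```python
from typing import Callable, Optional

# A handler maps (path, bearer token or None) to an HTTP status code.
Handler = Callable[[str, Optional[str]], int]


def handler_v1(path: str, token: Optional[str]) -> int:
    """Original implementation: one inline check."""
    if path == "/api/pages" and token is None:
        return 401
    return 200


def _is_authenticated(token: Optional[str]) -> bool:
    return token is not None


def handler_v2(path: str, token: Optional[str]) -> int:
    """Refactored: auth check moved into a helper, same observable behavior."""
    protected = {"/api/pages"}
    if path in protected and not _is_authenticated(token):
        return 401
    return 200


def handler_drifted(path: str, token: Optional[str]) -> int:
    """A "tidied up" version that silently dropped the auth check."""
    return 200


def contract_holds(handler: Handler) -> bool:
    """Boundary contract: unauthenticated GET /api/pages must return 401."""
    return handler("/api/pages", None) == 401


for name, handler in [("v1", handler_v1), ("v2", handler_v2), ("drifted", handler_drifted)]:
    print(name, "passes" if contract_holds(handler) else "FAILS")
```

The refactor from `handler_v1` to `handler_v2` changes the internals and still passes, while `handler_drifted` fails. A unit test written against `_is_authenticated` would have had to change along with the refactor; the boundary check did not.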

Ratchet Lock

When a Hurl test passes, it locks. This is the ratchet. A locked Hurl test is ratchet code: deterministic code that makes a passed API contract irreversible.

1. Write Hurl tests for the current API (or auto-extract)
2. Run on every commit in CI
3. Passed tests cannot be deleted or modified
4. When adding new features, add new Hurl tests
5. All existing tests + all new tests must pass to merge

When you tell the agent “refactor the code,” it freely changes the code. But if Hurl tests break, the commit is rejected. The agent must refactor while preserving all existing behavior. Drift is still possible in edge cases Hurl doesn’t cover, but for covered behavior, drift is structurally suppressed.

This aligns exactly with the TDAD study’s finding. Not the procedural instruction “write tests,” but the concrete contract “these Hurl files must pass.” The agent can choose methods, but it cannot violate the contract.
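Rule 3 in the list above ("passed tests cannot be deleted or modified") needs some enforcement mechanism. One possible approach, sketched here as an assumption rather than a description of any existing tool, is a CI check that reads the output of `git diff --name-status` between the base branch and the candidate commit and flags any existing `.hurl` file that was modified, deleted, or renamed. The function name and output format are invented for this example.

```python
from typing import List


def ratchet_violations(name_status: str, suffix: str = ".hurl") -> List[str]:
    """Return changes that touch an already-existing contract file.

    `name_status` is the text printed by `git diff --name-status`:
    one line per file, tab-separated, e.g. "M\tpath" or "R100\told\tnew".
    """
    violations = []
    for line in name_status.splitlines():
        if not line.strip():
            continue
        status, *paths = line.split("\t")
        # A (added) and C (copied) only create new files, which the ratchet allows.
        if status.startswith(("A", "C")):
            continue
        # For M, D, T and R the first path is the file that existed before.
        if paths and paths[0].endswith(suffix):
            violations.append(f"{status[0]} {paths[0]}")
    return violations


example = (
    "M\tsrc/auth.py\n"
    "A\ttests/contracts/signup.hurl\n"
    "M\ttests/contracts/login.hurl\n"
    "D\ttests/contracts/pages.hurl\n"
)
print(ratchet_violations(example))
```

For the sample diff, the changed source file and the newly added contract are accepted, and the modified and deleted contracts are reported. A CI job would fail the merge whenever the returned list is non-empty, so the agent can add contracts but never loosen one.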

It Works on Legacy Too

Already running production with vibe coding? No need to start from scratch.

Step 1: Capture current behavior in Hurl.

If you have API documentation, transcribe it to Hurl. If not, have the agent read the existing code and write Hurl tests. The goal is to declare “this is how every endpoint currently works” in plain text.
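Capturing current behavior can be partly mechanical. The sketch below renders one Hurl entry from an observed request and response. It is a hypothetical helper, not a Hurl feature, and it assumes flat JSON objects with simple key names. Keys listed in `volatile` (tokens, timestamps) get an `exists` assert instead of a pinned value, as do nested objects and arrays.

```python
import json
from typing import Any, Collection, Mapping, Optional


def render_hurl_entry(
    method: str,
    url: str,
    status: int,
    request_body: Optional[Mapping[str, Any]] = None,
    response_body: Optional[Mapping[str, Any]] = None,
    volatile: Collection[str] = (),
) -> str:
    """Render one request/response pair as Hurl plain text."""
    lines = [f"{method} {url}"]
    if request_body is not None:
        # Hurl accepts a JSON body directly after the request line.
        lines.append(json.dumps(request_body, indent=2, ensure_ascii=False))
    lines.append(f"HTTP {status}")

    asserts = []
    for key, value in (response_body or {}).items():
        query = f'jsonpath "$.{key}"'
        if key in volatile or isinstance(value, (dict, list)):
            # Value changes per run or is nested: only require presence.
            asserts.append(f"{query} exists")
        else:
            # Scalars are pinned; json.dumps yields "text", 3, true or null.
            asserts.append(f"{query} == {json.dumps(value, ensure_ascii=False)}")
    if asserts:
        lines.append("[Asserts]")
        lines.extend(asserts)
    return "\n".join(lines) + "\n"


print(render_hurl_entry("GET", "http://localhost:8080/api/pages", 401))
```

Generated entries are a starting point, not a finished contract. A human or agent should still review which values are worth pinning before the file is locked.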

Step 2: Hook it into CI.

Confirm all Hurl tests pass and add them to merge requirements.

Step 3: You’re safe now.

Whether you ask the AI to refactor or add features, Hurl protects existing behavior. When drift occurs, CI catches it immediately.

It is not foundation work but seismic retrofitting. You reinforce the building without closing the store.

Not the End of Vibe Coding, but Its Evolution

Andrej Karpathy, who coined the term vibe coding, declared in February 2026, exactly one year after coining it, “The vibe coding era is over.” The new paradigm is agentic engineering: humans don’t write code; they orchestrate agents that autonomously plan, implement, and test.

Thoughtworks Technology Radar (2025) placed Spec-Driven Development at “Assess” level. Martin Fowler’s team published an SDD tool analysis. The industry is converging in the same direction.

Storey (2026) traced the root causes of drift to two new debt concepts: cognitive debt, the erosion of the team’s shared understanding, and intent debt, the failure to externalize why something was built a certain way. A Hurl file is precisely the externalization of intent. The declaration “this endpoint must behave this way” lives in git, separated from code.

Hurl tests are the smallest unit of this transition. You don’t need to write 10 specs. You don’t need to learn OpenAPI. One Hurl file is one contract. And that contract suppresses drift structurally without limiting the agent’s freedom.

Don’t change the model. Add a contract.

yongol – The Keel of AI Coding SaaS – Enforces full-stack consistency with 10 SSOTs. Hurl is one of them.
Ratchet Pattern – How to Make Agents Finish the Job – Theoretical background of deterministic verification and ratchet locking.
Ratchet Code That Exploits IFEval – Feedback loops exploiting sycophancy bias and Reins.

References

Cursino, D. et al. (2026). “Speed at the Cost of Quality? The Impact of AI Coding on Software.” MSR 2026. arxiv.org/abs/2511.04427
METR (2025). “Measuring the Impact of Early-2025 AI on Experienced Open-source Developer Productivity.” arxiv.org/abs/2507.09089
Google Cloud (2025). DORA Report 2025. cloud.google.com
Wang, Z. et al. (2026). “TDAD: Test-Driven Agentic Development.” ACM AIWare 2026. arxiv.org/abs/2603.17973
Autonoma (2026). “Amazon Vibe Coding Failures: 4 Sev-1s in 90 Days.” getautonoma.com
CNBC (2026). “Amazon convenes ‘deep dive’ internal meeting to address AI-related outages.” cnbc.com
Thoughtworks (2025). “Spec-Driven Development.” Technology Radar Vol.33. thoughtworks.com
Karpathy, A. (2026). “From Vibe Coding to Agentic Engineering.” thenewstack.io
Fowler, M. et al. (2025). “SDD Tools.” martinfowler.com
Liu, Y. et al. (2026). “Debt Behind the AI Boom: A Large-Scale Empirical Study of AI-Generated Code in the Wild.” arxiv.org/abs/2603.28592
Meyer, B. (1992). “Applying ‘Design by Contract’.” Computer, 25(10), pp. 40-51. doi.org/10.1109/2.161279
Storey, M.-A. (2026). “From Technical Debt to Cognitive and Intent Debt: Rethinking Software Health in the Age of AI.” arxiv.org/abs/2603.22106
Hurl. hurl.dev | github.com/Orange-OpenSource/hurl