Reins Engineering: AI with Reins

A Horse Without Reins


AI coding tools got fast. Login in 30 seconds. Payments in 2 minutes. An MVP ships in three weeks.

Three months later, it collapses.

The AI “cleans up” payment logic and changes discount calculations. A refactoring request alters public API field names. Adding a new feature breaks authentication. According to Carnegie Mellon research (MSR 2026), code complexity permanently increases by 41% after AI coding tool adoption. The Google DORA Report (2025) shows a 7.2% decrease in delivery stability for every 25% increase in AI adoption.

The problem isn’t that AI is stupid. It’s that there are no reins.


Harnesses Are Fences

The industry answered with “harness engineering.” Linters, formatters, CI/CD, project structure, coding guidelines. Fences that keep the agent from going outside.

Fences don’t set direction. Whatever the agent does inside the fence — overwriting existing logic, changing types, skipping state transitions — the linter passes. The formatter passes. CI passes. Code arrives at production “clean but wrong.”

The saddle is on. The rider is mounted. But without reins, they hold on with their thighs and fall off after three months.


Reins Engineering

Reins Engineering is an engineering approach that gives AI agents deterministic contracts and blocks progress when contracts are violated.

It consists of three elements:

1. Deterministic Feedback

Give the agent facts, not opinions. Not “this looks weird” but “line 41: field name mismatch, expected ‘user_id’, got ‘userId’.” Feedback like this leaves no room for sycophancy. According to the TDAD study (arXiv 2026), procedural “do TDD” instructions worsen regressions (6.08% → 9.94%), while providing specific test files in context reduces regressions by 70% (6.08% → 1.82%).
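
To make the idea concrete, here is a minimal sketch of a checker that emits this kind of feedback. It is an illustration only: the `Finding` class, the `check_field_names` function, and the sample contract are hypothetical names, and the line-to-field mapping is hard-coded where a real tool would extract it from source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    line: int
    expected: str
    got: str

    def render(self) -> str:
        # A fact the agent can act on: location, expected value, actual value.
        return (f"line {self.line}: field name mismatch, "
                f"expected '{self.expected}', got '{self.got}'")


def normalize(name: str) -> str:
    """Reduce snake_case and camelCase spellings to one comparable key."""
    return name.replace("_", "").lower()


def check_field_names(contract_fields: set[str],
                      used_fields: dict[int, str]) -> list[Finding]:
    """Compare field names found in code (line -> name) against the contract."""
    by_key = {normalize(f): f for f in contract_fields}
    findings = []
    for line, name in sorted(used_fields.items()):
        if name in contract_fields:
            continue  # exact match, nothing to report
        expected = by_key.get(normalize(name))
        if expected is not None:
            findings.append(Finding(line, expected, name))
        # Names with no contract counterpart are out of scope for this sketch.
    return findings


# Hypothetical contract and hypothetical usages extracted from code.
contract = {"user_id", "created_at"}
used = {12: "created_at", 41: "userId"}

messages = [f.render() for f in check_field_names(contract, used)]
print("\n".join(messages))
```

The same input always yields the same message, so the agent has nothing to argue with or flatter.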

2. Contract Locking (Ratchet Pattern)

When verification passes, lock it. Hurl tests declare API behavior in plain text, running on every commit in CI. Passing tests cannot be deleted. The agent can freely change code, but cannot change behavior. Drift is structurally suppressed.
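
One way to picture the ratchet is as a set of locked test names that can only grow. The sketch below assumes the lock is kept somewhere the agent cannot write to. The function name, the return shape, and the `.hurl` file names are hypothetical.

```python
def ratchet_check(locked: set[str],
                  current_passing: set[str]) -> tuple[bool, set[str], set[str]]:
    """Return (ok, missing, new_lock).

    locked: test names that passed on an earlier commit.
    current_passing: test names that exist and pass on this commit.
    """
    missing = locked - current_passing
    if missing:
        # A locked test was deleted or now fails: block, leave the lock as is.
        return False, missing, locked
    # Everything locked still passes: accept and lock any new passing tests.
    return True, set(), locked | current_passing


locked = {"login.hurl", "checkout_discount.hurl"}

# The agent adds a feature with its own test: allowed, and the lock grows.
ok, missing, locked_after = ratchet_check(
    locked, {"login.hurl", "checkout_discount.hurl", "refund.hurl"})

# The agent deletes or breaks the discount test: the commit is blocked.
ok2, missing2, _ = ratchet_check(locked_after, {"login.hurl", "refund.hurl"})
print(ok, ok2, sorted(missing2))
```

The lock only moves in one direction, which is what makes it a ratchet: new behavior can be added, but verified behavior cannot quietly disappear.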

3. Separating Decisions from Implementation

Code mixes three things: user decisions, business logic, and implementation details. Reins Engineering separates them. Decisions live in declarative specs (OpenAPI, DDL, state diagrams). Implementation is freely generated by AI. The AI can no longer mistake a decision for a detail and overwrite it. Decision survival becomes independent of model size.
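
As a rough illustration, take a state diagram as the decision layer. In the sketch below the transition table is the spec, and generated code is checked against it. The order workflow, the state names, and the `validate_transitions` function are all hypothetical, and the implemented transitions are hard-coded where a real tool would extract them from generated code.

```python
# Decision layer: a declarative transition table (hypothetical order workflow).
ALLOWED_TRANSITIONS = {
    "draft": {"submitted"},
    "submitted": {"approved", "rejected"},
    "approved": {"shipped"},
    "rejected": set(),
    "shipped": set(),
}


def validate_transitions(implemented: set[tuple[str, str]]) -> list[str]:
    """Report transitions the code adds beyond the spec, and ones it omits."""
    declared = {(src, dst)
                for src, dsts in ALLOWED_TRANSITIONS.items()
                for dst in dsts}
    errors = []
    for src, dst in sorted(implemented - declared):
        errors.append(f"transition {src} -> {dst}: not declared in spec")
    for src, dst in sorted(declared - implemented):
        errors.append(f"transition {src} -> {dst}: declared in spec, missing in code")
    return errors


# Implementation layer: transitions found in generated code (hard-coded here).
generated = {
    ("draft", "submitted"),
    ("submitted", "approved"),
    ("submitted", "rejected"),
    ("draft", "shipped"),  # skips approval entirely
}

errors = validate_transitions(generated)
print("\n".join(errors))
```

The decision (which transitions exist) never appears in the generated code as something to be "cleaned up". The code can be regenerated freely and is simply rejected when it disagrees with the table.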


Evolution

Prompt Engineering      → Say it well and it works
Context Engineering     → Give good context and it works
Harness Engineering     → Contain it with structure
Reins Engineering       → Steer it with direction

Each stage was born from the limitations of the previous one. Prompts alone lacked consistency. Context didn’t stop the agent from going rogue. Fences couldn’t prevent drift inside the perimeter.

Reins Engineering is not a fence — it’s reins. It doesn’t constrain the agent’s freedom; it ensures the agent reaches the destination.


Why Bigger Models Aren’t the Answer

“GPT-6 will fix it.”

It won’t. The problem isn’t model intelligence — it’s the medium. Code as a medium doesn’t distinguish decisions from implementation. Any model reading code sees decisions and details mixed in the same text.

With deterministic feedback and example context, a 4.5B local model (Gemma4) edits SSOTs (the declarative specs that act as single sources of truth) down to zero errors. A frontier model editing raw code produces drift. The difference is structure, not intelligence.

Don’t change the model. Add a contract.


Evidence

yongol is the implementation of Reins Engineering. It cross-validates the consistency of 10 declarative specs (SSOTs) with 287 rules and generates code.
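
To give a feel for what cross-validating specs can mean, here is a small sketch with two specs and two rules. It does not use yongol's actual rules, spec formats, or API. The dictionaries, rule functions, and endpoint names are hypothetical stand-ins for an OpenAPI document and a DDL schema.

```python
# Two hypothetical declarative specs, reduced to plain dicts.
ddl_tables = {
    "users": {"id", "email", "created_at"},
    "workflows": {"id", "owner_id", "name"},
}
api_endpoints = {
    "GET /users/{id}": {"table": "users", "fields": {"id", "email"}},
    "GET /workflows/{id}": {"table": "workflows",
                            "fields": {"id", "ownerId", "name"}},
    "GET /runs/{id}": {"table": "runs", "fields": {"id"}},
}


def rule_table_exists(endpoint, spec, tables):
    """Every endpoint must be backed by a table the DDL defines."""
    if spec["table"] not in tables:
        return [f"{endpoint}: table '{spec['table']}' is not defined in the DDL"]
    return []


def rule_fields_are_columns(endpoint, spec, tables):
    """Every API field must correspond to a column of the backing table."""
    columns = tables.get(spec["table"])
    if columns is None:
        return []  # already reported by rule_table_exists
    return [f"{endpoint}: field '{f}' has no column in table '{spec['table']}'"
            for f in sorted(spec["fields"] - columns)]


RULES = [rule_table_exists, rule_fields_are_columns]


def cross_validate(endpoints, tables):
    """Run every rule against every endpoint in a stable order."""
    errors = []
    for endpoint in sorted(endpoints):
        for rule in RULES:
            errors.extend(rule(endpoint, endpoints[endpoint], tables))
    return errors


violations = cross_validate(api_endpoints, ddl_tables)
print("\n".join(violations))
```

In a setup like this, code generation would only start once the list of violations is empty, so inconsistencies between specs are caught before any code exists.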

ZenFlow benchmark: a multi-tenant workflow automation SaaS with 32 endpoints, 14 tables, and 47 Hurl requests. 11/11 stages passed. Adding features did not slow development down. Existing tests never broke.

A working backend was successfully generated with a local 4.5B model. Cost $0. Offline. Reins bridges the gap that model size leaves.


A Harness Without Reins Is Just a Fence

AI is already powerful enough. What’s missing is direction.

Build higher fences and the agent drifts faster inside them. Hold the reins and the agent runs to the destination.

Reins Engineering — structured deterministic validation for AI agents.



References