Reins Engineering

What Is a 'Turn' in reins?

An anatomy of the turn, the smallest unit of execution in reins. What is not recorded is not a turn: from this one definition, driver independence, restart resilience, and auditability all follow. Set against the June 2026 Loop Engineering discourse, the turn shows how those recommendations become structure.

2026-07-07

Systems Make Genius Shine Brighter

Genius without structure drifts, and structure alone is mediocre. Only when genius and structure multiply does the real value emerge. The ZenFlow benchmark (Claude Sonnet, 32 endpoints, 43 minutes) and historical evidence from the B-17, Toyota, and WHO checklists all demonstrate the same principle.

2026-06-25

abloq — A Blog an Agent Operates, a Machine Locks the Verification

Hand a blog to an agent and the articles come out. The problem is that you can't trust them: the agent fabricates sources, bumps the lastmod of an article it never touched, and edits files no one asked it to. If a human has to inspect every line, there was no point in delegating. abloq's answer is a division of labor: generation is probabilistic, verification is deterministic. The only thing a human writes is a single insight specification (insight.yaml); authoring, translation, refresh, and evidence work are carried out by agents as quests; and quality is guaranteed by a deterministic gate derived from a single blog.yaml. A locked PASS is irreversible: the agent may be disposable, but progress accumulates.

2026-06-11

Why Your Agent Loop Diverges

The more Loop Engineering spreads, the more people hit the same wall: the loop doesn't converge, it diverges. Infinite spinning, drift, and reward hacking are three faces with one root. You plugged the generator itself back into the loop's judgment slot. And divergence is actually the lucky case, because you can see it. What's truly terrifying is the loop that silently fakes convergence. There is a single cure: give the authority to lock 'done' to a deterministic gate alone, not to the LLM.

2026-06-11

Production Traffic Is the Spec

Legacy code has no documentation. No tests either. And yet it's running right now. A month of well-recorded logs is the spec — build Hurl integration tests that capture the current behavior from production traffic, and you can pin down what the legacy does and lay a safety net for refactoring without reading a single line of code.

2026-06-06

Burning a City for a Single Answer

A trillion-parameter model burns a city's worth of electricity and water just to spit out a single answer. I thought this was insane. Searching for a way out, I learned something. The flaw everyone was trying to fix, the LLM's sycophancy, was the answer itself. Feed it facts and sycophancy becomes accuracy. This is the story of why I started Reins.

2026-06-06

reins — Keep Only the Domain in a Quest CLI; Make the Ratchet a Framework

how-make-quest taught you to build a quest CLI with your bare hands. But build a second CLI and you write the same ratchet, the same scan/next/submit, the same tallying all over again. reins pulls that invariant out into a framework — reins supplies the ratchet, the command skeleton, the tallying, and export; you implement only your domain's gate (the four methods of gate.Definition). The gate is a catalog of cheese-defense rules, and the toulmin defeat graph hands the agent a strategy guide for 'why you lost and what to change to win.'

2026-06-05

The Tool That Gave Us the Reins Had No Reins of Its Own — The Boundary Between Harness and Reins

"Reins Engineering — isn't that just harness engineering?" The two don't oppose each other — they're different parts of the same tack. But they are different parts. Even the world's best coding agent put no reins on its own code. That's because reins aren't something you have; they're something you apply.

2026-06-04

How to Make a Quest CLI — Build a Tool That Lets the Machine Judge Completion

AI says "Done." In reality, it isn't finished. This article is about building the tool that solves that problem — a quest CLI — with your own hands. From the principle (why) to the cobra command skeleton (how), this single article is enough for an agent to build a Go quest CLI. huma is the worked example.

2026-06-03

The Preconditions for Improving LLM Multi-Agent Accuracy

Run several agents and you get more accuracy? Only half true. Models trained on the same data fail in the same places. Multi-agent setups work under one of two conditions: design for error independence or, in a verifiable domain, stand up a verifier outside the LLM.
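The role of error independence can be illustrated with a small calculation. The sketch below is an assumption-laden example, not taken from the article: it supposes three agents combined by majority vote, each correct with a hypothetical probability of 0.8, and compares the independent case against the case where all three fail in the same places.

```go
package main

import "fmt"

// majorityOfThree returns the probability that at least two of three
// voters are correct, assuming each is correct with probability p and
// that their errors are statistically independent.
func majorityOfThree(p float64) float64 {
	// All three correct, plus exactly two correct (three ways to pick the one that fails).
	return p*p*p + 3*p*p*(1-p)
}

func main() {
	p := 0.8 // hypothetical per-agent accuracy

	// Independent errors: voting lifts accuracy above a single agent.
	fmt.Printf("independent errors:      %.3f\n", majorityOfThree(p))

	// Perfectly correlated errors: all three agents agree on every
	// answer, so the vote is exactly as accurate as one agent.
	fmt.Printf("fully correlated errors: %.3f\n", p)
}
```

Under these assumptions the vote reaches 0.896 with independent errors and stays at 0.800 with fully correlated ones, which is the sense in which running more agents is "only half true".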

2026-06-02

Why Your Agent Never Stops

When someone brags about running their agent 24/7, the feeling it stirs isn't admiration but a question — why isn't it done yet? Code is not a search problem; it's a constraint satisfaction problem. A healthy system is one that can stop.

2026-06-01

On Beauty

Seventy percent of what is beautiful is mathematics. A machine locks the order deterministically, and only the remaining 30% of complexity is decided by a human. Reins Engineering is not an AI coding tool — it is the principle of locking the order and leaving only the complexity to people.

2026-05-31

Who Defines 'Done'? — The Problem Games Solved 40 Years Ago

The moment you define tenant move-out confirmation as five photos, it becomes a game quest. Defining 'done' not as the agent's claim but as a mechanically verifiable condition — games solved this 40 years ago, and it is the right way to get AI agents to actually do their job.

2026-05-30

Class 11. How to Rescue a Broken Vibe-Coded App

Your vibe-coded app just broke. You don't need to rebuild it. Diagnose it, lock it down, and step out one move at a time.

2026-05-28

Supabase Is a Vibe Coding Trap

The reason AI recommends Supabase is not technical superiority — it's because tutorials dominate the training data. Once business logic enters a black box, agents can't track it. Getting in takes 30 seconds. Getting out takes 3 months.

2026-05-28

Building Agent-Operable Systems

60–80% of Fortune 500 IT budgets go to guarding locked legacy. Because they can't open it. The real meaning of the AI bubble is not smarter models — it is that locked corporate memory is becoming reachable.

2026-05-27

huma -- A Ratchet That Never Skips an Endpoint

When you ask an AI agent to test 42 endpoints, it declares 'done' around the 15th. huma turns the endpoint list into a ratchet session so the agent cannot skip a single one. scan, next, write, verify. Four commands, zero config.

2026-05-26

codistill -- Squeeze SSOT Out of Existing Code

Do you have to start from SSOT to use yongol? No. codistill auto-extracts OpenAPI, DDL, and sqlc queries from existing code across 16 web frameworks in 8 languages. Not foundation work -- seismic retrofitting.

2026-05-26

Agent Operable Codebase

Is code that is easy for humans to read the same as code that is easy for agents to work with? It is not. When a file has 20 functions, agent performance drops by 30-85%. The office must be turned into a factory.

2026-05-25

Class 10. Law of Data — Agent Operable Data

When code is wrong, tests catch it. When data is wrong, nobody knows. Schema is the law I establish.

2026-05-24

Class 9. Automation Beyond Code — Agent Operable System

Is agent-operable code enough? Build, deploy, monitor — the structure where agents operate the entire system.

2026-05-24

Class 8. Agent Factory — Agent Operable Codebase

20 functions in one file → agent performance drops 30-85%. Split with filefunc, test with tsma.

2026-05-24

Class 7. Flipping Sycophancy — Balancing Prompts and Verifiers

Give opinions and it flatters; give facts and it fixes. How to use sycophancy bias not as a bug but as an asset.

2026-05-24

Class 6. Lock When It Passes — Ratchet Pattern Principles and Bulk Application

AI declared 'all done.' In reality it was 40/527. Ratchet Pattern hands completion judgment to the machine.

2026-05-24

Class 5. AI with Reins — Introduction to Reins Engineering

Harness engineering is a fence. Reins Engineering is a bridle. Don't change the model — add contracts.

2026-05-24

Class 4. Decisions Outside Code — yongol and Declarative Full-Stack Control

AI can't distinguish decisions from implementation details mixed in code — that's the root cause of drift. yongol separates decisions into 10 declarative specifications and catches contradictions across layers with 287 rules.

2026-05-24

Class 3. Apps That Don't Break — Hurl, Git, CI/CD

Declare API contracts with Hurl, create save points with Git, automate verification with CI/CD. When the three combine, they become a ratchet — a gear that only moves forward and never backward.

2026-05-24

Class 2. How to Distrust AI — Limits and Causes of Vibe Coding

Drift where AI silently alters existing features while adding new ones, 58% sycophancy bias, the math where 97% multiplied by itself 5 times becomes 86% (0.97^5 ≈ 0.86). Why it crumbles at 5 features.

2026-05-24

Class 1. How to Command AI — Vibe Coding Essentials

From installing Claude Code to managing context with CLAUDE.md. How someone who doesn't know code can command AI.

2026-05-24

Class 0. Install Claude Code — What You're Using Might Not Be Claude Code

Half of what YouTube calls 'Claude Code' is not Claude Code. Even with the same model, a different agent produces different results. The gate to this course closes here.

2026-05-24

Reins Engineering — AI with Reins

Harness engineering is a fence. It keeps the agent from going outside, but doesn't ensure it reaches the destination. Reins Engineering is the reins — steer with deterministic contracts, lock with ratchets, separate decisions from implementation.

2026-05-23

Hurl Stops Vibe Coding Drift

Vibe coding collapses under logic drift within 3 months. CMU, METR, DORA, and Amazon cases prove it. Declare API contracts in plain text with Hurl and lock them with a ratchet -- you suppress drift structurally without limiting AI's freedom.

2026-05-22

Ratchet Code That Exploits IFEval

LLM sycophancy bias is not a bug but an asset. Combine the instruction-following ability that IFEval measures with deterministic feedback, and you get a convergence loop in which even a 4.5B local model generates correct code.

2026-05-20

yongol — The Keel of AI-Coded SaaS

Vibe coding collapses at 200 endpoints because AI cannot distinguish decisions from implementation details. yongol shifts the AI workload from code to 10 declarative specs and enforces cross-layer consistency before compilation. Harness with reins.

2026-05-18

AI Sycophancy Bias Is a Business Feature

Sycophancy bias in LLMs is not a bug. It is a mathematical inevitability of RLHF and a commercial feature that big tech has no incentive to fix. This is why LLM-as-Judge is structurally impossible.

2026-05-18

Why Coding Agents Work and Why They Break

The same model hallucinates in web chat but ships a 200-line feature in a coding agent. Not because the model changed — because the topology changed. Generation can be probabilistic. Verification must be deterministic.

2026-05-16

Ratchet Pattern — How to Make an Agent Finish the Job

I asked an AI agent to write tests for 527 functions. It stopped at 40 and declared 'done.' The Ratchet Pattern forces completion by delegating the done/not-done decision to a mechanical verifier — so the agent keeps going until the machine says stop.
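The idea of handing the done/not-done decision to a mechanical verifier can be sketched in Go. This is an illustrative sketch only, not the API of reins, huma, or tsma: the Ratchet type, its methods, and the string-matching verifier are all hypothetical stand-ins for a real deterministic check such as running a test suite.

```go
package main

import (
	"fmt"
	"strings"
)

// Ratchet tracks which items have passed verification.
// Once an item is locked it is never unlocked.
type Ratchet struct {
	items  []string
	locked map[string]bool
	verify func(item, work string) bool // deterministic check, not an LLM
}

func NewRatchet(items []string, verify func(item, work string) bool) *Ratchet {
	return &Ratchet{items: items, locked: make(map[string]bool), verify: verify}
}

// Next returns the first item that has not been locked yet.
func (r *Ratchet) Next() (string, bool) {
	for _, it := range r.items {
		if !r.locked[it] {
			return it, true
		}
	}
	return "", false
}

// Submit locks the item only when the verifier passes.
// The agent's own claim of completion plays no role.
func (r *Ratchet) Submit(item, work string) bool {
	if r.locked[item] {
		return true
	}
	if r.verify(item, work) {
		r.locked[item] = true
		return true
	}
	return false
}

// Done reports whether every item is locked.
func (r *Ratchet) Done() bool {
	_, pending := r.Next()
	return !pending
}

func main() {
	// Hypothetical verifier: the submitted work must contain a test
	// function named after the item.
	verify := func(item, work string) bool {
		return strings.Contains(work, "Test"+item)
	}
	r := NewRatchet([]string{"Parse", "Render", "Save"}, verify)

	r.Submit("Parse", "func TestParse(t *testing.T) {}")
	r.Submit("Render", "done!") // a bare claim is rejected

	next, _ := r.Next()
	fmt.Println(r.Done(), next) // prints "false Render"
}
```

The design choice worth noting is that Done is computed from the locked set alone, so the loop can only end when the verifier has accepted every item.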

2026-05-15

Feedback Topology Over Model IQ

The same model stalls at 40 or completes all 527. The difference is not the model — it is the feedback structure. LLM performance depends far more on how fast and deterministic the feedback loop is than on the model itself.

2026-05-14

tsma -- Regression Defense Line for Legacy Code

A CLI tool that indexes every function, detects test presence, measures coverage, and gives precise feedback to LLM agents. One command builds a regression defense line around legacy code.

2026-05-14

filefunc — One File, One Concept

The navigation unit for an AI code agent is the file. filefunc is a Go code structure convention and CLI tool that enforces one concept per file.

2026-03-16