Class 6. Lock When It Passes — Ratchet Pattern Principles and Bulk Application

Class 6 Image: AI generated

Quick Tips — Just Know This and You Can Command AI

The ratchet’s core fits in one sentence: every time you add a feature, hurl --test must pass before you move on. That’s the ratchet. Like a cogwheel that can’t turn backward, a test that has passed once is never allowed to break again.

“Don’t trust ‘all done.’” AI is optimistic: it can finish only 40 of 527 functions and still declare “done.” Verify with numbers. When TODO is 0, it’s done; until then, it isn’t.

When giving agents bulk tasks:

To the agent: “Run tsma next, write tests for the TODO function. If the test passes, move to the next function with tsma next. Repeat until All functions complete! appears.”

This repetition is everything. tsma next decides the next task, a verifier (go test, hurl --test, etc.) judges pass/fail, and the machine declares “done.” AI only generates.

Don’t instruct methods — instruct contracts. Not “do TDD” but “this test must pass.” Trying to teach methodology confuses it. Just give the conditions to achieve.

If the agent dies mid-session, it’s fine. Run tsma next again and it continues from the last processed function. Progress is preserved.

Hands-on Try

Open the Class 1 app with Claude Code and give this command:

To the agent: “Add 3 features in order. After adding each feature, run hurl --test tests/ to make sure all existing tests pass before moving on. If anything fails, fix it so it passes before continuing.”

Observe while the agent works:

Does the 2nd feature break the 1st feature’s test? If so, the agent must fix it before moving on
Does the 3rd break the 1st and 2nd? Again, fix first
The completion criterion is passing tests, not the “all done” declaration

This is the ratchet. Hurl from Class 3 was already serving as the ratchet’s verifier.

Why You Need to Command This Way

Previous 5 Classes Summary

In Class 5 we learned the three pillars of Reins Engineering. Direct with deterministic contracts, lock with ratchets, separate decisions from implementation. Today we dig deep into the ratchet — the most essential of the three pillars.

The ratchet is Reins Engineering’s technical heart. Understand this and everything else follows.

“All Done”

Remember the 40 vs 527 case from Class 5. Same model, same project, same 527 functions — the autonomous agent stops at 40 (7.6%), but with the ratchet it completes 527.

Class 5 explained why this difference occurs. This class takes the ratchet apart to show how it enforces that difference.

LLMs are good at generation. But they can’t be trusted to judge completion.

So what do we do?

The Ratchet Wrench

Most toolboxes include a ratchet wrench, used for tightening bolts. Its teeth engage in one direction only: turn it and it drives forward; release it and it stops, but it never reverses.

This mechanism is key.

Item 1: Mechanical verification → PASS → Lock. Next.
Item 2: Mechanical verification → FAIL → Feedback. Retry.
Item 2: Mechanical verification → PASS → Lock. Next.
...
Item N: PASS → All complete. Stop.

Only three rules:

Show only one item at a time. The agent can’t choose to skip.
Must pass to move on. Can’t skip ahead.
Stop when all pass. The machine, not the agent, says “all done.”
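The three rules fit in a few lines of code. Below is a minimal Go sketch, not tsma’s source: Item, runRatchet, and the attempt budget are hypothetical, and the attempt and verify functions are stand-ins for the LLM and for a real verifier such as go test.

```go
package main

import "fmt"

// Item is one unit of work, e.g. a function that still needs a test.
// (Hypothetical type for illustration.)
type Item struct {
	Name string
}

// runRatchet enforces the three rules:
//  1. items are handed out strictly one at a time, in order;
//  2. an item is locked only when verify returns true, and is never revisited;
//  3. completion is declared by this loop, not by whoever does the work.
//
// attempt stands in for the LLM generating or fixing work on an item.
// If an item never passes within maxAttempts, the loop stops on it rather
// than skipping ahead, and reports where it stopped.
func runRatchet(items []Item, attempt func(Item, int), verify func(Item) bool, maxAttempts int) (locked int, stoppedAt string) {
	for _, it := range items {
		passed := false
		for try := 1; try <= maxAttempts && !passed; try++ {
			attempt(it, try)
			passed = verify(it)
		}
		if !passed {
			return locked, it.Name
		}
		locked++
	}
	return locked, ""
}

func main() {
	items := []Item{{Name: "parseConfig"}, {Name: "calculateTax"}, {Name: "sendSMS"}}
	fixed := map[string]bool{}
	// Simulated generator: calculateTax only passes on its second attempt.
	attempt := func(it Item, try int) {
		if it.Name != "calculateTax" || try >= 2 {
			fixed[it.Name] = true
		}
	}
	verify := func(it Item) bool { return fixed[it.Name] }

	locked, stoppedAt := runRatchet(items, attempt, verify, 3)
	if stoppedAt == "" {
		fmt.Printf("All %d items complete!\n", locked)
	} else {
		fmt.Printf("Stopped at %s with %d locked\n", stoppedAt, locked)
	}
}
```

Run as written, it prints “All 3 items complete!”. An item that never passes halts the loop instead of being skipped; skipping would break the second rule.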

Apply these three rules and the agent that stopped at 40 goes to 527.

What the Ratchet Enforces — Transferring Completion Judgment

In Class 5 the key insight was “the difference isn’t model performance but who decides ‘done.’” This class breaks that machine-decided structure down into its components.

Take completion judgment away from the agent and hand it to the machine.

This is the core idea that runs through the entire class.

One-Sentence Definition of the Ratchet

Place AI that produces different results every time inside a checklist that always operates by the same rules.

Expanded:

Role	Who handles it
Generation (writing code, writing tests)	LLM
Judgment (pass or fail)	Verifier (go test, validate, etc.)
Progress management (what’s next, is it done)	Ratchet (CLI)

In vibe coding, all three roles are given to the LLM. LLM writes code, LLM judges if it’s good, LLM decides what to do next.

The Ratchet Pattern separates these three. The LLM only generates. The machine does the rest.

Five Principles

The Ratchet Pattern has five principles. If any one is missing, it’s not a ratchet.

Principle 1. Termination condition is mechanical

Pass or fail. Not “looks good.”

go test passes → PASS.
Coverage meets the target → PASS.
validate reports 0 errors → PASS.

No room for subjective judgment. “Pretty good?” isn’t a verdict. It’s 0 or 1.
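Principle 1 can be made concrete by collapsing every measurement into a single boolean. A sketch, assuming a ratchet that requires all three checks at once; the Metrics fields and the 100% target are illustrative, not tsma’s data model.

```go
package main

import "fmt"

// Metrics holds raw verifier outputs (hypothetical field names).
type Metrics struct {
	TestsPassed    bool    // did the test command exit successfully?
	Coverage       float64 // measured coverage, in percent
	ValidateErrors int     // error count reported by a validator
}

// Verdict turns measurements into PASS (true) or FAIL (false).
// There is no "pretty good": every condition is a yes/no comparison.
func Verdict(m Metrics, coverageTarget float64) bool {
	return m.TestsPassed && m.Coverage >= coverageTarget && m.ValidateErrors == 0
}

func main() {
	fmt.Println(Verdict(Metrics{TestsPassed: true, Coverage: 100}, 100))  // true
	fmt.Println(Verdict(Metrics{TestsPassed: true, Coverage: 92.5}, 100)) // false
}
```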

Principle 2. PASS is immutable

Passed items don’t reopen. No reversal. Remaining work never increases.

What you made today won’t be torn apart tomorrow. Only forward.

This is an important distinction. Have you ever run a “24-hour agent”? An agent running without a termination condition removes today’s abstraction tomorrow and re-adds it the day after. The ratchet doesn’t allow that kind of oscillation.
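One way to make “PASS is immutable” a property of the code rather than a promise is to give the progress record no way back. In this hypothetical Go sketch, Ledger has a Lock method but deliberately no Unlock, and it ignores names it doesn’t know, so Remaining can only stay the same or shrink.

```go
package main

import "fmt"

// Status of a single item.
type Status int

const (
	Todo Status = iota
	Pass
)

// Ledger records progress. There is intentionally no method that moves an
// item from Pass back to Todo, and no method that adds new items.
type Ledger struct {
	status map[string]Status
}

// NewLedger fixes the item list once, at creation time.
func NewLedger(names ...string) *Ledger {
	l := &Ledger{status: make(map[string]Status, len(names))}
	for _, n := range names {
		l.status[n] = Todo
	}
	return l
}

// Lock marks a known item as passed. Locking twice is harmless;
// unknown names are ignored so remaining work can never grow.
func (l *Ledger) Lock(name string) {
	if _, ok := l.status[name]; ok {
		l.status[name] = Pass
	}
}

// Remaining counts items that have not passed yet.
func (l *Ledger) Remaining() int {
	n := 0
	for _, s := range l.status {
		if s == Todo {
			n++
		}
	}
	return n
}

func main() {
	l := NewLedger("a", "b", "c")
	l.Lock("a")
	l.Lock("a")
	fmt.Println(l.Remaining()) // 2
}
```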

Principle 3. LLM only generates

Generating code, writing tests, presenting fixes — that’s the LLM’s role.

What to fix? Machine decides. Pass or fail? Machine judges. What’s next? Machine tells. Is it done? Machine declares.

AI doesn’t set the direction; it only builds what it is assigned.

Principle 4. Strip the agent’s termination judgment authority

When the LLM says “done,” it stops at 40. When the machine says “done,” it stops at 527.

The ratchet’s reason for existence is summarized in this one line.

Principle 5. Verifier must be deterministic

Not just anything can be a verifier.

Can be a Verifier	Cannot be a Verifier
`go test`	“looks cleaner”
coverage measurement	“seems better”
AST validation	“more scalable”
schema diff	“clean architecture”
Hurl test	“overall looks fine”

Four conditions for a Verifier:

Same input always gives the same result (Deterministic)
No human needed to check (Machine-checkable)
Can resume if interrupted (Resumable)
Tells you where and why it failed (Localized feedback)

Without these four, the ratchet teeth don’t engage.
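A verifier that meets these conditions returns more than yes or no: it returns locations. The sketch below assumes a hypothetical Result type with per-line failures and adds a cheap sanity check for determinism: run the verifier repeatedly on the same input and compare. Agreement doesn’t prove determinism, but disagreement proves the verifier can’t serve as ratchet teeth. The file name and line number in main are made-up sample data.

```go
package main

import (
	"fmt"
	"reflect"
)

// Failure says where and why a check failed (localized feedback).
type Failure struct {
	File   string
	Line   int
	Reason string
}

// Result is a binary verdict plus the locations behind it.
type Result struct {
	Pass     bool
	Failures []Failure
}

// Verifier checks one item and returns a Result.
type Verifier func(item string) Result

// IsDeterministic runs v several times on the same input and reports
// whether every run produced an identical Result.
func IsDeterministic(v Verifier, item string, runs int) bool {
	first := v(item)
	for i := 1; i < runs; i++ {
		if !reflect.DeepEqual(first, v(item)) {
			return false
		}
	}
	return true
}

func main() {
	stable := func(item string) Result {
		return Result{Pass: false, Failures: []Failure{
			{File: "handler/tax.go", Line: 41, Reason: "branch not covered"},
		}}
	}
	calls := 0
	flaky := func(item string) Result { // flips its verdict on every call
		calls++
		return Result{Pass: calls%2 == 0}
	}
	fmt.Println(IsDeterministic(stable, "calculateTax", 5)) // true
	fmt.Println(IsDeterministic(flaky, "calculateTax", 5))  // false
}
```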

TDAD Study — “Do TDD” Instruction Backfires

Remember the TDAD study introduced in Class 5. Here we dig deeper into its relationship with the ratchet.

Software engineering textbooks say “TDD (Test-Driven Development) is good.” Write tests first, code later.

But what happens when you instruct an AI agent “do TDD”?

The TDAD (Test-Driven AI Development) study measured this. The results are striking.

Instruction	Regression rate
“Do TDD” (method instruction)	Worsened vs baseline (6.08% → 9.94%)
“This test must pass” (contract instruction)	70% regression reduction (6.08% → 1.82%)

“Do TDD” instructed a method. The LLM gets confused trying to mimic TDD’s form: it writes tests first and then can’t write the code, or writes the code and then overwrites the tests.

“This test must pass” instructed a contract. The LLM knows exactly what to do. Make this test pass.

The lesson:

Don’t instruct methods — instruct contracts.

Not “write Hurl tests first, then code,” but “this Hurl test must pass.” Don’t try to teach LLMs methodology. Just give them the conditions to meet.

The ratchet operates on exactly this principle. Not “do TDD” but “this item must be PASS.”

tsma — The Ratchet’s Practical Tool

With theory understood, let’s look at the practical tool.

tsma is a CLI tool that applies the Ratchet Pattern to projects. It supports Go, TypeScript, and Python. It does four things:

Function indexing — Finds all functions in the project
Test detection — Checks if each function has tests
Coverage measurement — Measures how much tests cover each function
Feedback generation — Reports uncovered branches with line numbers
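Function indexing, the first of these four jobs, can be illustrated for Go with the standard library’s go/parser and go/ast packages. This is not how tsma is implemented; FuncInfo and IndexFuncs are hypothetical names, and TypeScript or Python projects would need their own parsers.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// FuncInfo is one indexed function: its name and the line it starts on.
type FuncInfo struct {
	Name string
	Line int
}

// IndexFuncs parses Go source and lists every top-level function and method.
func IndexFuncs(filename, src string) ([]FuncInfo, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	var out []FuncInfo
	for _, decl := range file.Decls {
		if fn, ok := decl.(*ast.FuncDecl); ok {
			// fn.Pos() is the position of the "func" keyword.
			out = append(out, FuncInfo{Name: fn.Name.Name, Line: fset.Position(fn.Pos()).Line})
		}
	}
	return out, nil
}

// sample is made-up source standing in for handler/tax.go.
const sample = `package handler

func calculateTax(amount int) int {
	if amount > 1000 {
		return amount / 10
	}
	return 0
}

func roundUp(x int) int { return x + 1 }
`

func main() {
	funcs, err := IndexFuncs("tax.go", sample)
	if err != nil {
		panic(err)
	}
	for _, f := range funcs {
		fmt.Printf("%s (tax.go:%d)\n", f.Name, f.Line)
	}
}
```

Running it lists calculateTax at line 3 and roundUp at line 10 of the sample source.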

The agent needs to know only one command:

$ tsma next

This single command drives the entire loop:

$ tsma next          # Shows the next function without tests
  → Agent writes tests
$ tsma next          # Detects new tests, runs them, measures coverage
  → 100%? PASS. Next function.
  → <100%? Reports uncovered branches with line numbers.
$ tsma next          # Re-measures modified tests
  → PASS or DONE and moves to next.

Repeat until “All functions complete!” appears.
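What each tsma next call concludes can be sketched as a small decision function. tsma’s actual rules are not spelled out in this class, so treat this as a guess: Decide and its parameters are hypothetical, and the retry budget imitates the “tries once more, then accepts as DONE” behavior described for the 527-function run.

```go
package main

import "fmt"

// Status values as they appear in tsma's report.
type Status string

const (
	StatusTODO Status = "TODO"
	StatusPASS Status = "PASS"
	StatusDONE Status = "DONE"
)

// Decide sketches one step of the loop for a single function.
// hasTest: a test exists; coverage: branch coverage in percent;
// retries: feedback rounds already spent; maxRetries: assumed budget
// before a best-effort result is accepted.
func Decide(hasTest bool, coverage float64, retries, maxRetries int) (Status, string) {
	switch {
	case !hasTest:
		return StatusTODO, "No test file found."
	case coverage >= 100:
		return StatusPASS, "Next function."
	case retries >= maxRetries:
		return StatusDONE, "Best effort accepted; moving on."
	default:
		// Stay on this function and hand back feedback.
		return StatusTODO, fmt.Sprintf("Coverage %.0f%%: add tests for the uncovered branches.", coverage)
	}
}

func main() {
	fmt.Println(Decide(false, 0, 0, 1))
	fmt.Println(Decide(true, 65, 0, 1))
	fmt.Println(Decide(true, 65, 1, 1))
	fmt.Println(Decide(true, 100, 0, 1))
}
```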

Agent instructions are 6 lines:

1. Run tsma next
2. If TODO — read function and write test
3. If test fails — read error and fix test
4. If uncovered branches shown — add tests covering those branches
5. If PASS/DONE — next function appears automatically
6. Repeat until "All functions complete!"

6 lines is everything. The CLI enforces the rest.

Measured Results from 527 Functions

Actual results from applying tsma to a real project (527 functions):

Result	Count	Ratio
PASS (100% branch coverage)	246	46.7%
DONE (best-effort, structural limits)	281	53.3%
TODO (unprocessed)	0	0.0%

TODO is 0. All 527 functions were processed.

246 functions reached 100% branch coverage. Why couldn’t the remaining 281 reach 100%?

This isn’t a limitation of the LLM but a testability issue in the code. Some functions can’t be fully covered because of how they are structured: functions that call external services directly (SMS sending, payment integration, etc.) give the test environment no way to substitute those services.

In such cases tsma tries once more and then accepts the function as DONE. That’s a realistic compromise.

What matters is that TODO is 0. The ratchet pushed the agent all the way, to a state where every single function was attempted and none was skipped.

Feedback Is the Gradient Signal

The ratchet’s real power isn’t just judging “pass/fail.” What it tells you upon failure is decisive.

Weak feedback:  "Test failed"              → LLM fixes without direction
Medium feedback: "Coverage 65%"             → LLM roughly reinforces
Strong feedback: "line 41, 44, 70 uncovered" → LLM covers exactly those branches
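Strong feedback is cheap to produce. go test -coverprofile writes a text profile whose lines look like file.go:41.2,43.3 1 0 (start line.column, end line.column, statement count, execution count); blocks with a count of 0 were never executed. The sketch below turns such a profile into line-number feedback. It is an approximation: Go’s profile tracks statement blocks rather than branches, UncoveredLines is a hypothetical helper, and blocks are keyed by start line only, a simplification.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// UncoveredLines reads a Go coverage profile and returns, per file, the
// sorted start lines of blocks that were never executed. A block that
// appears several times (merged profiles) counts as covered if any entry ran.
func UncoveredLines(profile string) (map[string][]int, error) {
	type key struct {
		file string
		line int
	}
	covered := map[key]bool{}
	sc := bufio.NewScanner(strings.NewReader(profile))
	for sc.Scan() {
		text := strings.TrimSpace(sc.Text())
		if text == "" || strings.HasPrefix(text, "mode:") {
			continue
		}
		// Format: file.go:startLine.startCol,endLine.endCol numStmts count
		colon := strings.LastIndex(text, ":")
		if colon < 0 {
			return nil, fmt.Errorf("bad profile line: %q", text)
		}
		fields := strings.Fields(text[colon+1:])
		if len(fields) != 3 {
			return nil, fmt.Errorf("bad profile line: %q", text)
		}
		start, err := strconv.Atoi(strings.SplitN(fields[0], ".", 2)[0])
		if err != nil {
			return nil, err
		}
		count, err := strconv.Atoi(fields[2])
		if err != nil {
			return nil, err
		}
		k := key{file: text[:colon], line: start}
		covered[k] = covered[k] || count > 0
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	out := map[string][]int{}
	for k, ok := range covered {
		if !ok {
			out[k.file] = append(out[k.file], k.line)
		}
	}
	for _, lines := range out {
		sort.Ints(lines)
	}
	return out, nil
}

func main() {
	// Made-up profile for a hypothetical module path.
	profile := `mode: set
example.com/app/handler/tax.go:15.40,17.2 1 1
example.com/app/handler/tax.go:41.2,43.3 1 0
example.com/app/handler/tax.go:44.2,46.3 1 0
example.com/app/handler/tax.go:70.2,70.14 1 0
`
	lines, err := UncoveredLines(profile)
	if err != nil {
		panic(err)
	}
	for file, ls := range lines {
		fmt.Println(file, "uncovered lines:", ls)
	}
}
```

Run as written, it prints example.com/app/handler/tax.go uncovered lines: [41 44 70].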

Numbers verified in real projects:

Without feedback:  Stalls at 60-70% coverage
With feedback:     100% achieved (for reachable functions)

Same model. One line of “line 41 not covered” acts as a gradient signal.

In deep-learning terms: when training a neural network, a loss that only says “wrong” is not enough. The weights adjust precisely only when gradients tell them where, and by how much, the output is wrong.

LLM code fixing works the same way:

“Test failed” = only told the loss. No direction.
“line 41 uncovered” = told the gradient. Precise fix possible.

As feedback resolution increases, LLM’s fix accuracy increases, loop iterations decrease, and token costs decrease.

Symbolic Feedback Loop — The Ratchet’s Heart

The Symbolic Feedback Loop from Class 5 is the ratchet’s heart: the LLM generates, deterministic tools judge, and the results are fed back to the LLM.

Here is how the loop works concretely inside the ratchet, using tsma as the example:

1. tsma next designates the next function (machine sets direction)
2. The LLM generates tests (probabilistic generation)
3. go test judges pass/fail (deterministic feedback)
4. On failure, the verifier returns the fact “line 41 uncovered” (gradient signal)
5. The LLM fixes the tests → back to 3
6. On pass, the ratchet locks → back to 1
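Steps 2 through 5 form an inner loop for a single function. A minimal Go sketch of that loop follows; Generator, Verify, and Loop are hypothetical names, and the generator is a simulation that covers exactly the lines it is told about, which is the best case rather than a model of real LLM behavior.

```go
package main

import (
	"fmt"
	"sort"
)

// Generator stands in for the LLM: given the current feedback (nil on the
// first attempt), it returns the set of lines its tests now cover.
type Generator func(feedback []int) map[int]bool

// Verify is the deterministic judge: it returns the required lines that are
// still uncovered, in ascending order.
func Verify(required []int, covered map[int]bool) []int {
	var missing []int
	for _, l := range required {
		if !covered[l] {
			missing = append(missing, l)
		}
	}
	sort.Ints(missing)
	return missing
}

// Loop runs generate -> verify -> feed back until nothing is missing or the
// attempt budget runs out. It returns the attempts used and whether it passed.
func Loop(required []int, gen Generator, maxAttempts int) (int, bool) {
	var feedback []int
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		missing := Verify(required, gen(feedback))
		if len(missing) == 0 {
			return attempt, true // PASS: the ratchet can lock this function
		}
		fmt.Printf("attempt %d: lines %v uncovered\n", attempt, missing)
		feedback = missing // the gradient signal for the next attempt
	}
	return maxAttempts, false
}

func main() {
	covered := map[int]bool{15: true, 20: true}
	gen := func(feedback []int) map[int]bool {
		for _, l := range feedback {
			covered[l] = true // simulated LLM targets exactly the reported lines
		}
		return covered
	}
	attempts, ok := Loop([]int{15, 20, 41, 44, 70}, gen, 3)
	fmt.Println("attempts:", attempts, "pass:", ok)
}
```

Run as written, it reports lines [41 44 70] uncovered after attempt 1 and passes on attempt 2.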

Class 5 argued that “feedback topology matters more than model IQ.” The ratchet is the physical device that enforces this topology, item by item.

Agents Die. Progress Survives.

Let’s be realistic. Agents always crash sooner or later.

Hit token limits.
Network errors.
Session disconnects.
Context fills up.

You can’t process 527 functions in one session. So what do you do?

The ratchet stores its progress persistently. tsma records it in .tsma/session.json.

Agent A: Processes functions 1-200 → dies at the token limit
Agent B: tsma next → continues from 201
Agent C: tsma next → continues from 401

Agents are disposable. Progress accumulates.

When your Claude Code session disconnects, running tsma next in a new session continues from the last processed function. session.json is the checkpoint.
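The checkpoint idea can be sketched with Go’s encoding/json. The Session layout below is invented for illustration; tsma’s real session.json schema isn’t shown in this class. Writing to a temporary file and renaming it is a common way to keep a crash from leaving a half-written checkpoint.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// Session is a hypothetical checkpoint format, not tsma's real schema.
type Session struct {
	Done map[string]string `json:"done"` // function name -> "PASS" or "DONE"
}

// Load reads the checkpoint; a missing file means a fresh start.
func Load(path string) (Session, error) {
	s := Session{Done: map[string]string{}}
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return s, nil
	}
	if err != nil {
		return s, err
	}
	if err := json.Unmarshal(data, &s); err != nil {
		return s, err
	}
	if s.Done == nil {
		s.Done = map[string]string{}
	}
	return s, nil
}

// Save writes to a temp file, then renames it over the checkpoint.
func Save(path string, s Session) error {
	data, err := json.MarshalIndent(s, "", "  ")
	if err != nil {
		return err
	}
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return err
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, data, 0o644); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}

// Next returns the first indexed function with no recorded result.
func Next(index []string, s Session) (string, bool) {
	for _, name := range index {
		if _, ok := s.Done[name]; !ok {
			return name, true
		}
	}
	return "", false
}

func main() {
	dir, err := os.MkdirTemp("", "ratchet")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, ".tsma", "session.json")
	index := []string{"f1", "f2", "f3"}

	// Agent A: finishes two functions, checkpointing after each, then dies.
	s, _ := Load(path)
	for i := 0; i < 2; i++ {
		name, _ := Next(index, s)
		s.Done[name] = "PASS"
		if err := Save(path, s); err != nil {
			panic(err)
		}
	}

	// Agent B: a fresh start that knows only what is on disk.
	s2, _ := Load(path)
	name, _ := Next(index, s2)
	fmt.Println("resume at:", name) // resume at: f3
}
```

The second Load stands in for a brand-new agent session: it has no memory of Agent A, yet it resumes at f3.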

This is fundamentally different from “running a 24-hour agent.” A 24-hour agent must start over when the session breaks. The ratchet continues from where it stopped.

Swapping the Verifier Creates a Different Tool

The ratchet isn’t tied to a specific verifier. Swap the verifier and the same pattern serves an entirely different purpose.

Ratchet + Verifier	Purpose
Ratchet + `go test` + coverage	Per-function test generation
Ratchet + `hurl --test`	API endpoint verification
Ratchet + `yongol validate`	SSOT specification consistency
Ratchet + `filefunc validate`	Code structure rule enforcement
Ratchet + schema diff	Database migration

One pattern. The verifier determines the domain.

yongol validate from Class 4 can be used as a ratchet verifier. Hurl from Class 3 can be used as a ratchet verifier. The tool doesn’t change — you plug a different verifier into the same pattern.
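The “plug in a verifier” idea maps naturally onto an interface plus a wrapper that treats a command’s exit status as the verdict. A sketch, assuming each tool exits with code 0 on success and non-zero on failure, as go test does; Verifier and CommandVerifier are hypothetical names, and running main requires the tools to be installed.

```go
package main

import (
	"fmt"
	"os/exec"
)

// Verifier is the only slot that changes between ratchets.
type Verifier interface {
	Verify() (pass bool, output string)
}

// CommandVerifier treats an external tool's exit status as the verdict:
// exit code 0 is PASS; anything else, including "tool not found", is FAIL.
type CommandVerifier struct {
	Name string
	Args []string
}

func (c CommandVerifier) Verify() (bool, string) {
	out, err := exec.Command(c.Name, c.Args...).CombinedOutput()
	if err != nil {
		// The tool's own output is the localized feedback for the LLM.
		return false, fmt.Sprintf("%v\n%s", err, out)
	}
	return true, string(out)
}

func main() {
	// Same ratchet, different verifiers: each entry yields a different tool.
	verifiers := []Verifier{
		CommandVerifier{Name: "go", Args: []string{"test", "./..."}},
		CommandVerifier{Name: "hurl", Args: []string{"--test", "tests/"}},
	}
	for i, v := range verifiers {
		pass, _ := v.Verify()
		fmt.Printf("verifier %d pass=%v\n", i, pass)
	}
}
```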

Three Steps from 40 to 527

Organizing theory into three steps:

Step 1: List the items

Telling the agent “fix the whole project” leads to broad exploration, and the LLM loses direction. Instead, list the items: “527 functions. Starting from #1.”

tsma does this automatically. It indexes all project functions to create a list.

Step 2: One at a time, verify and progress

“Write tests for this project” — that’s broad exploration. “Write a test for the calculateTax function” — that’s local correction.

LLMs are stronger at local correction than at broad exploration. The ratchet decomposes the work to match that strength.

Step 3: Machine declares “done”

Even if the agent says “all done,” ignore it. When tsma status shows “TODO: 0,” it’s done. Until then, it’s not.

$ tsma status

527 functions
PASS:  246 (46.7%)
DONE:  281 (53.3%)
TODO:    0 (0.0%)
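The “machine declares done” rule reduces to a tally. This hypothetical Summary function formats per-function results in the style of the tsma status output above and reports completion only when the TODO count is 0; it imitates the report, not tsma’s code.

```go
package main

import "fmt"

// Summary tallies per-function results and returns a status report plus
// whether the work is complete. The agent's own claims play no part.
func Summary(results map[string]string) (string, bool) {
	counts := map[string]int{}
	for _, r := range results {
		counts[r]++
	}
	total := len(results)
	pct := func(n int) float64 {
		if total == 0 {
			return 0
		}
		return 100 * float64(n) / float64(total)
	}
	s := fmt.Sprintf("%d functions\n", total)
	for _, k := range []string{"PASS", "DONE", "TODO"} {
		s += fmt.Sprintf("%-5s %4d (%.1f%%)\n", k+":", counts[k], pct(counts[k]))
	}
	// The only completion criterion: no TODO left.
	return s, counts["TODO"] == 0
}

func main() {
	results := map[string]string{}
	for i := 0; i < 527; i++ {
		status := "DONE"
		if i < 246 {
			status = "PASS"
		}
		results[fmt.Sprintf("fn%03d", i)] = status
	}
	report, done := Summary(results)
	fmt.Print(report)
	fmt.Println("complete:", done)
}
```

With 246 PASS and 281 DONE results, it prints the same four lines as the report above, followed by complete: true.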

Why This Matters for Vibe Coders

You don’t read code, so you have no way to verify whether the AI really completed everything.

But with the ratchet, you don’t need to verify.

“TODO 0?” — Done.
“TODO 37?” — Not done.

Just look at one number. No need to read code. No need to understand tests. The machine judges and the machine reports.

This is why the ratchet is especially powerful for vibe coders. The less you know about code, the more you must rely on mechanical verification. The ratchet makes that reliance safe.

Analogy: Trains and Tracks

Vibe coding is a train. It’s fast: say “make it” once and code pours out.

But without tracks, it derails.

AI coding tools are all focused on making trains faster. Bigger models, smarter agents, better prompts. But the faster the train, the worse the derailment damage.

The ratchet is the tracks. The LLM generates code (the train); the CLI defines “this far, next is there” (the tracks). The LLM’s generation ability stays intact, but the machine enforces direction and destination.

Many people are making trains. Almost nobody is laying tracks yet.

Key Summary

LLMs generate well but can’t judge completion. They say “done” at 40.
Hand completion judgment to the machine and all 527 get done. This is the Ratchet Pattern.
Don’t instruct methods — instruct contracts. Not “do TDD” but “this test must pass.”
Feedback resolution determines fix accuracy. “line 41 uncovered” is far more effective than “failed”: coverage that stalls at 60-70% reaches 100% for reachable functions.
Agents die but progress is preserved. session.json is the checkpoint.
Swapping verifiers creates a different tool. One pattern; the verifier determines the domain.

Next, in Class 7, we dig into why the ratchet works: how the LLM’s sycophancy bias becomes the ratchet’s driving force, and the right ratio between prompt design and verifier design.

Exercise: Complete 20 Tests with the Ratchet (tsma)

Goal: Select 20 functions from an existing project, apply the ratchet with tsma, and complete 20/20.

Requirements: tsma installed, and a project in Go, TypeScript, or Python

Install: Tell the agent “install tsma.” If tsma --help produces output, installation is complete.

tsma supports Go, TypeScript, and Python projects. Whatever language your Class 1 app uses is fine.

Step 1: Indexing

$ cd your-project
$ tsma next

On the first run, tsma indexes all functions in the project and shows the current status:

TODO: calculateTax (handler/tax.go:15)
No test file found.

Step 2: Give the Agent Ratchet Instructions

Tell Claude Code:

Run tsma next, write tests for the TODO function.
If the test passes, move to the next function with tsma next.
If uncovered branches are shown, add tests covering those branches.
Repeat until "All functions complete!" appears.

Step 3: Observe

While the agent works, observe:

Does the agent try to skip? → The ratchet forces it to stay on the current function
How does coverage differ with and without feedback? → It jumps after line-number feedback
How many tries does it take to reach PASS? → Usually 1-2

Step 4: Verify Results

$ tsma status

TODO at 0 means completion. All 20 functions have tests.

Comparison Experiment (Optional):

For the same 20 functions, tell the agent “write tests for these 20 functions” without tsma, and see how many it finishes before it stops.

Reins Engineering Full Course

Class	Title
Class 0	Install Claude Code
Class 1	How to Command AI
Class 2	How to Distrust AI
Class 3	Apps That Don’t Break
Class 4	Decisions Outside Code
Class 5	AI with Reins
Class 6	Pass Then Lock
Class 7	Flipping Sycophancy
Class 8	The Agent’s Factory
Class 9	Automation Beyond Code
Class 10	The Law of Data
Class 11	How to Rescue Failed Vibe Coding

Sources

TDAD, ACM AIWare 2026 — “Do TDD” procedural instruction (6.08% → 9.94%) worsens regression; providing specific test files as context (6.08% → 1.82%) reduces regression by 70%.