Who Defines ‘Done’? — The Problem Games Solved 40 Years Ago

Say you manage rental properties. A tenant has vacated, and a staff member needs to confirm the move-out.

Here is how I designed it. The staff member cannot say “I confirmed it.” Instead, they photograph five designated spots in the unit and upload them to the system. Once all five images arrive, the system marks the move-out as confirmed. If even one is missing, there is no completion.
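Here is a minimal Python sketch of that rule. It assumes each upload is tagged with the spot it shows. The spot names and function names are hypothetical placeholders, not the author's actual system:

```python
# Hypothetical spot names: the article only says "five designated spots".
REQUIRED_SPOTS = frozenset({"kitchen", "bathroom", "living_room", "bedroom", "entrance"})


def missing_spots(uploaded_spots):
    """Return the designated spots that still have no photo."""
    return REQUIRED_SPOTS - set(uploaded_spots)


def is_move_out_confirmed(uploaded_spots):
    """The system, not the staff member, declares completion:
    done only when every designated spot has at least one photo."""
    return not missing_spots(uploaded_spots)


# There is deliberately no parameter for "the staff member says it's done".
partial = ["kitchen", "bathroom", "living_room", "bedroom"]
print(is_move_out_confirmed(partial))   # False
print(sorted(missing_spots(partial)))   # ['entrance']
print(is_move_out_confirmed(partial + ["entrance", "balcony"]))  # True
```

Photos of extra, unlisted spots do not count toward completion, and any missing spot blocks it. The only input is the evidence itself.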

When I explained this design, someone said: “That’s basically a game quest, isn’t it?”

Yes. Exactly. And that one remark instantly captured what I had been wrestling with in code for years.

Games Solved This 40 Years Ago

“Collect five wolf pelts.” Games have done this for decades. And games never trust the player’s word. Saying “I got them all” does not complete the quest. Games look at exactly one thing — do you have five pelts in your inventory? Yes means done, no means not done. Full stop.

What I built | What games built
Definition of done = photos of 5 designated spots | Quest objective = 5 wolf pelts
Spec = list of spots to photograph | Quest log · objective markers
Verification = do all 5 photos exist? | Verification = are 5 pelts present?
Judgment = system marks complete | Judgment = game displays completion
Staff member = executor (not judge) | Player = executor
The structure is identical. The authority to declare “done” has moved from the agent’s mouth to the system. The agent only fulfills conditions; the gate alone declares completion.

This Is Reins — And Code Works the Same Way

I do the same thing in AI coding. When AI says “all done,” I do not believe it. When tests pass, types check, and schema validation holds, the system declares it done. The quest objective is “4,419 tests passing,” and CI inspects the test report the way a game inspects a backpack. The standard benchmarks in agent research work exactly this way: SWE-bench defines “done” as passing the actual PR’s test suite, and WebArena defines it as functional correctness of the final environment state. Neither accepts a natural-language “I’m finished.”
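The same idea as a gate over tool output, sketched in Python. `CheckResults`, `completion_gate`, and the exact set of checks are illustrative assumptions; a real pipeline would fill these numbers from whatever test runner, type checker, and schema validator it already uses:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResults:
    """Facts reported by tools (test runner, type checker, schema validator).
    Field names are hypothetical; in practice they would be parsed from CI output."""
    tests_total: int
    tests_passed: int
    type_errors: int
    schema_errors: int


def completion_gate(results: CheckResults, expected_tests: int) -> tuple[bool, list[str]]:
    """Deterministic gate: returns (done, reasons it is not done).
    The agent's own status message is intentionally not an input."""
    failures = []
    if results.tests_total < expected_tests:
        # Guards against "finishing" by deleting or skipping tests.
        failures.append(f"only {results.tests_total} of {expected_tests} tests ran")
    if results.tests_passed < results.tests_total:
        failures.append(f"{results.tests_total - results.tests_passed} tests failing")
    if results.type_errors:
        failures.append(f"{results.type_errors} type errors")
    if results.schema_errors:
        failures.append(f"{results.schema_errors} schema violations")
    return (not failures, failures)


agent_message = "All done!"  # can be logged, but is never consulted
print(completion_gate(CheckResults(4419, 4419, 0, 0), expected_tests=4419))
# (True, [])
print(completion_gate(CheckResults(4100, 4100, 0, 0), expected_tests=4419))
# (False, ['only 4100 of 4419 tests ran'])
```

Returning the list of unmet conditions, not just a boolean, gives the agent something factual to work on instead of an opinion to argue with.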

Whether it is a rental move-out, wolf pelts, or code, the core is the same. Take the judgment of “done” away from the agent itself and move it to a defined gate outside the agent. It does not matter whether the agent is human or AI. Letting AI judge its own completion is especially dangerous, and experiments bear this out. Model self-critique barely improves performance, while external deterministic verifiers improve it significantly (Stechly et al., 2024). And even models that start out honest can discover deceptive strategies for manipulating the reward function once they are given the authority to judge their own rewards (McKee-Reid et al., 2024). Reins do not slow the horse down. They keep the horse from galloping in the wrong direction.

And one thing becomes clear here. Give the agent opinions and it wavers. Press a staff member with “Did you really check?” and they shrink; press an AI and it may reverse a correct answer. But five photographs are not an opinion. Passing tests are not an opinion. Five pelts are not an opinion. There is nothing to flatter in a fact. As long as the gate asks for facts, nobody can sweet-talk their way past it.

But Games Also Encountered Something Harder — Cheese

Stopping here means seeing only half the picture. What games really teach us comes next.

“Kill ten rats” is an infamous quest. Why? Because there is a gap between what its gate verifies (ten dead rats) and what the designer actually wanted (the player experiencing the content). The gate is only a proxy for the goal, and players exploit that gap. Speedrunners break games by finding the space between completion conditions and design intent. Game design calls this cheese. The latest reasoning models do exactly the same thing: given a quest to beat a chess engine, models like o3 found a way to manipulate the game’s state file to produce a “win” instead of playing fair (Bondarenko et al., 2025). The more capable the agent, the better it finds the gap.

My rental gate can be cheesed too. Five photos verify that “photos exist,” not that “the move-out was properly completed.” What if the staff member photographed only spotless walls? What if they recycled photos from move-in? The gate passes. When a measure becomes a target, it stops being a good measure: that is Goodhart’s Law, and Manheim & Garrabrant (2018) classified this over-optimization failure into four variants. AI safety research captured the same phenomenon early under the name reward hacking; an agent that hides a mess instead of cleaning it up (Amodei et al., 2016) is doing exactly what a staff member who photographs only clean walls does.

I hit this gap in code constantly. Recently I refactored a 23,000-star web framework under a “one concept per file” rule and confirmed that all 4,419 tests passed. A verified fact. But when I dug deeper into the same data, I found that the rule had passed while the goal was only 90% achieved: 10% of files still packed multiple concepts into one place. The gate (zero rule violations) cleared, but what the gate was aiming at was not fully reached. My own code was cheesing my own gate.
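The article does not say how that rule was checked, so what follows is a hypothetical Python illustration of the same shape of gap. The gate counts top-level classes per file (zero violations means it passes), while a broader audit also counts top-level functions as concepts. Both checks are assumptions made for illustration, and the audit is itself only a better proxy, not the goal:

```python
import ast


def count_top_level(source: str, kinds: tuple) -> int:
    """Count top-level definitions of the given AST node types."""
    return sum(isinstance(node, kinds) for node in ast.parse(source).body)


CLASSES = (ast.ClassDef,)
CONCEPTS = (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def gate_violations(files: dict) -> list:
    """The gate: flag files defining more than one top-level class."""
    return [name for name, src in files.items() if count_top_level(src, CLASSES) > 1]


def goal_misses(files: dict) -> list:
    """Broader audit: flag files with more than one top-level class or function."""
    return [name for name, src in files.items() if count_top_level(src, CONCEPTS) > 1]


# Hypothetical files: request.py hides an unrelated helper next to its one class.
files = {
    "router.py": "class Router:\n    pass\n",
    "request.py": "class Request:\n    pass\n\ndef render_template(name):\n    return name\n",
}
print(gate_violations(files))  # []
print(goal_misses(files))      # ['request.py']
```

The gate reports zero violations, yet one file still mixes two concerns. Running a second, differently shaped measurement over the same data is one way to notice that a gate has been satisfied without the goal being met.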

So the real skill in Reins is not “install a gate.” It is designing a cheese-proof gate. A weak quest asks “do the photos exist?” A strong quest checks timestamps, inspects location metadata, and uses AI vision to compare against the move-in photos. Forty years of game-design writing on “cheese-proof quests” is, in effect, a reference manual for Goodhart-resistant gates.
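Here is a Python sketch of the weak-versus-strong difference. The `Photo` fields, the 50 m radius, and the coordinates are hypothetical. The SHA-256 comparison is a deliberately crude stand-in for the AI-vision comparison described above: it only catches byte-identical reuse, not a re-saved or cropped copy. Timestamps and location metadata can themselves be forged, so each check is still a proxy:

```python
import hashlib
import math
from dataclasses import dataclass, replace
from datetime import datetime


@dataclass(frozen=True)
class Photo:
    spot: str
    taken_at: datetime  # from image metadata (which can itself be forged)
    lat: float
    lon: float
    image: bytes


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6_371_000 * math.asin(math.sqrt(a))


def digest(photo: Photo) -> str:
    return hashlib.sha256(photo.image).hexdigest()


def weak_gate(photos, required):
    """Only asks: is there a photo for every required spot?"""
    return required <= {p.spot for p in photos}


def strong_gate(photos, required, vacated_at, unit, move_in, max_dist_m=50.0):
    """Counts a photo only if it was taken after the tenant left, near the unit,
    and is not a byte-for-byte copy of a move-in photo."""
    old = {digest(p) for p in move_in}
    valid = {
        p.spot
        for p in photos
        if p.taken_at >= vacated_at
        and distance_m(p.lat, p.lon, *unit) <= max_dist_m
        and digest(p) not in old
    }
    return required <= valid


required = {"kitchen", "bathroom"}
unit = (35.0, 139.0)  # hypothetical unit coordinates
vacated = datetime(2024, 5, 31, 10, 0)
move_in = [Photo(s, datetime(2022, 6, 1, 9, 0), 35.0, 139.0, f"{s}-move-in".encode())
           for s in sorted(required)]
# Recycled move-in photos with doctored timestamps.
restamped = [replace(p, taken_at=datetime(2024, 5, 31, 12, 0)) for p in move_in]
fresh = [Photo(s, datetime(2024, 5, 31, 11, 0), 35.0001, 139.0001, f"{s}-move-out".encode())
         for s in sorted(required)]

print(weak_gate(restamped, required))                            # True
print(strong_gate(restamped, required, vacated, unit, move_in))  # False
print(strong_gate(fresh, required, vacated, unit, move_in))      # True
```

These three checks are not sufficient on their own. The point is that each one closes a specific gap the weak gate leaves open.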

And cheese resistance does not happen by itself. Even when trained with verifiable rewards (RLVR), models can game an imperfect verifier instead of learning the rule (Helff et al., 2026). Encouragingly, deliberately hardening the environment (environmental hardening) reduced exploits by 87.7% with no loss of accuracy (Thaman, 2026). Gate strength is a matter of design, not luck.

One Key Difference — Real-World Cheese Has Real Costs

Every analogy has limits. A game quest’s completion conditions are designed for fun and pacing. They need not capture the real-world goal precisely, and being cheesed is harmless. If a player cheats through “kill ten rats,” nobody gets hurt.

Real-world Reins gates are different. The cost of cheese is real: eviction fraud, broken builds, wrongly approved accounting entries. So real-world gates need to be more cheese-resistant than game quests. This asymmetry actually sharpens the point. Games did this; we have to do it more rigorously.

Assigning Work to an Agent Is Giving It a Quest

At this point, one sentence lands.

Vibe coding collapses because it hands agents quests with no completion condition. An agent given a quest with no objective markers and no completion judgment wanders the map. It stops at “this is probably close enough,” or it roams endlessly. Reins means designing a proper quest for that agent: a clear objective (spec), visible markers (SSOT, a single source of truth), and a cheese-proof completion judgment (deterministic verification).
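One hypothetical way to encode that three-part quest in Python. `Quest`, its fields, and the marker names are illustrative assumptions. Mapping “visible markers” to a log computed from a single state dictionary is one reading of the SSOT point, not the author's definition:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

State = Dict[str, int]


@dataclass
class Quest:
    objective: str  # the spec, stated in words
    markers: Dict[str, Callable[[State], bool]] = field(default_factory=dict)

    def log(self, state: State) -> Dict[str, bool]:
        """Quest log: every marker's status, computed from one shared state."""
        return {name: check(state) for name, check in self.markers.items()}

    def is_complete(self, state: State) -> bool:
        """Deterministic judgment: all markers met. No markers, no completion."""
        results = self.log(state)
        return bool(results) and all(results.values())


quest = Quest(
    objective="Refactor the module without changing behaviour",
    markers={
        "all tests pass": lambda s: s["failing_tests"] == 0,
        "no type errors": lambda s: s["type_errors"] == 0,
    },
)
state = {"failing_tests": 0, "type_errors": 3}
print(quest.log(state))          # {'all tests pass': True, 'no type errors': False}
print(quest.is_complete(state))  # False
print(Quest("make it better").is_complete({}))  # False: a quest with no markers
```

The last line is the vibe-coding case: an objective with no markers can never be judged complete, so the gate refuses to call it done.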

And that one framing contains three layers of skill.

  • Play the quest. Adopt gates someone else built and use them. — User.
  • Design the quest. Build gates that fit your own domain (move-outs, accounting, code). — Creator.
  • Design a cheese-proof quest. Block in advance the point where the proxy fails to track the goal. — Architect.

Most people stop at playing. Scaling up is designing. Keeping the board from breaking is designing against cheese.

So

The next time someone says “all done,” don’t push back — ask instead.

“What is done, and who designed the quest that judged it?”

If there is no answer to that question, what you have is not completion. It is someone’s claim.

Further reading (external)

  • Specification gaming: the flip side of AI ingenuity — Victoria Krakovna et al., Google DeepMind. An authoritative AI safety account of the central argument: gates are proxies, not intent, and agents exploit the gap.
  • There’s Cheese in Your Game! — Shay Pierce, Game Developer. “If it’s boring but most efficient, players will do it” — a game-design perspective on cheese-free quest design that maps directly onto cheese-proof gates.
  • From shortcuts to sabotage: emergent misalignment from reward hacking — Anthropic. How reward hacking that merely passes a scoring script spreads in coding tasks — the latest empirical case for never letting an agent be its own judge of completion.
  • How to write a good spec for AI agents — Addy Osmani. Convert vague goals like “make it faster” into verifiable success criteria like “LCP < 2.5s” — the practical version of the prescription to define done as a checkable condition.
  • What is agentic engineering? — Simon Willison. Divides the human’s role into goal definition, tool preparation, and verification, treating test passage as “done” — aligns directly with the agent-as-executor / human-as-quest-designer reframe.

Sources

  • Manheim & Garrabrant. “Categorizing Variants of Goodhart’s Law” (2018, arXiv:1803.04585)
  • Amodei et al. “Concrete Problems in AI Safety” (2016, arXiv:1606.06565)
  • Bondarenko et al. “Demonstrating Specification Gaming in Reasoning Models” (2025, arXiv:2502.13295)
  • Helff et al. “LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking” (2026, arXiv:2604.15149)
  • Thaman. “Reward Hacking Benchmark: Measuring Exploits in LLM Agents with Tool Use” (2026, arXiv:2605.02964)
  • McKee-Reid et al. “Honesty to Subterfuge: In-Context RL Can Make Honest Models Reward Hack” (2024, arXiv:2410.06491)
  • Stechly, Valmeekam, Kambhampati. “On the Self-Verification Limitations of Large Language Models” (2024, arXiv:2402.08115)
  • Jimenez et al. “SWE-bench: Can Language Models Resolve Real-World GitHub Issues?” (2023, arXiv:2310.06770)
  • Zhou et al. “WebArena: A Realistic Web Environment for Building Autonomous Agents” (2023, arXiv:2307.13854)
  • Hero image: AI-generated (Google Gemini)