Class 9

Quick Tips — Just Know This and You Can Command AI

Having agents handle only code isn’t the end. To delegate builds, deployments, and monitoring to agents, the entire system must be readable by agents. You don’t need to understand Docker internals. The agent handles everything.

To the agent: “Add a /health endpoint to the server. Return DB connection status, error rate, and uptime as JSON.”

This one phrase gives the agent eyes to read system state. With /health, the agent can mechanically verify “is the server alive?” Without it, it’s a surgeon operating blind.

To the agent: “Configure this project as docker-compose.yml. Include app server and DB. Everything should come up with docker compose up.”

You don’t need to know what Docker is. Just knowing it’s a tool that puts apps in a box so they run identically anywhere is enough. The agent handles everything from installation through configuration.

To the agent: “Set up automatic rollback on deploy failure. If /health fails, revert to previous version.”

Agents will inevitably make mistakes. Mistakes must be reversible. This one phrase is the safety net.

Three phrases. Give the server eyes, declare the system, lay a safety net. The agent does the rest.


Hands-on: Try It

Open any project (or Class 1 app) with Claude Code:

“Add a /health endpoint to the server. Return DB connection status, error rate, and uptime as JSON. All existing Hurl tests must pass.”

After the agent adds code:

“Create a Hurl test that verifies /health returns 200. Also check the JSON response has db, status, and uptime fields.”

This is the start of Observability. The agent can now mechanically read system state.


Why You Need to Command This Way

Introduction: Beyond the Codebase

In Class 8 we made code agent-readable and agent-writable: we split files with filefunc, secured tests with tsma, and tracked change history with whyso.

But is agent-operable code enough?

After modifying code, you need to build. After building, deploy. After deploying, monitor. If something fails, roll back. If any of these steps requires manual human action, the agent’s autonomous scope ends at “code editing.”

Think of a vibe coder’s reality. “Add feature” produces code. Then what? Run build commands in terminal, enter AWS console to deploy, skim logs by eye, and if problems arise, say “revert” again.

This entire manual process should be connected as one pipeline. Agent edits code, runs tests, builds, deploys, monitors — humans just press the approval button.

This is an Agent Operable System.


From Codebase to System

Class 8: Agent Operable Codebase | Class 9: Agent Operable System
Can read code                    | Can read system state
Can modify code                  | Can change system config
Verified by tests                | Verified by monitoring
Persists at file level           | Infrastructure state persists

The codebase is one part of the system. The system is the total of code + infrastructure + deployment pipeline + monitoring + operational procedures.

Feature request
→ SSOT editing (yongol)
→ Code generation (yongol generate)
→ Test pass (Hurl + go test)
→ Build (Docker)
→ Deploy (CI/CD)
→ Monitor (health check + logs)
→ Complete

If any link in this chain is opaque to the agent, everything after it becomes the human’s responsibility. One broken link topples the entire automation.


4 Conditions for Agent Operable System

For a system to be operable by agents, four conditions must be met.


Condition 1. Observability — All State Mechanically Observable

Agents have no eyes. They can’t see screens. They can’t read dashboards. For an agent to know system state, that state must be output as text.

# Human observation
Log into AWS console → CloudWatch dashboard → Check graphs by eye
→ "Oh, CPU is high" → Judgment

# Agent observation
$ curl -s localhost:8080/health | jq .
{
  "status": "ok",
  "db": "connected",
  "uptime": "3h42m",
  "error_rate_5m": 0.02
}
→ error_rate > 0.05? → Alert

Observability’s core: not what humans see, but what machines can parse.

A system without observability is a surgeon operating blind.
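To make "what machines can parse" concrete, here is a rough sketch of the agent-side check in Go. It assumes the response shape from the curl example above and the 0.05 error-rate threshold mentioned there. The names Health and CheckHealth are hypothetical, not part of any tool used in this course.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Health mirrors the sample /health response; the field names are taken
// from that sample and are an assumption about the real endpoint.
type Health struct {
	Status      string  `json:"status"`
	DB          string  `json:"db"`
	Uptime      string  `json:"uptime"`
	ErrorRate5m float64 `json:"error_rate_5m"`
}

// CheckHealth parses a /health body and returns a list of problems.
// An empty list means the system looks healthy.
func CheckHealth(body []byte, maxErrorRate float64) []string {
	var h Health
	if err := json.Unmarshal(body, &h); err != nil {
		return []string{"response is not valid JSON: " + err.Error()}
	}
	var problems []string
	if h.Status != "ok" {
		problems = append(problems, fmt.Sprintf("status is %q", h.Status))
	}
	if h.DB != "connected" {
		problems = append(problems, fmt.Sprintf("db is %q", h.DB))
	}
	if h.Uptime == "" {
		problems = append(problems, "uptime is missing")
	}
	if h.ErrorRate5m > maxErrorRate {
		problems = append(problems, fmt.Sprintf("error rate %.2f exceeds %.2f", h.ErrorRate5m, maxErrorRate))
	}
	return problems
}

func main() {
	body := []byte(`{"status":"ok","db":"connected","uptime":"3h42m","error_rate_5m":0.08}`)
	for _, p := range CheckHealth(body, 0.05) {
		fmt.Println("ALERT:", p)
	}
}
```

Every judgment here is a comparison against text the server emitted. Nothing requires looking at a graph.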


Condition 2. Declarative — All Actions Defined Declaratively

When you tell an agent “deploy,” what does it do?

Without a declarative system: The agent guesses “usually you do it this way.” It SSHes into the server, runs git pull, restarts the process… and misses something.

With a declarative system: Everything is written in files.

# docker-compose.yml — What services run
services:
  app:
    build: .
    ports: ["8080:8080"]
    environment:
      DATABASE_URL: ${DATABASE_URL}

# Makefile — What commands do what
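# Recipe lines must start with a tab. curl retries while the server boots.
.PHONY: deploy
deploy:
	docker compose up -d
	curl -sf --retry 10 --retry-delay 2 --retry-connrefused localhost:8080/health || (docker compose logs && exit 1)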

In a declarative system, what the agent does is clear:

  1. Read files (docker-compose.yml, Makefile, workflow)
  2. Execute as files say
  3. Check results

No guessing. Files are truth.

The SSOT principle from Class 4 applies identically here. Just as we separated decisions from implementation in code, in systems too we separate “what to do” (declaration) from “how to do it” (execution).

Docker is a tool that puts apps in a box so they run identically anywhere. Like packing belongings in boxes when moving — put the app and everything it needs in one box. Move the box and it runs the same anywhere. Terraform is a tool for managing servers as code files.

You don’t need to understand Docker and Terraform internals. “Putting apps in a box” — that one line is enough. The agent handles the rest.


Condition 3. Reversible — All Changes Verifiable and Reversible

If the agent deployed and the service died, two things are needed:

  1. Can tell what went wrong (verifiable)
  2. Can revert to previous state (reversible)

# Irreversible deploy (terror)
Directly upload and overwrite files on server.
→ Problem → Where's the previous version? → Can't remember → Panic

# Reversible deploy (peace)
git revert --no-edit HEAD && make deploy
→ Problem → Rollback to previous commit → Recovered in 1 minute

Git’s core from Class 3 returns here. Code rollback is handled by Git, infrastructure rollback by re-applying the previous Terraform configuration, and DB rollback by migration down files.

Irreversible changes cannot be delegated to agents. Agents inevitably make mistakes, so every mistake must be reversible.


Condition 4. Human-in-the-loop — Approval Gates Are Explicit

The most important of the four conditions.

This is a structure where the agent judges and the human approves. Not “human instructs, agent executes” but “agent proposes, human approves.” The direction is reversed.

The key is that approval gates are explicit and cannot be bypassed automatically.

Task                    | Auto-execute | Needs approval
Run tests               | O            |
Code formatting         | O            |
Staging deploy          | O            |
Production deploy       |              | O
DB schema change        |              | O
Env variable change     |              | O
Rollback (pre-approved) | O            |

Reversible tasks can auto-execute. Hard-to-reverse or high-impact tasks must go through approval. Declaring this boundary in advance is Human-in-the-loop design.
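One way to declare this boundary in advance is a small policy table in code. The sketch below is an assumption-laden illustration: the task identifiers, the Gate type, and the Allowed function are hypothetical, and only a subset of tasks is listed. It also adds one design choice not stated in the text: tasks missing from the policy are treated as needing approval, so the gate fails closed.

```go
package main

import "fmt"

// Gate says whether a task may run without a human.
type Gate int

const (
	Auto Gate = iota
	NeedsApproval
)

// policy declares the boundary up front. Task names are illustrative.
var policy = map[string]Gate{
	"run_tests":         Auto,
	"format_code":       Auto,
	"rollback":          Auto, // pre-approved
	"production_deploy": NeedsApproval,
	"db_schema_change":  NeedsApproval,
}

// Allowed reports whether the agent may execute the task right now.
// Unknown tasks default to NeedsApproval (an assumption: fail closed).
func Allowed(task string, humanApproved bool) bool {
	gate, known := policy[task]
	if !known {
		gate = NeedsApproval
	}
	return gate == Auto || humanApproved
}

func main() {
	fmt.Println(Allowed("run_tests", false))         // true
	fmt.Println(Allowed("production_deploy", false)) // false
	fmt.Println(Allowed("production_deploy", true))  // true
}
```

Keeping the policy in one declared place means the agent cannot talk itself past a gate. The only way through is an explicit approval.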


The Agent’s Bottleneck Is Context, Not Intelligence

In Class 8, filefunc removed code’s context pollution. This principle extends to the entire system.

Structuring code lets the same agent handle a 10x wider scope.

Not just code. Structuring every layer of the system dramatically widens the agent’s exploration scope:

Code    → Structured with filefunc
Config  → Declaratively defined with docker-compose.yml, Makefile
Specs   → Cross-validated with yongol SSOT
Infra   → Persisted with Terraform state
Monitor → Machine-readable with /health + structured logs

The agent’s bottleneck isn’t intelligence. Giving agents structured information is 10x more effective than using smarter models. Just as filefunc structured code in Class 8, Class 9 structures the entire system.


The Complete Pipeline: From “Add Feature” to Deployment

In a project with an Agent Operable System, saying “add an order history query feature” triggers the following:

1. SSOT editing
   Agent: Add ListOrders to features.yaml
   Agent: Define GET /orders in OpenAPI
   Agent: Define orders table in DDL
   Agent: Declare service flow in SSaC
   Agent: Write test scenario in Hurl

2. Consistency validation
   Agent: yongol validate → 0 errors

3. Code generation
   Agent: yongol generate → Go handler, sqlc queries, React component

4. Test pass
   Agent: go test → PASS
   Agent: Hurl tests → PASS

5. Build
   Agent: docker build → Success

6. Deploy (approval gate)
   Agent: "All validations passed. Requesting staging deploy approval."
   Human: "Approved"
   Agent: staging deploy → /health check → Normal

7. Production deploy (approval gate)
   Agent: "30 minutes error-free on staging. Requesting production deploy approval."
   Human: "Approved"
   Agent: production deploy → /health check → Normal

8. Complete
   Agent: "ListOrders feature deployed.
   Monitoring. Auto-rollback on anomaly."

What the human did: “Add order history query feature” + “Approved” twice. What the agent did: Everything else.

This is the complete form of vibe coding scale-up.


Vision — The End of Vibe Coding Scale-up

Where we started in Class 1:

Class 1:
  "Add feature" → Code emerges → 5 features then crumbles

Class 9:
  "Add feature" + "Approve"
  → Code generation → Test pass → Build → Deploy → Monitor
  → Entire pipeline is agent-driven

A structure where non-SWEs can maintain, deploy, and operate 100+ endpoints.

Scale-up is possible without SWEs because decisions are made by humans, while implementation and verification are done by machines.

But one thing is still missing. We structured code, structured the system. But what about data?

Class 10 completes the final puzzle.


Exercise


Required Exercise (Non-technical)

Goal: Add a /health endpoint and verify with Hurl.

Step 1 — Establish Observability

To the agent: "Add a /health endpoint to the server.
Return DB connection status, error rate, and uptime as JSON.
All existing Hurl tests must pass."

Step 2 — Write Hurl Test

To the agent: "Create a Hurl test that verifies /health returns 200.
Also check the JSON response has db, status, and uptime fields."

What to check:

  • Does /health return 200?
  • Does the Hurl test pass?

Challenge Exercise (Optional)

No need to install Docker yourself or understand config files. Tell the agent everything.

Step 1 — Docker Compose

To the agent: "If Docker isn't installed, install it.
Configure this project as docker-compose.yml.
Include app server and PostgreSQL.
Everything should come up with docker compose up.
Add build, deploy, test commands to the Makefile."

The agent handles everything from Docker installation to config file creation to execution verification. You just check at the end — “does the app start? Does /health return 200?”

Step 2 — CI/CD Pipeline

To the agent: "Create a GitHub Actions workflow.
On push to main branch:
1. Run go test
2. Run Hurl tests
3. Docker build
4. Deploy to staging if all pass
But production deploy requires manual approval."

Step 3 — Integration Demo

Request one feature addition from the agent. Demo the full pipeline from SSOT editing to staging deploy, with only production deploy requiring manual approval.

What to check:

  • From “add feature” to staging deploy, how many times did a human intervene?
  • When the agent failed, how many minutes did it take to roll back?
  • Could the agent read /health results and assess system state?


Reins Engineering Full Course

Class    | Title
Class 1  | How to Command AI
Class 2  | How to Distrust AI
Class 3  | Unbreakable Apps
Class 4  | Decisions Outside Code
Class 5  | AI with Reins
Class 6  | Lock When It Passes
Class 7  | Flipping Sycophancy
Class 8  | Agent Factory
Class 9  | Automation Beyond Code
Class 10 | Law of Data

Sources

  • Class 8 reference: Stanford “Lost in the Middle” (2024), Amazon “Context Length Alone Hurts LLM Performance” (2025) — Unnecessary context degrades agent performance 30-85%
  • Observability principles — Machine-parseable structured output (/health endpoints, JSON logs) as prerequisite for agent operation
  • Docker Compose — Declarative service configuration for agents to read and execute systems without guessing
  • Terraform — Infrastructure as Code, declarative definition and reversible changes of infrastructure state
  • CI/CD (GitHub Actions) — Declarative automation of build-test-deploy pipelines
  • Human-in-the-loop design — Auto for reversible tasks, approval gate required for high-impact tasks