Class 9. Automation Beyond Code — Agent Operable System


Quick Tips — Just Know This and You Can Command AI

Having agents handle only code isn’t the end. To delegate builds, deployments, and monitoring to agents, the entire system must be readable by agents. You don’t need to understand Docker internals. The agent handles everything.

To the agent: “Add a /health endpoint to the server. Return DB connection status, error rate, and uptime as JSON.”

This one phrase gives the agent eyes to read system state. With /health, the agent can mechanically verify “is the server alive?” Without it, the agent is a surgeon operating blind.

To the agent: “Configure this project as docker-compose.yml. Include app server and DB. Everything should come up with docker compose up.”

You don’t need to know what Docker is. Just knowing it’s a tool that puts apps in a box so they run identically anywhere is enough. The agent handles everything from installation through configuration.

To the agent: “Set up automatic rollback on deploy failure. If /health fails, revert to previous version.”

Agents will inevitably make mistakes. Mistakes must be reversible. This one phrase is the safety net.

Three phrases. Give the server eyes, declare the system, lay a safety net. The agent does the rest.

Hands-on Try

Open any project (or Class 1 app) with Claude Code:

“Add a /health endpoint to the server. Return DB connection status, error rate, and uptime as JSON. All existing Hurl tests must pass.”

After the agent adds code:

“Create a Hurl test that verifies /health returns 200. Also check the JSON response has db, status, and uptime fields.”

This is the start of Observability. The agent can now mechanically read system state.
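
For orientation, here is roughly what the agent might produce for that request. This is a minimal sketch using only the Python standard library, not the course's generated code. The names check_db, STATS, health_payload, and serve are hypothetical, the DB probe is a placeholder, and uptime is reported in whole seconds as an assumption.

```python
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

START_TIME = time.monotonic()

# Hypothetical counters; a real app would update these in request middleware.
STATS = {"requests": 0, "errors": 0}


def check_db() -> bool:
    """Placeholder DB probe (assumption): replace with a real query such as SELECT 1."""
    return True


def health_payload(db_ok: bool, requests: int, errors: int, uptime_s: float) -> dict:
    """Build the /health JSON body from plain values so it is easy to test."""
    error_rate = errors / requests if requests else 0.0
    return {
        "status": "ok" if db_ok else "degraded",
        "db": "connected" if db_ok else "disconnected",
        "uptime": int(uptime_s),
        "error_rate": round(error_rate, 4),
    }


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        db_ok = check_db()
        payload = health_payload(
            db_ok, STATS["requests"], STATS["errors"], time.monotonic() - START_TIME
        )
        body = json.dumps(payload).encode()
        # A 503 lets tools like `curl -f` treat an unhealthy server as a failure.
        self.send_response(200 if db_ok else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the server until interrupted."""
    HTTPServer(("127.0.0.1", port), HealthHandler).serve_forever()
```

Calling serve() and then running curl -s localhost:8080/health should return the JSON body. The point is not the framework but the shape: state comes out as text a machine can parse.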

Why You Need to Command This Way

Introduction: Beyond the Codebase

In Class 8 we made code agent-readable and writable. We split files with filefunc, secured tests with tsma, and tracked change history with whyso.

But is agent-operable code enough?

After modifying code, you need to build. After building, deploy. After deploying, monitor. If something fails, rollback. If any of these steps requires manual human action, the agent’s autonomous scope ends at “code editing.”

Think of a vibe coder’s reality. “Add feature” produces code. Then what? You run build commands in the terminal, open the AWS console to deploy, skim logs by eye, and if problems arise, say “revert” and start over.

This entire manual process should be connected as one pipeline. Agent edits code, runs tests, builds, deploys, monitors — humans just press the approval button.

This is an Agent Operable System.

From Codebase to System

Class 8: Agent Operable Codebase	Class 9: Agent Operable System
Can read code	Can read system state
Can modify code	Can change system config
Verified by tests	Verified by monitoring
Persists at file level	Infrastructure state persists

The codebase is one part of the system. The system is the total of code + infrastructure + deployment pipeline + monitoring + operational procedures.

Feature request
→ SSOT editing (yongol)
→ Code generation (yongol generate)
→ Test pass (Hurl + go test)
→ Build (Docker)
→ Deploy (CI/CD)
→ Monitor (health check + logs)
→ Complete

If any link in this chain is opaque to the agent, everything after it becomes the human’s responsibility. One broken link topples the entire automation.

4 Conditions for Agent Operable System

For a system to be operable by agents, four conditions must be met.

Condition 1. Observability — All State Mechanically Observable

Agents have no eyes. They can’t see screens. They can’t read dashboards. For an agent to know system state, that state must be output as text.

# Human observation
Log into AWS console → CloudWatch dashboard → Check graphs by eye
→ "Oh, CPU is high" → Judgment

# Agent observation
$ curl -s localhost:8080/health | jq .
{
  "status": "ok",
  "db": "connected",
  "uptime": "3h42m",
  "error_rate_5m": 0.02
}
→ error_rate_5m > 0.05? → Alert

Observability’s core: not what humans see, but what machines can parse.

A system without observability is a surgeon operating blind.
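
The agent-side check can be sketched in a few lines. This is an illustration under assumptions, not a prescribed implementation: the field names follow the sample /health response above, the 0.05 threshold comes from the same example, and the function name assess is hypothetical.

```python
import json

ERROR_RATE_THRESHOLD = 0.05  # threshold taken from the example above


def assess(health_json: str, threshold: float = ERROR_RATE_THRESHOLD) -> list[str]:
    """Return the problems found in a /health response body; empty means healthy."""
    try:
        health = json.loads(health_json)
    except json.JSONDecodeError:
        return ["response is not valid JSON"]
    if not isinstance(health, dict):
        return ["response is not a JSON object"]

    problems = []
    if health.get("status") != "ok":
        problems.append(f"status is {health.get('status')!r}")
    if health.get("db") != "connected":
        problems.append(f"db is {health.get('db')!r}")

    rate = health.get("error_rate_5m")
    if not isinstance(rate, (int, float)):
        problems.append("error_rate_5m missing or not a number")
    elif rate > threshold:
        problems.append(f"error_rate_5m {rate} exceeds {threshold}")
    return problems
```

An empty list means "proceed." Anything else is a concrete, quotable reason to alert or roll back. A dashboard graph cannot give the agent that.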

Condition 2. Declarative — All Actions Defined Declaratively

When you tell an agent “deploy,” what does it do?

Without a declarative system: The agent guesses “usually you do it this way.” It SSHes into the server, runs git pull, restarts the process… and misses something.

With a declarative system: Everything is written in files.

# docker-compose.yml — What services run
services:
  app:
    build: .
    ports: ["8080:8080"]
    environment:
      DATABASE_URL: ${DATABASE_URL}

# Makefile — What commands do what
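# Recipe lines must start with a tab. Retry the health check while the app starts.
.PHONY: deploy
deploy:
	docker compose up -d
	curl -sf --retry 5 --retry-connrefused localhost:8080/health || (docker compose logs && exit 1)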

In a declarative system, what the agent does is clear:

1. Read files (docker-compose.yml, Makefile, workflow)
2. Execute as files say
3. Check results

No guessing. Files are truth.
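
The read, execute, check loop can be sketched as code. This is a toy illustration under stated assumptions: the JSON declaration format and the name run_declared are hypothetical stand-ins for a real Makefile or workflow file. The executor contains no deployment knowledge of its own. It only runs what is declared.

```python
import json
import subprocess

# Hypothetical declaration format: a target maps to an ordered list of named commands.
DECLARATION = json.dumps({
    "deploy": [
        {"name": "start services", "cmd": ["docker", "compose", "up", "-d"]},
        {"name": "health check", "cmd": ["curl", "-sf", "localhost:8080/health"]},
    ]
})


def run_declared(declaration: str, target: str, run=subprocess.run) -> list[tuple[str, bool]]:
    """Run the steps declared for `target` in order, stopping at the first failure."""
    steps = json.loads(declaration)[target]  # 1. read what is declared
    results = []
    for step in steps:
        completed = run(step["cmd"])         # 2. execute exactly as declared
        ok = completed.returncode == 0       # 3. check the result
        results.append((step["name"], ok))
        if not ok:
            break
    return results
```

Calling run_declared(DECLARATION, "deploy") would start the services and then check health. Changing how deployment works means editing the declaration, not the executor.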

The SSOT principle from Class 4 applies identically here. Just as we separated decisions from implementation in code, in systems too we separate “what to do” (declaration) from “how to do it” (execution).

Docker is a tool that puts apps in a box so they run identically anywhere. Like packing belongings in boxes when moving — put the app and everything it needs in one box. Move the box and it runs the same anywhere. Terraform is a tool for managing servers as code files.

You don’t need to understand Docker and Terraform internals. “Putting apps in a box” — that one line is enough. The agent handles the rest.

Condition 3. Reversible — All Changes Verifiable and Reversible

If the agent deployed and the service died, two things are needed:

Can tell what went wrong (verifiable)
Can revert to previous state (reversible)

# Irreversible deploy (terror)
Directly upload and overwrite files on server.
→ Problem → Where's the previous version? → Can't remember → Panic

# Reversible deploy (peace)
git revert --no-edit HEAD && make deploy
→ Problem → Rollback to previous commit → Recovered in 1 minute

Git’s core from Class 3 returns here. Code rollback is handled by Git. Infrastructure rollback by Terraform. DB rollback by migration down files.

Irreversible changes cannot be delegated to agents. Agents will inevitably make mistakes, so those mistakes must be reversible.
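
The "deploy, check health, revert on failure" safety net can be sketched as one small function. This is a minimal sketch, not a production deployer. The name deploy_with_rollback and its parameters are hypothetical, and the three callables stand in for whatever commands the project actually declares.

```python
import time
from typing import Callable


def deploy_with_rollback(
    deploy: Callable[[], None],
    is_healthy: Callable[[], bool],
    rollback: Callable[[], None],
    attempts: int = 5,
    wait_s: float = 2.0,
) -> str:
    """Deploy, poll the health check, and roll back if it never turns healthy."""
    deploy()
    for attempt in range(attempts):
        if is_healthy():
            return "deployed"
        if attempt < attempts - 1:
            time.sleep(wait_s)  # give the new version time to start
    rollback()
    return "rolled_back"
```

In practice, deploy might wrap make deploy, is_healthy might wrap a curl -sf call against /health, and rollback might wrap git revert followed by another deploy. The function only works if a previous state exists to return to, which is exactly why irreversible changes stay outside it.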

Condition 4. Human-in-the-loop — Approval Gates Are Explicit

This is the most important of the four conditions.

It is a structure where the agent judges and the human approves. Not “human instructs, agent executes” but “agent proposes, human approves.” The direction is reversed.

The key is that approval gates are explicit and cannot be bypassed automatically.

Task	Auto-execute	Needs approval
Run tests	O
Code formatting	O
Staging deploy	O
Production deploy		O
DB schema change		O
Env variable change		O
Rollback (pre-approved)	O

Reversible tasks can auto-execute. Hard-to-reverse or high-impact tasks must go through approval. Declaring this boundary in advance is Human-in-the-loop design.
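
Declaring the boundary in advance can be as simple as a lookup table. The sketch below mirrors the table above. The task identifiers, the Gate enum, and may_execute are hypothetical names chosen for illustration. In a real pipeline the same rule would usually live in CI/CD configuration.

```python
from enum import Enum
from typing import Optional


class Gate(Enum):
    AUTO = "auto"
    APPROVAL = "approval"


# Mirrors the table above; the keys are hypothetical task identifiers.
POLICY = {
    "run_tests": Gate.AUTO,
    "format_code": Gate.AUTO,
    "deploy_staging": Gate.AUTO,
    "deploy_production": Gate.APPROVAL,
    "db_schema_change": Gate.APPROVAL,
    "env_var_change": Gate.APPROVAL,
    "rollback": Gate.AUTO,  # pre-approved
}


def may_execute(task: str, approved_by: Optional[str] = None) -> bool:
    """Allow a task if policy marks it auto, or if a human has approved it.

    Unknown tasks default to requiring approval, so a new task type
    cannot slip past the gate just because nobody listed it.
    """
    gate = POLICY.get(task, Gate.APPROVAL)
    return gate is Gate.AUTO or bool(approved_by)
```

The design choice worth copying is the default: anything not explicitly listed as auto requires a human.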

The Agent’s Bottleneck Is Context, Not Intelligence

In Class 8, filefunc removed code’s context pollution. This principle extends to the entire system.

Structuring code lets the same agent handle 10x wider scope.

Not just code. Structuring every layer of the system dramatically widens the agent’s exploration scope:

Code    → Structured with filefunc
Config  → Declaratively defined with docker-compose.yml, Makefile
Specs   → Cross-validated with yongol SSOT
Infra   → Persisted with Terraform state
Monitor → Machine-readable with /health + structured logs

The agent’s bottleneck isn’t intelligence. Giving agents structured information is 10x more effective than using smarter models. Just as filefunc structured code in Class 8, Class 9 structures the entire system.

The Complete Pipeline: From “Add Feature” to Deployment

In a project with an Agent Operable System, saying “add an order history query feature”:

1. SSOT editing
   Agent: Add ListOrders to features.yaml
   Agent: Define GET /orders in OpenAPI
   Agent: Define orders table in DDL
   Agent: Declare service flow in SSaC
   Agent: Write test scenario in Hurl

2. Consistency validation
   Agent: yongol validate → 0 errors

3. Code generation
   Agent: yongol generate → Go handler, sqlc queries, React component

4. Test pass
   Agent: go test → PASS
   Agent: Hurl tests → PASS

5. Build
   Agent: docker build → Success

6. Deploy (approval gate)
   Agent: "All validations passed. Requesting staging deploy approval."
   Human: "Approved"
   Agent: staging deploy → /health check → Normal

7. Production deploy (approval gate)
   Agent: "30 minutes error-free on staging. Requesting production deploy approval."
   Human: "Approved"
   Agent: production deploy → /health check → Normal

8. Complete
   Agent: "ListOrders feature deployed.
   Monitoring. Auto-rollback on anomaly."

What the human did: “Add order history query feature” + “Approved” twice. What the agent did: Everything else.

This is the complete form of vibe coding scale-up.

Vision — The End of Vibe Coding Scale-up

Where we started in Class 1:

Class 1:
  "Add feature" → Code emerges → 5 features then crumbles

Class 9:
  "Add feature" + "Approve"
  → Code generation → Test pass → Build → Deploy → Monitor
  → Entire pipeline is agent-driven

80/20 — What Humans Actually Need to Focus On

When an Agent Operable System is complete, 80-90% of the code is auto-generated from specifications. You only need to focus on the remaining 10-20%.

What is that 10-20%? Business rules (pricing policies, workflows), domain logic (legal/policy calculations), external API integrations. Just structure these with filefunc and secure tests with tsma.

This is the structure that allows non-SWEs to maintain 100+ endpoints.

Decisions are made by humans. Implementation and verification are done by machines. Humans only need to decide “what to build.” “How to build it” is defined by specifications and executed by agents.

Why Big Tech Does Not Build This

“If it’s so great, why don’t Anthropic or OpenAI make it?”

Models are general-purpose, but verification tools must be specialized for each language and framework. Go testing tools only apply to Go; Python tools only to Python. This does not fit big tech’s ROI. That is why this space is empty.

The faster the train (model) gets, the more important the tracks (verification tools) become. These are the tracks you, taking this course, can lay.

From Legacy to Agent Operable System

“Our project already exists. How do we transition?”

No need to change everything at once. You can go step by step.

Step 1 — Establish Observability (1 day)

Add /health endpoint
Convert logs to structured JSON
Set up basic monitoring
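
"Structured JSON" logging means one parseable object per line instead of free-form text. Below is a minimal sketch using Python's standard logging module. The class name JsonFormatter, the helper make_logger, and the chosen field names are assumptions for illustration.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def make_logger(name: str = "app") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

With this in place, make_logger().error("order %s failed", 42) emits a single line that an agent can filter by level or message with a JSON tool instead of reading it by eye.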

Step 2 — Transition to Declarative System (2-3 days)

Define service configuration with Docker Compose
Unify build/deploy/test commands in Makefile
Build CI/CD pipeline

Step 3 — Establish Reversibility (1-2 days)

Introduce DB migration system
Document and automate rollback procedures
Health check + auto-rollback on deployment

Step 4 — Define Approval Gates (1 day)

List tasks that can auto-execute
List tasks requiring approval
Add approval steps to CI/CD

Each step is independent. Step 1 alone lets agents understand system state. Through step 2, agents can execute deployments. Through step 3, mistakes can be reversed. Through step 4, safe autonomous operation is possible.

Vision — The End of Vibe Coding Scale-Up

Where have we come since “build me a to-do list app” in Class 1?

Class 1: "Build a to-do list app"
  → Code comes out. Works for 3 features.

Class 8: "Run filefunc validate and get violations to 0"
  → Codebase becomes agent-friendly.

Class 9: "Add feature" + "Approve"
  → Code generation → test pass → build → deploy → monitor.
  → The entire pipeline runs agent-driven.

A structure where non-SWEs can maintain, deploy, and operate 100+ endpoints.

Scale-up is possible without SWEs because decisions are made by humans, while implementation and verification are done by machines.

But one thing is still missing. We structured code, structured the system. But what about data?

Class 10 completes the final puzzle.

Exercise

Required Exercise (Non-technical)

Goal: Add a /health endpoint and verify with Hurl.

Step 1 — Establish Observability

To the agent: "Add a /health endpoint to the server.
Return DB connection status, error rate, and uptime as JSON.
All existing Hurl tests must pass."

Step 2 — Write Hurl Test

To the agent: "Create a Hurl test that verifies /health returns 200.
Also check the JSON response has db, status, and uptime fields."

What to check:

Does /health return 200?
Does the Hurl test pass?

Challenge Exercise (Optional)

No need to install Docker yourself or understand config files. Tell the agent everything.

Step 1 — Docker Compose

To the agent: "If Docker isn't installed, install it.
Configure this project as docker-compose.yml.
Include app server and PostgreSQL.
Everything should come up with docker compose up.
Add build, deploy, test commands to the Makefile."

The agent handles everything from Docker installation to config file creation to execution verification. You just check at the end — “does the app start? Does /health return 200?”

Step 2 — CI/CD Pipeline

To the agent: "Create a GitHub Actions workflow.
On push to main branch:
1. Run go test
2. Run Hurl tests
3. Docker build
4. Deploy to staging if all pass
But production deploy requires manual approval."

Step 3 — Integration Demo

Request one feature addition from the agent. Demo the full pipeline from SSOT editing to staging deploy, with only production deploy requiring manual approval.

What to check:

From “add feature” to staging deploy, how many times did a human intervene?
When the agent failed, how many minutes to rollback?
Could the agent read /health results and assess system state?

Reins Engineering Full Course

Class	Title
Class 0	Install Claude Code
Class 1	How to Command AI
Class 2	How to Distrust AI
Class 3	Apps That Don’t Break
Class 4	Decisions Outside Code
Class 5	AI with Reins
Class 6	Pass Then Lock
Class 7	Flipping Sycophancy
Class 8	The Agent’s Factory
Class 9	Automation Beyond Code
Class 10	The Law of Data
Class 11	How to Rescue Failed Vibe Coding

Sources

Class 8 reference: Stanford “Lost in the Middle” (2024), Amazon “Context Length Alone Hurts LLM Performance” (2025) — Unnecessary context degrades agent performance 30-85%
Observability principles — Machine-parseable structured output (/health endpoints, JSON logs) as prerequisite for agent operation
Docker Compose — Declarative service configuration for agents to read and execute systems without guessing
Terraform — Infrastructure as Code, declarative definition and reversible changes of infrastructure state
CI/CD (GitHub Actions) — Declarative automation of build-test-deploy pipelines
Human-in-the-loop design — Auto for reversible tasks, approval gate required for high-impact tasks