The Agent That Stops at Endpoint 15
You tell an AI agent to write Hurl tests for a SaaS backend. There are 42 endpoints.
The first 10 go smoothly. Login, signup, list: similar patterns, so progress is fast. Around endpoint 15, the agent says:
“The remaining endpoints follow a similar pattern, so I have completed the task.”
Endpoints with tests: 15. Tests for the remaining 27: none.
The agent is not lazy. Cemri et al. analyzed 7 multi-agent frameworks and found failure rates ranging from 41% to 86.7%, with “premature termination” identified as one of 14 failure modes (Cemri et al., NeurIPS 2025). LLMs are optimized for generation, not progress tracking. They do not reliably keep track of which endpoints have tests and which do not, so “similar pattern” becomes the justification for skipping the remaining 27.
Does a bigger model fix it? Maybe it gets to 20, but it still won’t reach 42. The problem is not model size; it is a structure in which the agent itself judges completion.
Why a Ratchet Is Needed
The core of the Ratchet Pattern is simple: take the completion judgment away from the agent.
A mechanical verifier decides both “this is not done yet” and “this is done.” The agent works on an item until the verifier passes it, then moves to the next one. There is no going backward: what has passed once stays passed.
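The mechanism fits in a few lines of Go. This is a minimal illustration assuming an ordered list of items and a verifier function; the `Ratchet` type and its methods are hypothetical names, not huma’s API. The point is that `Done` is computed from the verifier’s record, never from anything the agent claims.

```go
package main

import "fmt"

// Verifier decides, mechanically, whether an item is done.
// The agent never answers this question itself.
type Verifier func(item string) bool

// Ratchet walks items in order. Its position only ever moves forward.
type Ratchet struct {
	items  []string
	passed int // number of items the verifier has accepted; never decreases
}

// Current returns the first item the verifier has not yet accepted.
func (r *Ratchet) Current() (string, bool) {
	if r.passed >= len(r.items) {
		return "", false
	}
	return r.items[r.passed], true
}

// Check runs the verifier on the current item and advances on a pass.
// A failed check leaves the position unchanged; nothing ever moves back.
func (r *Ratchet) Check(verify Verifier) bool {
	item, ok := r.Current()
	if !ok {
		return false
	}
	if verify(item) {
		r.passed++
		return true
	}
	return false
}

// Done is true only when every item has passed the verifier.
func (r *Ratchet) Done() bool { return r.passed == len(r.items) }

func main() {
	r := &Ratchet{items: []string{"GET /users", "POST /login", "GET /orders"}}
	written := map[string]bool{}
	verify := func(item string) bool { return written[item] }

	// Simulated agent: it works on the current item, then asks the verifier.
	for !r.Done() {
		item, _ := r.Current()
		if !r.Check(verify) {
			written[item] = true // the agent writes the test, then retries
			continue
		}
		fmt.Println("PASS", item)
	}
}
```

The loop condition belongs to the ratchet, not the agent: there is no code path where the agent can end the loop early.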
huma applies this pattern to Hurl API testing. Given an endpoint list, it creates a ratchet session and makes the agent write and pass tests one endpoint at a time. If there are 42 endpoints, all 42 get tests. Kim et al. applied 10 state-of-the-art tools to 20 real-world services and found that all of them achieved low coverage, concluding that “resolving inter-endpoint dependencies is key to effective API testing” (Kim et al., ISSTA 2022). huma’s sequential ratchet is a structural answer to this problem.
The name says it all: huma stands for Hurl Master.
Commands
huma has five commands; `scan` also accepts a `--from` option.
| Command | Role |
|---|---|
| `huma scan` | Auto-detects openapi.yaml and scans endpoints |
| `huma scan --from <file>` | Scans from a specified file (OpenAPI, JSON array, YAML) |
| `huma next` | Shows the next incomplete endpoint, or verifies and advances |
| `huma verify` | Runs the current endpoint’s Hurl test; advances on pass |
| `huma status` | Shows progress (TODO/PASS/IMPROVE/DONE counts) |
| `huma prompt` | Outputs an agent prompt for the current TODO (no side effects) |
No config file is required, and there is no DSL to learn. “First test in 10 minutes” is the design goal.
The Ratchet Loop
The full workflow is a loop of four steps.
```text
openapi.yaml ──► huma scan ──► session
                                  │
                              huma next ◄────┐
                                  │          │
                          ┌───────┴───────┐  │
                          │     TODO      │  │
                          │  (no .hurl)   │  │
                          └───────┬───────┘  │
                            agent writes     │
                                .hurl        │
                                  │          │
                          ┌───────┴───────┐  │
                          │ PASS/IMPROVE  │──┘
                          └───────────────┘
```
1. Endpoint Scan
```shell
huma scan
# Auto-detected openapi.yaml
# Scanned 42 endpoints
```
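What a scan extracts can be pictured with a short Go sketch. huma auto-detects openapi.yaml; to stay within the standard library, this sketch instead reads the JSON form of an OpenAPI document and collects each method and path together with its declared response codes. `scan` and `Endpoint` are hypothetical names, and real specs (`$ref`s, `default` or `2XX` responses) need more handling than shown.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// Endpoint is one method+path pair with the status codes the spec declares.
type Endpoint struct {
	Method   string
	Path     string
	Expected []int
}

// spec models only the part of an OpenAPI document this sketch needs.
// Path items are kept raw because they may hold non-operation keys
// such as "parameters" (an array).
type spec struct {
	Paths map[string]map[string]json.RawMessage `json:"paths"`
}

type operation struct {
	Responses map[string]json.RawMessage `json:"responses"`
}

var httpMethods = map[string]bool{
	"get": true, "put": true, "post": true, "delete": true,
	"patch": true, "head": true, "options": true, "trace": true,
}

// scan lists endpoints in a stable order (path, then method).
func scan(doc []byte) ([]Endpoint, error) {
	var s spec
	if err := json.Unmarshal(doc, &s); err != nil {
		return nil, err
	}
	var out []Endpoint
	for path, ops := range s.Paths {
		for method, raw := range ops {
			if !httpMethods[method] {
				continue // skip path-level keys such as "parameters"
			}
			var op operation
			if err := json.Unmarshal(raw, &op); err != nil {
				return nil, fmt.Errorf("%s %s: %w", method, path, err)
			}
			ep := Endpoint{Method: strings.ToUpper(method), Path: path}
			for code := range op.Responses {
				if n, err := strconv.Atoi(code); err == nil {
					ep.Expected = append(ep.Expected, n) // "default" and "2XX" are skipped
				}
			}
			sort.Ints(ep.Expected)
			out = append(out, ep)
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Path != out[j].Path {
			return out[i].Path < out[j].Path
		}
		return out[i].Method < out[j].Method
	})
	return out, nil
}

func main() {
	doc := []byte(`{"paths": {
	  "/api/v1/users": {"get": {"responses": {"200": {}, "401": {}}}},
	  "/api/v1/auth/login": {"post": {"responses": {"200": {}, "400": {}, "default": {}}}}
	}}`)
	eps, err := scan(doc)
	if err != nil {
		panic(err)
	}
	fmt.Printf("Scanned %d endpoints\n", len(eps))
	for _, ep := range eps {
		fmt.Println(ep.Method, ep.Path, ep.Expected)
	}
}
```

Sorting gives the session a deterministic order, which matters for a ratchet: the same spec should always produce the same sequence of TODOs.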
2. Check Next TODO
```shell
huma next
# TODO GET /api/v1/users
```
3. Write Test
The agent writes a Hurl file covering the golden path plus at least one error case (400, 401, or 404).
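This minimum is mechanical enough to check in code. A rough Go sketch, assuming status assertions appear as Hurl response lines such as `HTTP 200` or `HTTP/1.1 404`; `statusCodes` and `meetsMinimum` are hypothetical helpers, not huma’s validator, and they ignore the rest of the Hurl grammar.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// statusCodes collects the codes asserted by Hurl response lines,
// e.g. "HTTP 200" or the version-specific form "HTTP/1.1 404".
func statusCodes(hurl string) []int {
	var codes []int
	sc := bufio.NewScanner(strings.NewReader(hurl))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 {
			continue
		}
		if fields[0] != "HTTP" && !strings.HasPrefix(fields[0], "HTTP/") {
			continue // request lines, headers, asserts, etc.
		}
		if n, err := strconv.Atoi(fields[1]); err == nil {
			codes = append(codes, n)
		}
	}
	return codes
}

// meetsMinimum applies the rule: a golden-path (2xx) response
// plus at least one of the error cases 400, 401, or 404.
func meetsMinimum(codes []int) bool {
	golden, errCase := false, false
	for _, c := range codes {
		switch {
		case c >= 200 && c < 300:
			golden = true
		case c == 400 || c == 401 || c == 404:
			errCase = true
		}
	}
	return golden && errCase
}

func main() {
	// Hypothetical test file; {{host}} and {{token}} are Hurl variables.
	hurl := `GET {{host}}/api/v1/users
Authorization: Bearer {{token}}
HTTP 200
[Asserts]
jsonpath "$.users" isCollection

GET {{host}}/api/v1/users
HTTP 401
`
	codes := statusCodes(hurl)
	fmt.Println(codes, meetsMinimum(codes))
}
```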
4. Verify and Advance
```shell
huma next
# PASS GET /api/v1/users → hurl/get_api_v1_users.hurl
# TODO POST /api/v1/auth/login
```
A pass locks the endpoint and the next TODO appears; a failure keeps the agent on the same endpoint. The agent cannot declare “done”: as long as huma next finds TODOs, the loop continues.
```shell
huma status
# 15/42 PASS | 0 IMPROVE | 27 TODO
```
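A single huma next step, as described above, can be sketched in Go. Assumptions: only TODO and PASS are modeled, `verify` stands in for running Hurl (live mode) or the static status-code check, and `hurlPath` is inferred from the one naming example `GET /api/v1/users → hurl/get_api_v1_users.hurl`, so its handling of path parameters is a guess. None of these names are huma’s.

```go
package main

import (
	"fmt"
	"strings"
)

type Status int

const (
	Todo Status = iota
	Pass
)

type Endpoint struct {
	Method, Path string
	Status       Status
}

// hurlPath lowercases the method and joins the path's alphanumeric runs
// with underscores. Inferred from a single example; not huma's rule.
func hurlPath(method, urlPath string) string {
	var b strings.Builder
	b.WriteString(strings.ToLower(method))
	sep := false
	for _, r := range strings.ToLower(urlPath) {
		isWord := (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9')
		if !isWord {
			sep = true
			continue
		}
		if sep {
			b.WriteByte('_')
			sep = false
		}
		b.WriteRune(r)
	}
	return "hurl/" + b.String() + ".hurl"
}

func firstTodo(eps []Endpoint) int {
	for i, ep := range eps {
		if ep.Status == Todo {
			return i
		}
	}
	return -1
}

// next verifies the first TODO; on a pass it locks that endpoint and
// reports the following TODO. On a failure it reports the same TODO again.
func next(eps []Endpoint, verify func(Endpoint) bool) []string {
	i := firstTodo(eps)
	if i < 0 {
		return nil // no TODO left: the loop ends
	}
	var out []string
	if verify(eps[i]) {
		eps[i].Status = Pass // locked; next never revisits it
		out = append(out, fmt.Sprintf("PASS %s %s → %s", eps[i].Method, eps[i].Path, hurlPath(eps[i].Method, eps[i].Path)))
		if i = firstTodo(eps); i < 0 {
			return out
		}
	}
	return append(out, fmt.Sprintf("TODO %s %s", eps[i].Method, eps[i].Path))
}

func main() {
	eps := []Endpoint{
		{Method: "GET", Path: "/api/v1/users"},
		{Method: "POST", Path: "/api/v1/auth/login"},
	}
	written := map[string]bool{} // Hurl files the agent has written so far
	verify := func(ep Endpoint) bool { return written[hurlPath(ep.Method, ep.Path)] }

	fmt.Println(strings.Join(next(eps, verify), "\n")) // no file yet: stays on GET /api/v1/users
	written["hurl/get_api_v1_users.hurl"] = true
	fmt.Println(strings.Join(next(eps, verify), "\n")) // passes, locks, shows the next TODO
}
```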
Ratchet States
```text
TODO → PASS → (complete)
TODO → IMPROVE → PASS (or DONE)
```
| State | Meaning |
|---|---|
| TODO | No .hurl file. Agent gets handler source + expected responses + Hurl example |
| IMPROVE | Hurl file exists, but some expected response status codes are not covered. Shows which codes are missing |
| PASS | All expected response status codes are covered |
| DONE | Coverage stalled after retries. Accepted at current level |
Transitions run one way only: from TODO to PASS or DONE, never backward. That is the ratchet.
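The state table reads naturally as a small function plus a one-way guard. A Go sketch assuming each endpoint tracks whether its .hurl file exists, its expected codes, the codes the file covers, and an attempt count; `maxRetries` is a hypothetical knob, since the text does not say how many retries huma allows before DONE.

```go
package main

import "fmt"

type State int

const (
	TODO State = iota
	IMPROVE
	PASS
	DONE
)

func (s State) String() string {
	return [...]string{"TODO", "IMPROVE", "PASS", "DONE"}[s]
}

// classify follows the table: no file -> TODO; file missing some expected
// codes -> IMPROVE (with the missing codes); all covered -> PASS.
// After maxRetries attempts without full coverage, the endpoint becomes DONE.
func classify(fileExists bool, expected, covered []int, attempts, maxRetries int) (State, []int) {
	if !fileExists {
		return TODO, expected
	}
	have := map[int]bool{}
	for _, c := range covered {
		have[c] = true
	}
	var missing []int
	for _, c := range expected {
		if !have[c] {
			missing = append(missing, c)
		}
	}
	switch {
	case len(missing) == 0:
		return PASS, nil
	case attempts >= maxRetries:
		return DONE, missing
	default:
		return IMPROVE, missing
	}
}

// advance enforces the ratchet: PASS and DONE are terminal,
// and any other state may only move forward.
func advance(cur, next State) State {
	if cur == PASS || cur == DONE {
		return cur
	}
	if next > cur {
		return next
	}
	return cur
}

func main() {
	expected := []int{200, 401, 404}
	s := TODO

	st, missing := classify(true, expected, []int{200}, 1, 3)
	s = advance(s, st)
	fmt.Println(s, "missing:", missing) // IMPROVE missing: [401 404]

	st, _ = classify(true, expected, []int{200, 401, 404}, 2, 3)
	s = advance(s, st)
	fmt.Println(s) // PASS

	st, _ = classify(false, expected, nil, 3, 3) // the file is deleted later
	fmt.Println(advance(s, st))                  // PASS: the ratchet does not move back
}
```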
Two Modes
Static Mode (no server)
Used when manifest.yaml has no testing.server block. huma checks Hurl files against the expected status codes, taken from OpenAPI or from source analysis, without running a server.
Live Mode (server running)
Add a testing.server block to manifest.yaml. huma then builds the binary, starts the server, waits for the health check, and runs Hurl against it, so the ratchet verifies real runtime behavior.
Language Agnostic
| Language | Adapter | Analyzer | backend.lang |
|---|---|---|---|
| Go | GoAdapter | go/ast | go (default) |
| Python | PythonAdapter | regex | python |
| Node.js | NodeAdapter | regex | node |
Hurl tests work at the HTTP level, so they are independent of the backend language. The same ratchet loop works on Django, Express, or Gin.
Agent Native
huma next outputs everything an agent needs: the handler source, the expected file path, and instructions. Ryan et al. demonstrated that providing execution paths as prompts to LLMs doubles test coverage (Ryan et al., FSE 2024). huma next applies the same idea by putting the handler source directly in the agent’s prompt.
huma prompt is even more explicit: it outputs an agent prompt for the current TODO with no side effects.
One instruction for the agent: run `huma next` and repeat until TODO is 0.
Errors Have Rule IDs
```text
[H-01] Hurl file not found at expected path
  ▶ Check `huma next` output for expected filename
[A-02] Server build command failed
  ▶ go build -cover: exit status 1
```
| Prefix | Domain |
|---|---|
| `M-` | manifest.yaml validation |
| `E-` | Endpoint input validation |
| `H-` | Hurl file validation |
| `S-` | Session state validation |
| `A-` | Adapter/server validation |
This follows the Reins Engineering principle: feedback must be mechanical, structural, and repeatable.
juicer, huma, yongol Pipeline
```text
Legacy codebase
  │
  ▼
juicer ──► openapi.yaml    (API spec extraction)
  │
  ▼
huma ────► hurl/*.hurl     (test generation)
  │
  ▼
yongol ──► refactored code (SSOT-based rebuild)
```
juicer extracts an OpenAPI spec from legacy code, huma generates Hurl tests for every endpoint, and yongol rebuilds the code on a single source of truth (SSOT). Because huma and yongol share the same manifest.yaml, there is no transition cost between them.
huma also works without yongol. Any Go, Python, or Node.js SaaS backend with an openapi.yaml can run the ratchet.
10 Minutes Is Enough
```shell
go install github.com/park-jun-woo/huma@latest
```

Or install as a Claude Code skill:

```shell
npx skills add park-jun-woo/huma
```

1. `huma scan`
2. `huma next`: read the prompt, write the Hurl file
3. `huma next`: verify and advance
4. Repeat until TODO is 0
No config. No learning curve. 42 endpoints means 42 tests.
Don’t give the agent permission to declare “done.” Let the machine decide. That is the ratchet.
Code: github.com/park-jun-woo/huma
Related Posts
- Ratchet Pattern – How to Make Agents Finish the Job – huma’s theoretical background. Verifier-based unidirectional execution.
- yongol – The Keel of AI Coding SaaS – The upper framework of huma’s test ratchet. 10 SSOT consistency verification.
- Hurl Stops Vibe Coding Drift – Strategy of declaring API contracts with Hurl and locking with a ratchet.
- Ratchet Code That Exploits IFEval – Feedback loops exploiting sycophancy bias and Reins.
- Feedback Topology over Model IQ – Why feedback structure determines outcomes more than model performance.
- Reins Engineering – AI with Reins – The Reins tool ecosystem huma belongs to.