huma -- A Ratchet That Never Skips an Endpoint


If you asked an AI agent to write API tests and it declared “done” before finishing half of them, if you want every single endpoint tested without exception, if you want to enforce Hurl test writing with a ratchet – huma is that tool.

The Agent That Stops at Endpoint 15

You tell an AI agent to write Hurl tests for a SaaS backend. There are 42 endpoints.

The first 10 go smoothly. Login, signup, list – similar patterns, so it is fast. Around 15, the agent says:

“The remaining endpoints follow a similar pattern, so I have completed the task.”

Endpoints with tests: 15. Remaining 27: none.

The agent is not lazy. Cemri et al. analyzed 7 multi-agent frameworks and found that 41% to 86.7% of tasks failed, with “premature termination” identified as one of 14 failure modes (Cemri et al., NeurIPS 2025). LLMs are optimized for generation, not progress tracking. They do not reliably keep track of which endpoints already have tests and which do not, so “similar pattern” becomes the justification for skipping the remaining 27.

Does a bigger model fix it? Maybe it gets to 20. It still won’t reach 42. The problem is not model size but the structure where the agent itself judges completion.

Why a Ratchet Is Needed

The core of the Ratchet Pattern is simple. Take completion judgment away from the agent.

A mechanical verifier determines “this is not done yet” and “this is done.” The agent works until the verifier passes it, then moves to the next. No going backward. What passed once stays passed forever.
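
The idea fits in a few lines. The following is a minimal sketch of a generic ratchet loop, not huma's actual implementation: `run_ratchet`, `attempt`, and `verify` are hypothetical names, and raising when an item never passes is a simplification chosen for illustration.

```python
from typing import Callable, Iterable, List


def run_ratchet(
    items: Iterable[str],
    attempt: Callable[[str], None],
    verify: Callable[[str], bool],
    max_attempts: int = 5,
) -> List[str]:
    """Work through items in order; only the verifier may advance the ratchet."""
    passed: List[str] = []  # append-only: a passed item is never revisited
    for item in items:
        for _ in range(max_attempts):
            attempt(item)  # the agent does some work on this item
            if verify(item):  # the mechanical check decides, not the agent
                passed.append(item)
                break
        else:
            # The verifier never passed: stop here instead of skipping ahead.
            raise RuntimeError(f"stuck on {item!r} after {max_attempts} attempts")
    return passed


if __name__ == "__main__":
    drafts = {}

    def attempt(endpoint: str) -> None:
        drafts[endpoint] = drafts.get(endpoint, 0) + 1

    def verify(endpoint: str) -> bool:
        # Hypothetical verifier: pretend each test passes on its second draft.
        return drafts[endpoint] >= 2

    print(run_ratchet(["GET /a", "POST /b"], attempt, verify))
```

The agent only ever supplies `attempt`. It has no way to add an item to `passed` on its own.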

huma applies this pattern to Hurl API testing. Given an endpoint list, it creates a ratchet session and forces the agent to write and pass tests one by one. If there are 42, all 42. Kim et al. applied 10 SOTA tools to 20 real services and found all achieved low coverage, concluding “resolving inter-endpoint dependencies is key to effective API testing” (Kim et al., ISSTA 2022). huma’s sequential ratchet structurally solves this.

Hurl Master – the name says it all.

Commands

huma has five commands, or six invocations counting the two forms of `huma scan`.

| Command | Role |
| --- | --- |
| `huma scan` | Auto-detects `openapi.yaml` and scans endpoints |
| `huma scan --from <file>` | Scans from a specified file (OpenAPI, JSON array, YAML) |
| `huma next` | Shows the next incomplete endpoint, or verifies and advances |
| `huma verify` | Runs the current endpoint’s Hurl test; advances on pass |
| `huma status` | Shows progress (TODO/PASS/IMPROVE/DONE counts) |
| `huma prompt` | Outputs an agent prompt for the current TODO (no side effects) |

No config file. No DSL. “First test in 10 minutes” is the design goal.

The Ratchet Loop

The full workflow has four steps.

openapi.yaml ──► huma scan ──► session
                                 │
                           huma next ◄──┐
                             │          │
                     ┌───────┴───────┐  │
                     │  TODO         │  │
                     │  (no .hurl)   │  │
                     └───────┬───────┘  │
                       agent writes    │
                       .hurl           │
                             │          │
                     ┌───────┴───────┐  │
                     │  PASS/IMPROVE │──┘
                     └───────────────┘

1. Endpoint Scan

huma scan
# Auto-detected openapi.yaml
# Scanned 42 endpoints
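
Conceptually, a scan turns the spec into a flat list of method and path pairs. Here is a rough sketch of that idea, assuming a standard OpenAPI 3 `paths` object. `list_endpoints` is a hypothetical helper, not huma's code, and the spec is given as JSON only to keep the example free of third-party YAML parsers.

```python
import json

# Operation keys allowed on an OpenAPI 3.0 path item.
HTTP_METHODS = ("get", "put", "post", "delete", "options", "head", "patch", "trace")

SPEC_JSON = """
{
  "openapi": "3.0.3",
  "paths": {
    "/api/v1/users": {
      "get": {"responses": {"200": {"description": "ok"}}},
      "post": {"responses": {"201": {"description": "created"}}}
    },
    "/api/v1/auth/login": {
      "post": {"responses": {"200": {"description": "ok"}}}
    }
  }
}
"""


def list_endpoints(spec: dict) -> list:
    """Return 'METHOD /path' strings for every operation in the spec."""
    endpoints = []
    for path, item in spec.get("paths", {}).items():
        for method in HTTP_METHODS:
            if method in item:  # skips non-operation keys such as "parameters"
                endpoints.append(f"{method.upper()} {path}")
    return endpoints


if __name__ == "__main__":
    for endpoint in list_endpoints(json.loads(SPEC_JSON)):
        print(endpoint)
```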

2. Check Next TODO

huma next
# TODO  GET /api/v1/users

3. Write Test

The agent writes a Hurl file. Each file covers the golden path plus at least one error case (400, 401, or 404).
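
To make that concrete, here is an illustrative Hurl test held in a Python string, with a small check that it asserts both a success status and an error status. The host, port, `token` variable, and `$.users` response shape are assumptions made for the example, and `asserted_statuses` is a hypothetical helper rather than part of huma.

```python
import re

# Illustrative Hurl file: one golden-path request and one error case.
HURL_TEST = """\
# Golden path: list users with a valid token
GET http://localhost:8080/api/v1/users
Authorization: Bearer {{token}}
HTTP 200
[Asserts]
jsonpath "$.users" isCollection

# Error case: no token
GET http://localhost:8080/api/v1/users
HTTP 401
"""

# Matches Hurl response lines such as "HTTP 200" or the older "HTTP/1.1 200".
STATUS_LINE = re.compile(r"^HTTP(?:/[\d.]+)?[ \t]+(\d{3})[ \t]*$", re.MULTILINE)


def asserted_statuses(hurl_text: str) -> set:
    """Collect every status code the Hurl file asserts."""
    return {int(code) for code in STATUS_LINE.findall(hurl_text)}


def has_golden_and_error(hurl_text: str) -> bool:
    """True if the file asserts a 2xx status and one of 400/401/404."""
    codes = asserted_statuses(hurl_text)
    has_success = any(200 <= code < 300 for code in codes)
    has_error = any(code in (400, 401, 404) for code in codes)
    return has_success and has_error


if __name__ == "__main__":
    print(sorted(asserted_statuses(HURL_TEST)))
    print(has_golden_and_error(HURL_TEST))
```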

4. Verify and Advance

huma next
# PASS  GET /api/v1/users → hurl/get_api_v1_users.hurl
# TODO  POST /api/v1/auth/login

Pass locks, next TODO appears. Fail stays on the same endpoint. The agent cannot declare “done.” As long as huma next finds TODOs, the loop continues.
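
A wrapper script can read this output mechanically. The parser below is a sketch that assumes the line format matches the samples shown above (state, method, path, and an optional `→ file` suffix). The real output may carry more detail, and `parse_next_output` is a hypothetical helper.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed line shape, taken from the sample output:
#   PASS  GET /api/v1/users → hurl/get_api_v1_users.hurl
#   TODO  POST /api/v1/auth/login
LINE = re.compile(
    r"^(?:#\s*)?(TODO|PASS|IMPROVE|DONE)\s+([A-Z]+)\s+(\S+)(?:\s+→\s+(\S+))?\s*$"
)


class NextLine(NamedTuple):
    state: str
    method: str
    path: str
    hurl_file: Optional[str]  # only present when the line names a Hurl file


def parse_next_output(text: str) -> List[NextLine]:
    """Parse state lines and ignore anything that does not match."""
    parsed = []
    for raw in text.splitlines():
        match = LINE.match(raw.strip())
        if match:
            parsed.append(NextLine(*match.groups()))
    return parsed


if __name__ == "__main__":
    sample = (
        "PASS  GET /api/v1/users → hurl/get_api_v1_users.hurl\n"
        "TODO  POST /api/v1/auth/login\n"
    )
    for line in parse_next_output(sample):
        print(line)
```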

huma status
# 15/42 PASS  |  0 IMPROVE  |  27 TODO
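
The summary is a plain count over per-endpoint states. A sketch of how such a line could be derived, assuming the session is a mapping from endpoint to state (the real session format is not shown here, and `status_line` and `is_complete` are hypothetical helpers):

```python
from collections import Counter


def status_line(states: dict) -> str:
    """Format counts in the same shape as the sample `huma status` output."""
    counts = Counter(states.values())  # missing states count as 0
    total = len(states)
    return (
        f"{counts['PASS']}/{total} PASS  |  "
        f"{counts['IMPROVE']} IMPROVE  |  {counts['TODO']} TODO"
    )


def is_complete(states: dict) -> bool:
    """The loop may stop only when nothing is left in TODO or IMPROVE."""
    return all(state in ("PASS", "DONE") for state in states.values())


if __name__ == "__main__":
    session = {"GET /api/v1/users": "PASS", "POST /api/v1/auth/login": "TODO"}
    print(status_line(session))
    print(is_complete(session))
```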

Ratchet States

TODO → PASS → (complete)
TODO → IMPROVE → PASS (or DONE)

| State | Meaning |
| --- | --- |
| TODO | No .hurl file. Agent gets handler source + expected responses + Hurl example |
| IMPROVE | Hurl exists but response status codes are missing. Shows which codes are missing |
| PASS | All expected response status codes are covered |
| DONE | Coverage stalled after retries. Accepted at current level |

Direction is one-way only. TODO to PASS or DONE. No backward transitions. This is the ratchet.
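
The states and their one-way transitions can be written down as a small state machine. This is a sketch based only on the descriptions above. The `stalled` flag stands in for whatever retry rule huma actually uses, and `classify` and `advance` are hypothetical names.

```python
def classify(hurl_exists: bool, expected: set, covered: set, stalled: bool = False) -> str:
    """Derive a state from what is on disk and which status codes are covered."""
    if not hurl_exists:
        return "TODO"
    if expected <= covered:  # every expected status code is asserted
        return "PASS"
    return "DONE" if stalled else "IMPROVE"


# Forward-only transitions, following the two paths listed above.
ALLOWED = {
    "TODO": {"IMPROVE", "PASS"},
    "IMPROVE": {"PASS", "DONE"},
    "PASS": set(),
    "DONE": set(),
}


def advance(current: str, proposed: str) -> str:
    """Move forward if the transition is allowed; otherwise the lock holds."""
    return proposed if proposed in ALLOWED[current] else current


if __name__ == "__main__":
    state = "TODO"
    state = advance(state, classify(True, {200, 401}, {200}))
    print(state)
    state = advance(state, classify(True, {200, 401}, {200, 401}))
    print(state)
    # A later regression cannot pull a passed endpoint backward.
    print(advance(state, "TODO"))
```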

Two Modes

Static Mode (no server)

Used when manifest.yaml has no testing.server block. huma checks the Hurl files against the expected status codes from OpenAPI or source analysis, without running a server.
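
As an illustration of the OpenAPI side of that check, the sketch below reads the expected status codes from an operation's `responses` object and reports which ones a test does not yet cover. The helper names are hypothetical. Non-numeric keys such as `default` or `2XX` are skipped, which is a simplification.

```python
def expected_statuses(spec: dict, method: str, path: str) -> set:
    """Numeric status codes declared under responses for one operation."""
    responses = spec["paths"][path][method.lower()].get("responses", {})
    # YAML loaders may yield int keys and JSON yields strings, so normalize.
    return {int(code) for code in responses if str(code).isdigit()}


def missing_statuses(spec: dict, method: str, path: str, covered: set) -> list:
    """Expected codes that the Hurl file does not assert yet, sorted."""
    return sorted(expected_statuses(spec, method, path) - covered)


if __name__ == "__main__":
    spec = {
        "paths": {
            "/api/v1/users": {
                "get": {"responses": {"200": {}, "401": {}, "default": {}}}
            }
        }
    }
    # Suppose the current Hurl file only asserts HTTP 200.
    print(missing_statuses(spec, "GET", "/api/v1/users", covered={200}))
```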

Live Mode (server running)

Add a testing.server block to manifest.yaml. huma then builds the binary, starts the server, waits for the health check, and runs Hurl. This gives a full runtime verification ratchet.
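
The build, start, wait, and run sequence looks roughly like the sketch below. It is not huma's code: the commands, health URL, and timeouts are placeholders supplied by the caller, and the only external call assumed is Hurl's own `hurl --test`.

```python
import subprocess
import time
import urllib.error
import urllib.request


def http_ok(url: str, timeout: float = 1.0) -> bool:
    """True if the URL answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timeout, or non-2xx response


def wait_until_healthy(probe, timeout_s=30.0, interval_s=0.5,
                       sleep=time.sleep, clock=time.monotonic) -> bool:
    """Poll probe() until it returns True or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def run_live(build_cmd, server_cmd, health_url, hurl_file) -> bool:
    """Build, start the server, wait for health, then run one Hurl file."""
    subprocess.run(build_cmd, check=True)   # a failed build raises here
    server = subprocess.Popen(server_cmd)
    try:
        if not wait_until_healthy(lambda: http_ok(health_url)):
            return False
        return subprocess.run(["hurl", "--test", hurl_file]).returncode == 0
    finally:
        server.terminate()                   # always stop the server
        server.wait(timeout=10)
```

The `sleep` and `clock` parameters exist only so the polling logic can be exercised without a real server.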

Language Agnostic

| Language | Adapter | Analyzer | `backend.lang` |
| --- | --- | --- | --- |
| Go | `GoAdapter` | go/ast | `go` (default) |
| Python | `PythonAdapter` | regex | `python` |
| Node.js | `NodeAdapter` | regex | `node` |

Hurl tests are HTTP-level, so they are language-independent. The same ratchet loop works on Django, Express, or Gin.
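
To give a feel for what a regex-based analyzer can do, here is a deliberately small sketch that pulls routes out of Express-style source. It is an assumption about the general technique, not huma's `NodeAdapter`: the pattern only handles `app.<method>('/path', ...)` and `router.<method>(...)` calls with literal string paths.

```python
import re

# Matches calls such as app.get('/users', ...) or router.post("/login", ...).
EXPRESS_ROUTE = re.compile(
    r"\b(?:app|router)\.(get|post|put|patch|delete)\(\s*['\"]([^'\"]+)['\"]"
)


def express_routes(source: str) -> list:
    """Return 'METHOD /path' for each literal route found in the source."""
    return [f"{method.upper()} {path}" for method, path in EXPRESS_ROUTE.findall(source)]


if __name__ == "__main__":
    source = """
    const app = express();
    app.get('/api/v1/users', listUsers);
    router.post("/api/v1/auth/login", login);
    """
    for route in express_routes(source):
        print(route)
```

A pattern like this misses dynamically built paths and nested router prefixes, a trade-off that regex analysis makes and AST analysis avoids.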

Agent Native

huma next outputs everything an agent needs: handler source, expected file path, instructions. Ryan et al. demonstrated that providing execution paths as prompts to LLMs doubles test coverage (Ryan et al., FSE 2024). huma next does exactly this.

huma prompt is even more explicit – outputs an agent prompt for the current TODO with no side effects.

One instruction for the agent:

Run huma next and repeat until TODO is 0.
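
The same instruction can be enforced from outside the agent with a small driver. The sketch below assumes that `huma next` prints a line starting with TODO or IMPROVE while work remains, as in the samples in this post. `drive` and `write_test` are hypothetical, and `write_test` stands in for handing the output to an agent.

```python
import subprocess
from typing import Callable


def huma_next() -> str:
    """Run `huma next` and return its standard output."""
    return subprocess.run(["huma", "next"], capture_output=True, text=True).stdout


def has_open_work(output: str) -> bool:
    """True if any line reports an endpoint still in TODO or IMPROVE."""
    return any(
        line.lstrip("# ").startswith(("TODO", "IMPROVE"))
        for line in output.splitlines()
    )


def drive(run_next: Callable[[], str], write_test: Callable[[str], None],
          max_rounds: int = 1000) -> int:
    """Loop until the tool reports no open work; return the rounds used."""
    for rounds in range(max_rounds):
        output = run_next()
        if not has_open_work(output):
            return rounds        # the tool, not the agent, ended the loop
        write_test(output)       # the agent writes or fixes a .hurl file
    raise RuntimeError(f"still open after {max_rounds} rounds")
```

In real use this would be called as `drive(huma_next, ask_agent)`, where `ask_agent` is whatever function forwards the prompt to your agent.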

Errors Have Rule IDs

[H-01] Hurl file not found at expected path
  ▶ Check `huma next` output for expected filename

[A-02] Server build command failed
  ▶ go build -cover: exit status 1

| Prefix | Domain |
| --- | --- |
| `M-` | manifest.yaml validation |
| `E-` | Endpoint input validation |
| `H-` | Hurl file validation |
| `S-` | Session state validation |
| `A-` | Adapter/server validation |

This is the Reins Engineering principle. Feedback must be mechanical, structural, and repeatable.
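
Because the IDs are structured, a wrapper can route errors by domain without reading the message text. A minimal sketch, assuming error lines follow the `[X-NN] message` shape shown above (`parse_error` is a hypothetical helper):

```python
import re
from typing import Optional

# Prefix-to-domain mapping, copied from the table above.
DOMAINS = {
    "M": "manifest.yaml validation",
    "E": "Endpoint input validation",
    "H": "Hurl file validation",
    "S": "Session state validation",
    "A": "Adapter/server validation",
}

RULE = re.compile(r"^\[([A-Z])-(\d+)\]\s+(.*)$")


def parse_error(line: str) -> Optional[dict]:
    """Split an error line into rule ID, domain, and message."""
    match = RULE.match(line.strip())
    if not match:
        return None  # hint lines such as '▶ ...' are not rule lines
    prefix, number, message = match.groups()
    return {
        "rule": f"{prefix}-{number}",
        "domain": DOMAINS.get(prefix, "unknown"),
        "message": message,
    }


if __name__ == "__main__":
    print(parse_error("[A-02] Server build command failed"))
```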

codistill, huma, yongol Pipeline

Legacy codebase
    │
    ▼
codistill ──► openapi.yaml    (API spec extraction)
    │
    ▼
huma ──► hurl/*.hurl        (test generation)
    │
    ▼
yongol ──► refactored code  (SSOT-based rebuild)

codistill extracts OpenAPI specs from legacy code. huma generates Hurl tests for every endpoint. yongol rebuilds the code on a single source of truth (SSOT). Because huma and yongol share the same manifest.yaml, moving from one to the other has zero transition cost.

huma works independently of yongol too. Any SaaS backend with openapi.yaml can run the ratchet. Go, Python, or Node.js.

10 Minutes Is Enough

go install github.com/park-jun-woo/huma@latest

Or install as a Claude Code skill:

npx skills add park-jun-woo/huma

1. huma scan
2. huma next – read the prompt, write the Hurl file
3. huma next – verify and advance
4. Repeat until TODO is 0

No config. No learning curve. 42 endpoints means 42 tests.

Don’t give the agent permission to declare “done.” Let the machine decide. That is the ratchet.

Code: github.com/park-jun-woo/huma

Ratchet Pattern – How to Make Agents Finish the Job – huma’s theoretical background. Verifier-based unidirectional execution.
yongol – The Keel of AI Coding SaaS – The upper framework of huma’s test ratchet. Consistency verification across 10 SSOTs.
Hurl Stops Vibe Coding Drift – Strategy of declaring API contracts with Hurl and locking with a ratchet.
Ratchet Code That Exploits IFEval – Feedback loops exploiting sycophancy bias and Reins.
Feedback Topology over Model IQ – Why feedback structure determines outcomes more than model performance.
Reins Engineering – AI with Reins – The Reins tool ecosystem huma belongs to.

Changelog

2026-05-26: Initial release