huma – A Ratchet That Never Skips an Endpoint

The Agent That Stops at Endpoint 15

You tell an AI agent to write Hurl tests for a SaaS backend. There are 42 endpoints.

The first 10 go smoothly. Login, signup, list – the patterns are similar, so it is fast. Around endpoint 15, the agent says:

“The remaining endpoints follow a similar pattern, so I have completed the task.”

Endpoints with tests: 15. Remaining 27: none.

The agent is not lazy. Cemri et al. analyzed 7 multi-agent frameworks, found task failure rates ranging from 41% to 86.7%, and identified “premature termination” as one of 14 failure modes (Cemri et al., NeurIPS 2025). LLMs are optimized for generation, not progress tracking. They do not reliably keep track of which endpoints have tests and which don’t, so “similar pattern” becomes the justification for skipping the remaining 27.

Does a bigger model fix it? Maybe it gets to 20. It still won’t reach 42. The problem is not model size but the structure where the agent itself judges completion.


Why a Ratchet Is Needed

The core of the Ratchet Pattern is simple. Take completion judgment away from the agent.

A mechanical verifier determines “this is not done yet” and “this is done.” The agent works until the verifier passes it, then moves to the next. No going backward. What passed once stays passed forever.
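The idea fits in a few lines of Python. This is a minimal sketch of the pattern, not huma's implementation: `run_ratchet`, `work`, and `verify` are hypothetical names, and the retry cap is an assumption added so the sketch always terminates.

```python
from typing import Callable, Iterable


def run_ratchet(
    items: Iterable[str],
    work: Callable[[str], None],
    verify: Callable[[str], bool],
    max_attempts: int = 5,  # assumption: cap retries so the sketch terminates
) -> list[str]:
    """Process items strictly in order; only the verifier can mark one as passed."""
    passed: list[str] = []
    for item in items:
        for _ in range(max_attempts):
            work(item)               # the agent does its work
            if verify(item):         # the mechanical verifier decides
                passed.append(item)  # locked: never revisited
                break
        else:
            # The worker never gets to say "done"; an unverified item stops the run.
            raise RuntimeError(f"verifier never passed: {item}")
    return passed
```

The worker function has no return value on purpose: nothing it says can influence whether an item counts as finished.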

huma applies this pattern to Hurl API testing. Given an endpoint list, it creates a ratchet session and forces the agent to write and pass tests one by one. If there are 42 endpoints, all 42 get tests. Kim et al. applied 10 state-of-the-art tools to 20 real services and found that all of them achieved low coverage, concluding that “resolving inter-endpoint dependencies is key to effective API testing” (Kim et al., ISSTA 2022). huma’s sequential ratchet addresses the coverage gap structurally: no endpoint can be skipped.

Hurl Master – the name says it all.


Commands

huma has six commands.

| Command | Role |
|---|---|
| huma scan | Auto-detects openapi.yaml and scans endpoints |
| huma scan --from <file> | Scans from a specified file (OpenAPI, JSON array, YAML) |
| huma next | Shows the next incomplete endpoint, or verifies and advances |
| huma verify | Runs the current endpoint’s Hurl test; advances on pass |
| huma status | Shows progress (TODO/PASS/IMPROVE/DONE counts) |
| huma prompt | Outputs an agent prompt for the current TODO (no side effects) |

No config file. No DSL. “First test in 10 minutes” is the design goal.


The Ratchet Loop

The full workflow has four steps.

openapi.yaml ──► huma scan ──► session
                                 │
                           huma next ◄──┐
                             │          │
                     ┌───────┴───────┐  │
                     │  TODO         │  │
                     │  (no .hurl)   │  │
                     └───────┬───────┘  │
                       agent writes     │
                       .hurl            │
                             │          │
                     ┌───────┴───────┐  │
                     │  PASS/IMPROVE │──┘
                     └───────────────┘

1. Endpoint Scan

huma scan
# Auto-detected openapi.yaml
# Scanned 42 endpoints
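Conceptually, scanning an OpenAPI document means walking its `paths` object and emitting one entry per HTTP method. The sketch below assumes the spec has already been parsed into a dict (for example with a YAML loader). `list_endpoints` is a hypothetical helper, not huma's scanner.

```python
# HTTP methods that OpenAPI 3 allows as operation keys under a path item.
HTTP_METHODS = ("get", "put", "post", "delete", "options", "head", "patch", "trace")


def list_endpoints(spec: dict) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs: paths in document order, methods in a fixed order."""
    endpoints = []
    for path, item in spec.get("paths", {}).items():
        for method in HTTP_METHODS:
            if method in item:  # skips non-operation keys such as "parameters"
                endpoints.append((method.upper(), path))
    return endpoints
```

The length of the returned list is the number the ratchet has to reach. Nothing is sampled or grouped by "similar pattern".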

2. Check Next TODO

huma next
# TODO  GET /api/v1/users

3. Write Test

The agent writes a Hurl file that covers the golden path plus at least one error case (400, 401, or 404).

4. Verify and Advance

huma next
# PASS  GET /api/v1/users → hurl/get_api_v1_users.hurl
# TODO  POST /api/v1/auth/login

A pass locks the endpoint and the next TODO appears. A failure keeps the agent on the same endpoint. The agent cannot declare “done.” As long as huma next finds TODOs, the loop continues.
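The expected file path in the output above suggests a simple naming rule: lowercase method, then the path with separators turned into underscores. The sketch below is inferred from that single example. `hurl_path` is a hypothetical name, and how huma treats path parameters such as `{id}` is not stated, so that part is a guess.

```python
import re


def hurl_path(method: str, path: str) -> str:
    """Map ('GET', '/api/v1/users') to 'hurl/get_api_v1_users.hurl'."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    # Assumption: path parameters like {id} get the same treatment.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", path).strip("_")
    return f"hurl/{method.lower()}_{slug}.hurl"
```

A deterministic path matters for the ratchet: the verifier can check for the file without asking the agent where it put it.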

huma status
# 15/42 PASS  |  0 IMPROVE  |  27 TODO
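A wrapper script that drives the agent could use this line as its stop condition. The parser below assumes the output format shown above stays stable, which the source does not guarantee. `remaining_todos` is a hypothetical helper.

```python
import re


def remaining_todos(status_line: str) -> int:
    """Extract the TODO count from a line like '15/42 PASS  |  0 IMPROVE  |  27 TODO'."""
    match = re.search(r"(\d+)\s+TODO\b", status_line)
    if match is None:
        raise ValueError(f"no TODO count in: {status_line!r}")
    return int(match.group(1))
```

The loop ends only when this returns 0, regardless of what the agent reports about its own progress.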

Ratchet States

TODO → PASS → (complete)
TODO → IMPROVE → PASS (or DONE)

| State | Meaning |
|---|---|
| TODO | No .hurl file. Agent gets handler source + expected responses + Hurl example |
| IMPROVE | Hurl exists but response status codes are missing. Shows which codes are missing |
| PASS | All expected response status codes are covered |
| DONE | Coverage stalled after retries. Accepted at current level |

The direction is one-way: from TODO toward PASS or DONE. There are no backward transitions. This is the ratchet.
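The states and their one-way transitions can be written down as a small state machine. This is a sketch under assumptions: the allowed transitions are read off the two paths listed above, and `advance` and `classify` are hypothetical names. DONE depends on retry history that the source does not specify, so `classify` never returns it.

```python
from enum import Enum


class State(Enum):
    TODO = "TODO"
    IMPROVE = "IMPROVE"
    PASS = "PASS"
    DONE = "DONE"


# Forward-only transitions taken from the two paths listed above.
ALLOWED = {
    State.TODO: {State.IMPROVE, State.PASS},
    State.IMPROVE: {State.PASS, State.DONE},
    State.PASS: set(),
    State.DONE: set(),
}


def advance(current: State, target: State) -> State:
    """Move to target, refusing any transition the ratchet does not allow."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


def classify(has_hurl: bool, expected: set[int], covered: set[int]) -> State:
    """Derive TODO, IMPROVE, or PASS from status-code coverage."""
    if not has_hurl:
        return State.TODO
    return State.PASS if expected <= covered else State.IMPROVE
```

Because PASS and DONE have no outgoing transitions, a locked endpoint cannot be reopened by anything the agent does later.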


Two Modes

Static Mode (no server)

Used when manifest.yaml has no testing.server block. huma checks the Hurl files against the expected status codes from OpenAPI or source analysis without running a server.
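A static check of this kind can be approximated by reading the response status lines out of a Hurl file and comparing them with the expected codes. How huma actually parses Hurl files is not described, so the regex and the `missing_codes` helper below are assumptions for illustration only.

```python
import re

# A Hurl response section starts with a line such as "HTTP 200" or "HTTP/1.1 404".
STATUS_LINE = re.compile(r"^HTTP(?:/\d(?:\.\d)?)?[ \t]+(\d{3})[ \t]*$", re.MULTILINE)


def missing_codes(hurl_text: str, expected: set[int]) -> set[int]:
    """Return the expected status codes that no response line in the file asserts."""
    covered = {int(code) for code in STATUS_LINE.findall(hurl_text)}
    return expected - covered
```

An empty result corresponds to PASS. A non-empty result is what IMPROVE reports back to the agent as the codes still to cover.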

Live Mode (server running)

Add a testing.server block to manifest.yaml. huma then builds the binary, starts the server, waits for the health check, and runs Hurl. This is the full runtime verification ratchet.
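The "waits for health check" step is typically a polling loop with a deadline. The sketch below shows that general technique, not huma's code: `wait_until_healthy`, the `probe` callable, and the default timings are all assumptions.

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    timeout: float = 30.0,   # assumption: illustrative default
    interval: float = 0.5,   # assumption: illustrative default
) -> bool:
    """Poll probe() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if probe():
                return True
        except OSError:
            pass  # e.g. connection refused: the server is still starting
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A probe could be as simple as an HTTP GET against a health URL. Connection errors from `urllib` derive from `OSError`, so they count as "not ready yet" rather than as a crash.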


Language Agnostic

| Language | Adapter | Analyzer | backend.lang |
|---|---|---|---|
| Go | GoAdapter | go/ast | go (default) |
| Python | PythonAdapter | regex | python |
| Node.js | NodeAdapter | regex | node |

Hurl tests operate at the HTTP level, so they are language-independent. The same ratchet loop works on Django, Express, or Gin.


Agent Native

huma next outputs everything an agent needs: the handler source, the expected file path, and instructions. Ryan et al. demonstrated that providing execution paths as prompts to LLMs doubles test coverage (Ryan et al., FSE 2024). huma next follows the same idea by putting the handler source directly in front of the agent.

huma prompt is even more explicit: it outputs an agent prompt for the current TODO with no side effects.

One instruction for the agent:

Run huma next and repeat until TODO is 0.


Errors Have Rule IDs

[H-01] Hurl file not found at expected path
  ▶ Check `huma next` output for expected filename

[A-02] Server build command failed
  ▶ go build -cover: exit status 1

| Prefix | Domain |
|---|---|
| M- | manifest.yaml validation |
| E- | Endpoint input validation |
| H- | Hurl file validation |
| S- | Session state validation |
| A- | Adapter/server validation |
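An error format like this is easy to model as data, which is what makes it machine-readable for an agent. The sketch below reproduces the layout of the examples above. `RuleError` and its fields are hypothetical names, not huma's types.

```python
from dataclasses import dataclass

# Prefix-to-domain mapping copied from the table above.
DOMAINS = {
    "M": "manifest.yaml validation",
    "E": "Endpoint input validation",
    "H": "Hurl file validation",
    "S": "Session state validation",
    "A": "Adapter/server validation",
}


@dataclass(frozen=True)
class RuleError:
    rule_id: str  # e.g. "H-01"
    message: str
    hint: str

    @property
    def domain(self) -> str:
        """Look up the domain from the letter before the dash."""
        return DOMAINS[self.rule_id.split("-", 1)[0]]

    def render(self) -> str:
        """Format as '[ID] message' followed by an indented hint line."""
        return f"[{self.rule_id}] {self.message}\n  ▶ {self.hint}"
```

An agent can branch on the stable rule ID instead of pattern-matching free-form error text.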

This is the Reins Engineering principle. Feedback must be mechanical, structural, and repeatable.


juicer, huma, yongol Pipeline

Legacy codebase
    │
    ▼
juicer ──► openapi.yaml    (API spec extraction)
    │
    ▼
huma ──► hurl/*.hurl        (test generation)
    │
    ▼
yongol ──► refactored code  (SSOT-based rebuild)

juicer extracts OpenAPI specs from legacy code. huma generates Hurl tests for every endpoint. yongol rebuilds the code on a single source of truth (SSOT). Because huma and yongol share the same manifest.yaml, moving from one to the other has zero transition cost.

huma works independently of yongol too. Any SaaS backend with an openapi.yaml can run the ratchet, whether it is written in Go, Python, or Node.js.


10 Minutes Is Enough

go install github.com/park-jun-woo/huma@latest

Or install as a Claude Code skill:

npx skills add park-jun-woo/huma

1. huma scan
2. huma next – read prompt, write Hurl
3. huma next – verify, advance
4. Repeat until TODO is 0

No config. No learning curve. 42 endpoints means 42 tests.

Don’t give the agent permission to declare “done.” Let the machine decide. That is the ratchet.

Code: github.com/park-jun-woo/huma


References