
How Do You Refactor Code with No Tests?
You inherited a 100,000-line legacy codebase. There are no tests. You want to refactor, but touching anything might break something. Writing tests requires understanding the code, and understanding the code requires documentation – which doesn’t exist either.
Nobody touches it. It rots further.
Every legacy codebase in the world is stuck in this deadlock. By widely cited industry estimates, 60-80% of Fortune 500 IT budgets go to maintaining legacy systems, and roughly 42% of developer time is spent dealing with technical debt.
What if an LLM could write the tests for you?
The Problems When You Hand Tests to an LLM
Ask an LLM to “write tests for this function” and it produces something. The problems are threefold.
First, it doesn’t know where to start. When there are 527 functions, do you go in order from #1? Start with the most critical? There’s no criterion.
Second, you can’t verify test quality. The LLM’s tests pass. But are they actually verifying the function’s behavior, or are they empty shells that call the function without a single assertion? You’d have to read each one manually to know.
Third, without feedback, LLM-written tests plateau at 60-70% coverage. Just saying “test this function” won’t reach 100% branch coverage. You need to tell the LLM which branches are missing so it can fill the gaps.
It’s not that LLMs can’t write tests. The problem is the absence of a structure that tells the LLM what to write and how well it wrote it.
tsma: A Test Rail Driven by One Command
tsma is a CLI tool that indexes every function in a project, detects test presence, measures coverage, and gives precise feedback to LLM agents.
The agent needs to know exactly one command:
```
$ tsma next
```
This single command drives the entire loop:
```
$ tsma next   # Shows the next untested function
  → Write a test
$ tsma next   # Detects the new test, runs it, measures coverage
  → 100%?  PASS, move to the next function
  → <100%? Reports uncovered branches with line numbers
$ tsma next   # Re-measures the revised test
  → Whether improved or not, marks DONE and moves on
```
Repeat until “All functions complete!” appears.
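The per-function transitions in that loop can be sketched as a tiny state machine. This is a simplified illustration, not tsma's actual implementation: the status names come from the article, and the "one retry, then accept" rule comes from how DONE is described.

```go
package main

import "fmt"

// Status mirrors the three states the article describes.
type Status string

const (
	TODO Status = "TODO" // no passing 100%-coverage test yet
	PASS Status = "PASS" // 100% branch coverage reached
	DONE Status = "DONE" // best-effort, accepted after a retry
)

// nextStatus models one `tsma next` measurement: full coverage is PASS,
// a first shortfall triggers feedback (stay TODO), and a second shortfall
// is accepted as DONE so the loop keeps moving.
func nextStatus(coverage float64, attempt int) Status {
	switch {
	case coverage >= 100.0:
		return PASS
	case attempt < 2:
		return TODO // uncovered-branch feedback sent; agent revises the test
	default:
		return DONE // still short of 100% after the retry: accept and move on
	}
}

func main() {
	fmt.Println(nextStatus(100, 1)) // PASS
	fmt.Println(nextStatus(65, 1))  // TODO
	fmt.Println(nextStatus(80, 2))  // DONE
}
```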
Validated on 527 Functions
tsma was applied to a real Go project with 527 functions.
| Result | Count | Ratio |
|---|---|---|
| PASS (100% branch coverage) | 246 | 46.7% |
| DONE (best-effort) | 281 | 53.3% |
| TODO (unprocessed) | 0 | 0% |
246 functions reached 100% branch coverage. The remaining 281 did not reach 100%, but tests were written to the extent possible.
Why can’t some functions reach 100%?
Functions That Reach 100% and Those That Don’t
Whether a function can reach 100% branch coverage depends on how it receives its dependencies.
Interface (mockable) – 100% achievable:
```go
type Handler struct {
    svc AuthSvc // interface -- replaceable with a mock
}
```
Inject a mock in tests and you can control every path:
```go
svc := mocks.NewMockAuthSvc(ctrl)
svc.EXPECT().Login(...).Return(result, nil) // success path
svc.EXPECT().Login(...).Return(nil, err)    // failure path
```
Concrete type (not mockable) – 100% impossible:
```go
type Handler struct {
    svc *service.SMSImportService // struct pointer -- not replaceable
}
```
The real implementation runs with internal dependencies on databases, external APIs, etc. You can’t force specific errors or specific return values. Branches that depend on those results are unreachable by unit tests.
tsma’s response: After uncovered-branch feedback, it tries once more. If the branches are still unreachable, it accepts DONE. This isn’t a tool limitation – it reflects the code’s testability. Introducing interfaces (DI) would make 100% possible, but that means modifying the original code.
Feedback Dramatically Transforms LLM Tests
tsma’s core value isn’t indexing or coverage measurement. It’s telling the LLM exactly which branches are uncovered, by line number.
Without feedback:

```
"Write tests for the ListContracts function"
→ LLM tests only the happy path
→ Coverage 60-70%
```

With feedback:

```
"Write tests for the ListContracts function"
→ Coverage 65% (11/17)
→ UNCOVERED:
    line 41 -- if params.Status != nil
    line 44 -- if params.BuildingId != nil
    line 70 -- if err != nil (CountSummary)
→ LLM adds tests covering exactly those branches
→ Coverage 100%
```
Same LLM. The only difference is the presence of feedback. Three lines of line numbers separate 60% from 100%.
Progress Survives Even When the Agent Dies
LLM agents crash. Token limits, network errors, session drops. You can’t process 527 functions in a single session.
tsma persists progress to .tsma/session.json.
```
$ tsma status
527 functions
  PASS: 246 (46.7%)
  DONE: 281 (53.3%)
  TODO:   0 (0.0%)
```
If the agent dies at function #200? A new agent runs tsma next and picks up right where the previous one left off. session.json is the checkpoint.
Multiple agents can take turns without conflicts. Each function is atomic.
The Session Is a Cache; Source Files Are the Truth
One of tsma’s design principles: the session is a cache, and source files are the source of truth.
If you delete a test file, even if session.json records it as PASS, that function reverts to TODO. The session never drifts from reality.
Principle:

```
Even if session.json says "PASS":
  test file missing   → revert to TODO
  source file changed → queued for re-measurement
```
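That principle reduces to a small reconciliation check run before any cached status is trusted. A sketch, with the record fields and the re-measurement status name assumed for illustration:

```go
package main

import "fmt"

type Status string

const (
	TODO      Status = "TODO"
	PASS      Status = "PASS"
	REMEASURE Status = "REMEASURE" // assumed name for the re-measurement target
)

// Record is one session.json entry (field names assumed for illustration).
type Record struct {
	Status     Status
	SourceHash string // hash of the source file at last measurement
}

// reconcile decides whether the cached status can be trusted.
// The session never overrides what the files on disk say.
func reconcile(rec Record, testFileExists bool, currentSourceHash string) Status {
	if !testFileExists {
		return TODO // a cached PASS means nothing if the test is gone
	}
	if rec.SourceHash != currentSourceHash {
		return REMEASURE // source changed since the test was measured
	}
	return rec.Status // cache agrees with reality: keep it
}

func main() {
	rec := Record{Status: PASS, SourceHash: "abc"}
	fmt.Println(reconcile(rec, false, "abc")) // TODO: test deleted
	fmt.Println(reconcile(rec, true, "xyz"))  // REMEASURE: source edited
	fmt.Println(reconcile(rec, true, "abc"))  // PASS: cache still valid
}
```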
Instructions for the LLM Agent
The agent needs exactly 6 lines of instructions:
1. Run tsma next
2. If TODO -- read the function and write a test
3. If the test fails -- read the error and fix the test
4. If uncovered branches are shown -- add tests covering those branches
5. If PASS/DONE -- the next function is shown automatically
6. Repeat until "All functions complete!" appears
The only command the agent needs to know is tsma next. The CLI constrains the rest.
Trains and Tracks
Vibe coding is a train. It’s fast. But without tracks, it derails.
Every AI coding tool is focused on making the train faster. Bigger models, smarter agents, better prompts. But the faster the train goes, the worse the derailment.
tsma is the track. The LLM generates tests (Neural), and the CLI defines “this far and no further” (Symbolic Constraint). The LLM’s creativity stays intact, but the quality of results is enforced by the machine.
| | Before | tsma |
|---|---|---|
| Test writing | Human (slow) or LLM (chaotic) | LLM writes, CLI verifies |
| Where to start? | Human decides | CLI determines order |
| Quality check | Human reviews | CLI measures coverage |
| Feedback | None | Uncovered branch line numbers |
| Progress tracking | None | session.json automatic |
The LLM generates freely. But it runs only on the track called tsma next.
Language Support
| Language | Indexer | Test Runner | Coverage |
|---|---|---|---|
| Go | go/ast | go test | go test -coverprofile |
| TypeScript | regex | npx vitest / npx jest | c8 / istanbul |
| Python | regex | pytest | coverage.py |
Go uses an AST parser for precise function extraction. TypeScript and Python use regex-based extraction.
Generated files (*_gen.go, *.pb.go), test files, and vendor/node_modules are automatically excluded from indexing.
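Those exclusion rules can be approximated with simple path checks. A sketch of the filters the article lists, not tsma's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// isExcluded approximates the indexing filter: generated files, test files,
// and vendored dependencies are skipped.
func isExcluded(path string) bool {
	// Vendored or installed dependencies.
	for _, dir := range []string{"vendor/", "node_modules/"} {
		if strings.Contains(path, dir) {
			return true
		}
	}
	// Generated and test files (Go naming conventions).
	for _, suffix := range []string{"_gen.go", ".pb.go", "_test.go"} {
		if strings.HasSuffix(path, suffix) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isExcluded("api/user.pb.go"))      // true
	fmt.Println(isExcluded("vendor/lib/x.go"))     // true
	fmt.Println(isExcluded("internal/handler.go")) // false
}
```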
Installation and Usage
```
$ make install
$ cd your-legacy-project
$ tsma next
```
That’s all.
MIT License. github.com/park-jun-woo/tsma