toulmin — A Rule Engine That Computes Contracts

Rule engines have stood on the same premise for 60 years: the validation target is a “fact.”

Drools puts Java objects as “facts” into working memory. Rego treats input as already-true data. JSON Schema assumes the document structure is given. It’s all the same assumption — incoming data is fact.

But what is a rule engine for? Validating whether data satisfies rules. Calling something that needs validation “already true” is a contradiction.

Not Facts, but Claims

Validation targets are not facts — they are claims. Assertions that may be true or false. Their validity must be judged by rules.

JWT already follows this principle. It calls sub, exp, iss not “facts” but “claims.” They are the token issuer’s assertions. Only after verifying the signature, checking expiration, and matching the issuer can they be trusted.
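The "verify before you trust" flow can be sketched in Go. This is a minimal illustration, not a JWT library: the `Claims` struct, the `verifyClaims` function, and the `signatureOK` flag are hypothetical names, and signature verification itself is assumed to have happened elsewhere.

```go
package main

import (
	"errors"
	"fmt"
)

// Claims holds three registered JWT claims. Exp is a NumericDate
// (seconds since the Unix epoch). The struct is a hypothetical
// illustration, not a type from any JWT library.
type Claims struct {
	Sub string
	Iss string
	Exp int64
}

// verifyClaims treats the token contents as assertions and accepts them
// only if every check passes. signatureOK stands in for a real signature
// verification step, which is assumed to run before this function.
func verifyClaims(c Claims, signatureOK bool, wantIss string, now int64) error {
	if !signatureOK {
		return errors.New("signature not verified")
	}
	if now >= c.Exp {
		return errors.New("token expired")
	}
	if c.Iss != wantIss {
		return errors.New("unexpected issuer")
	}
	return nil
}

func main() {
	c := Claims{Sub: "user-1", Iss: "https://issuer.example", Exp: 2000}
	fmt.Println(verifyClaims(c, true, "https://issuer.example", 1000)) // all checks pass
	fmt.Println(verifyClaims(c, true, "https://other.example", 1000))  // issuer mismatch
	fmt.Println(verifyClaims(c, true, "https://issuer.example", 3000)) // expired
}
```

Until `verifyClaims` returns nil, nothing in `Claims` is treated as true.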

This structure was already established in 1958.

Toulmin’s Argumentation Model

Stephen Toulmin analyzed the structure of argumentation into six elements in 1958:

Claim: The target of judgment. What must be verified as true or false.
Ground: The evidence data used for judgment.
Warrant: The rule that determines whether the ground supports the claim.
Backing: The justification for why the rule is valid.
Qualifier: The degree of confidence in the judgment.
Rebuttal: The exception conditions under which the claim does not hold.

Formal logic says “if the premises are true, the conclusion is true.” Toulmin was different. “A claim is supported by grounds and warrants, but overturned if exception conditions exist.” Every argument is defeasible.

Rule engines have stood on the formal logic side for 60 years. Input is fact, output is allow/deny, exceptions are a separate mechanism. Toulmin stood on the opposite side. Input is claim, output is degree, exceptions are built-in.

The problem was — Toulmin’s book sat on the philosophy shelf. It was invisible from the rule engine shelf. A 60-year missing link.

So I Built a Rule Engine

toulmin implements Toulmin’s argumentation model as a Go rule engine.

Requirements Evolve

Let’s see how if-else and toulmin respond to the same evolution of requirements.

// Monday: "Only authenticated users, IP blocking applied, internal network exempt from blocking"
g := toulmin.NewGraph("api:access")
auth    := g.Warrant(isAuthenticated, nil, 1.0)
blocked := g.Rebuttal(isIPBlocked, nil, 1.0)
exempt  := g.Defeater(isInternalIP, nil, 1.0)
g.Defeat(blocked, auth)
g.Defeat(exempt, blocked)

// Tuesday: "Add rate limiting"
limited := g.Rebuttal(isRateLimited, nil, 1.0)
g.Defeat(limited, auth)

// Wednesday: "Premium users are exempt from rate limits"
premium := g.Defeater(isPremiumUser, nil, 1.0)
g.Defeat(premium, limited)

// Thursday: "During incident response, even premium users are limited"
incident := g.Rebuttal(isIncidentMode, nil, 1.0)
g.Defeat(incident, premium)

After Monday, each new requirement adds two lines and leaves the existing code untouched. The same evolution with if-else:

// Monday
if user != nil {
    if blockedIPs[ip] {
        if strings.HasPrefix(ip, "10.") {
            allow = true
        }
    } else {
        allow = true
    }
}

// Thursday — 4 levels of nesting, structure unreadable
if user != nil {
    if blockedIPs[ip] {
        if strings.HasPrefix(ip, "10.") {
            allow = true
        }
    } else if isRateLimited(ip) {
        if isPremium(user) {
            if !incidentMode {
                allow = true
            }
        }
    } else {
        allow = true
    }
}

toulmin: 2 lines per requirement, structure unchanged. if-else: Rewrite the entire structure every time.

Rules Are Go Functions

func(claim any, ground any, backing any) (bool, any)

ground = judgment material that varies per request (user, IP, context)
backing = judgment criteria fixed at graph declaration time (thresholds, role names, config)
Return = (judgment result, evidence). Evidence is a domain-specific free type.

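// CheckOneFileOneFunc fires (returns true) when a file declares more than one function.
func CheckOneFileOneFunc(claim, ground, backing any) (bool, any) {
    g, ok := ground.(*FileGround)
    if !ok {
        return false, nil // unexpected ground type: the rule does not fire
    }
    if len(g.Funcs) > 1 {
        // Evidence records what was found versus what the rule expects.
        return true, &Evidence{Got: len(g.Funcs), Expected: 1}
    }
    return false, nil
}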

No need to learn a new language like Rego. Just write Go functions.

backing — Same Function, Different Judgment Criteria

backing passes judgment criteria to rules as runtime values. Registering the same function with different backings creates separate rules:

// Same function, two roles.
access := toulmin.NewGraph("access")
admin  := access.Warrant(isInRole, "admin", 1.0)
editor := access.Warrant(isInRole, "editor", 0.8)

// Same function, two thresholds.
limits := toulmin.NewGraph("line-limit")
strict  := limits.Warrant(CheckLineCount, &LineLimit{Max: 100}, 0.7)
relaxed := limits.Warrant(CheckLineCount, &LineLimit{Max: 200}, 0.5)
limits.Defeat(relaxed, strict)

A nil backing means the rule needs no judgment criteria.
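To make the role of backing concrete, here is a minimal sketch of what a rule like CheckLineCount could look like. `LineLimit{Max}` comes from the snippet above; the `FileGround` type with a `Lines` field, the evidence string, and the choice to not fire on a type mismatch are assumptions made for illustration, not the project's actual code.

```go
package main

import "fmt"

// LineLimit mirrors the backing value used above.
type LineLimit struct{ Max int }

// FileGround and its Lines field are hypothetical per-request data.
type FileGround struct{ Lines int }

// CheckLineCount fires (returns true) when the file exceeds the limit
// carried in backing. A missing or mistyped ground or backing does not fire.
func CheckLineCount(claim, ground, backing any) (bool, any) {
	g, ok := ground.(*FileGround)
	if !ok {
		return false, nil
	}
	limit, ok := backing.(*LineLimit)
	if !ok {
		return false, nil
	}
	if g.Lines > limit.Max {
		return true, fmt.Sprintf("%d lines, limit %d", g.Lines, limit.Max)
	}
	return false, nil
}

func main() {
	file := &FileGround{Lines: 150}
	strict, _ := CheckLineCount(nil, file, &LineLimit{Max: 100})
	relaxed, _ := CheckLineCount(nil, file, &LineLimit{Max: 200})
	fmt.Println(strict, relaxed) // true false
}
```

The function body never changes; only the backing value decides whether a 150-line file counts as a violation.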

Exceptions Are Declared as a Graph

Declare relationships between rules with the Graph Builder API and the engine handles the rest. The functions themselves serve as identifiers, so no string names are needed.

g := toulmin.NewGraph("filefunc")
w := g.Warrant(CheckOneFileOneFunc, nil, 1.0)
d := g.Defeater(TestFileException, nil, 1.0)
g.Defeat(d, w)

results, _ := g.Evaluate(claim, ground)

The same function can be reused in different graphs with different defeat relationships:

strictGraph := toulmin.NewGraph("strict")
strictGraph.Warrant(CheckOneFileOneFunc, nil, 1.0)
// No exceptions — test files not allowed either

lenientGraph := toulmin.NewGraph("lenient")
w := lenientGraph.Warrant(CheckOneFileOneFunc, nil, 1.0)
r1 := lenientGraph.Rebuttal(TestFileException, nil, 1.0)
r2 := lenientGraph.Rebuttal(GeneratedFileException, nil, 0.8)
lenientGraph.Defeat(r1, w)
lenientGraph.Defeat(r2, w)
// Both test + generated files are exceptions

Judgment Rationale Is Traced

EvaluateTrace tracks not just the verdict but which rules activated and which rules defeated which:

traced := g.EvaluateTrace(claim, ground)
// traced[0].Verdict: 0.0   (raw = 1.0 / (1 + 1.0) = 0.5, verdict = 2 × 0.5 - 1)
// traced[0].Trace: [
//   {Name: "CheckOneFileOneFunc", Role: "warrant",  Activated: true,  Qualifier: 1.0},
//   {Name: "TestFileException",   Role: "rebuttal", Activated: true,  Qualifier: 1.0},
// ]

Even with dozens of rules, the trace keeps the answer to “why did this verdict come out” human-readable.
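The idea behind a trace can be sketched without the engine. The field names `Name`, `Role`, `Activated`, and `Qualifier` are taken from the trace shown above; everything else here (`registered`, `runTrace`, passing nil as backing) is a hypothetical simplification and not how toulmin is implemented.

```go
package main

import "fmt"

// RuleFunc matches the rule signature used throughout this article.
type RuleFunc func(claim, ground, backing any) (bool, any)

// TraceEntry uses the field names shown in the trace above.
type TraceEntry struct {
	Name      string
	Role      string
	Activated bool
	Qualifier float64
}

// registered is a hypothetical record of one rule in a graph.
type registered struct {
	name      string
	role      string
	fn        RuleFunc
	qualifier float64
}

// runTrace calls every rule once and records whether it fired.
func runTrace(rules []registered, claim, ground any) []TraceEntry {
	trace := make([]TraceEntry, 0, len(rules))
	for _, r := range rules {
		fired, _ := r.fn(claim, ground, nil)
		trace = append(trace, TraceEntry{r.name, r.role, fired, r.qualifier})
	}
	return trace
}

func main() {
	always := func(claim, ground, backing any) (bool, any) { return true, nil }
	never := func(claim, ground, backing any) (bool, any) { return false, nil }
	rules := []registered{
		{"CheckOneFileOneFunc", "warrant", always, 1.0},
		{"TestFileException", "rebuttal", never, 1.0},
	}
	for _, e := range runTrace(rules, nil, nil) {
		fmt.Printf("%+v\n", e)
	}
}
```

Because each entry is plain data, the same trace can be logged, diffed between runs, or rendered for a reviewer.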

The Verdict Is Computed by a Single Formula

The h-Categoriser, as studied by Amgoud & Ben-Naim (2013), is applied:

raw = w / (1 + Σ raw(attackers))
verdict = 2 × raw - 1

+1.0 — violation confirmed
0.0 — undecidable
-1.0 — rebuttal confirmed

When a rule fires, it becomes a warrant. When an exception fires, it becomes an attacker. The formula computes the balance of power between them to produce a verdict. What about exceptions to exceptions? They become attackers of attackers: they weaken the exception and so restore part of the original rule's strength. The h-Categoriser also satisfies the compensation principle, under which several weak attackers can together offset one strong attacker.
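The formula is small enough to run by hand. The sketch below is a standalone illustration, not the engine's code: `node`, `raw`, and `verdict` are names chosen here, every rule in the graph is assumed to have fired, and the defeat graph is assumed to be acyclic. The third case mirrors Monday's graph (auth attacked by blocked, blocked attacked by exempt).

```go
package main

import "fmt"

// node is one activated rule: its qualifier (weight) and the rules attacking it.
type node struct {
	weight    float64
	attackers []*node
}

// raw computes w / (1 + sum of raw(attackers)) recursively.
// It assumes the defeat graph has no cycles; a cycle would recurse forever.
func raw(n *node) float64 {
	sum := 0.0
	for _, a := range n.attackers {
		sum += raw(a)
	}
	return n.weight / (1 + sum)
}

// verdict rescales raw from [0, 1] to [-1, +1].
func verdict(n *node) float64 {
	return 2*raw(n) - 1
}

func main() {
	// Case 1: a rule with no attackers.
	unopposed := &node{weight: 1.0}

	// Case 2: a rule attacked by one exception of equal weight.
	exception := &node{weight: 1.0}
	attacked := &node{weight: 1.0, attackers: []*node{exception}}

	// Case 3: an exception to the exception (exempt -> blocked -> auth).
	exempt := &node{weight: 1.0}
	blocked := &node{weight: 1.0, attackers: []*node{exempt}}
	auth := &node{weight: 1.0, attackers: []*node{blocked}}

	fmt.Printf("%.3f\n", verdict(unopposed)) // 1.000
	fmt.Printf("%.3f\n", verdict(attacked))  // 0.000
	fmt.Printf("%.3f\n", verdict(auth))      // 0.333
}
```

In the third case the exempt rule halves the strength of blocked (raw 0.5), so auth recovers to raw 1 / 1.5 ≈ 0.667 and the verdict moves from 0.0 back up to about +0.333: a partial, not total, restoration.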

Rules Have Three Strengths

Nute’s (1994) classification is applied:

Strength	Meaning	Example
Strict	Can never be defeated	“No admin API access without authentication”
Defeasible	Can be defeated by exceptions	“One function per file”
Defeater	Only blocks other rules, makes no claim of its own	“Test files are exceptions”

Strict rules reject attack edges. Defeaters only attack and have no judgment of their own. This structurally expresses the enforcement level of rules.
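One way to picture "strict rules reject attack edges" is a builder that refuses the edge at declaration time. This is a hypothetical sketch: the `Strength` constants, the `rule` type, and the `defeat` function are invented for illustration, and whether the real engine reports the rejection as an error, a panic, or a silent no-op is not stated in this article.

```go
package main

import (
	"errors"
	"fmt"
)

// Strength follows the three classes in the table above.
type Strength int

const (
	Strict Strength = iota
	Defeasible
	Defeater
)

// rule is a hypothetical node type, not the toulmin library's own.
type rule struct {
	name      string
	strength  Strength
	attackers []*rule
}

// defeat records that from attacks to, unless to is strict.
func defeat(from, to *rule) error {
	if to.strength == Strict {
		return errors.New("strict rule " + to.name + " cannot be attacked")
	}
	to.attackers = append(to.attackers, from)
	return nil
}

func main() {
	authRequired := &rule{name: "auth-required", strength: Strict}
	oneFunc := &rule{name: "one-func-per-file", strength: Defeasible}
	testFile := &rule{name: "test-file-exception", strength: Defeater}

	fmt.Println(defeat(testFile, oneFunc))      // edge accepted, prints <nil>
	fmt.Println(defeat(testFile, authRequired)) // edge rejected, prints the error
}
```

Rejecting the edge when the graph is built means a strict rule's verdict can never be lowered at evaluation time.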

How Is It Different from Rego?

Aspect	Rego	toulmin
Rule authoring	Must learn Rego DSL	Go functions
Exception handling	Manual `default`/`else` patterns	Declarative defeats graph
Judgment	Binary allow/deny	Continuous [-1, +1]
Rule justification	`# METADATA` (ignored by engine)	backing (part of the structure)
Rule strength	None	strict/defeasible/defeater
Engine size	Tens of thousands of lines	Hundreds of lines
Speed	Interpreter (parse -> AST -> evaluate)	Direct Go function calls

Rego is broad — it has a Kubernetes, Terraform, and Envoy integration ecosystem. toulmin is deep — it has what Rego lacks (defeasibility, qualifier, backing).

Repositioning the Qualifier

In Toulmin’s original model, the Qualifier is attached to the Claim. “This patient probably should be given penicillin” — a modal qualifier expressing the confidence of the claim.

The toulmin engine repositions the Qualifier from the Claim to each Rule. In a rule engine, a claim is merely the validation target. “This file has 3 functions” — it’s a factual check, not something that needs a confidence level. What determines the quality of judgment is the rule’s confidence:

“One function per file” — qualifier 1.0 (certain rule)
“Recommended under 100 lines” — qualifier 0.7 (flexible rule)

Each Rule’s qualifier becomes the initial weight w(a) in h-Categoriser, and the final verdict takes over the role that the Qualifier played in Toulmin’s original model — the confidence of the judgment.
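As a worked instance, assume both example rules fire and neither is attacked, so the attacker sum is zero. Plugging the two qualifiers into the verdict formula from the previous sections gives:

```latex
\begin{aligned}
\text{one function per file:}\quad & \mathrm{raw} = \frac{1.0}{1 + 0} = 1.0, & \mathrm{verdict} &= 2 \times 1.0 - 1 = +1.0 \\
\text{under 100 lines:}\quad & \mathrm{raw} = \frac{0.7}{1 + 0} = 0.7, & \mathrm{verdict} &= 2 \times 0.7 - 1 = +0.4
\end{aligned}
```

The certain rule yields a confirmed violation, while the flexible rule can never push the verdict past +0.4 on its own. That ceiling is how the rule's qualifier resurfaces as the confidence of the final judgment.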

Empirical Validation: Converting filefunc’s 22 Rules to Toulmin

filefunc is a code structure convention tool for LLM-native Go development. All 22 of its rules were converted to Toulmin rules (warrants and defeaters).

Strength Classification

Strength	Count	Ratio	Examples
Strict	15	68%	F1, F2, F3, F4, A1-A3, A6-A16
Defeasible	4	18%	Q1, Q2, Q3, C4
Defeater	3	14%	F5, F6, test file exception

Most are strict — code structure conventions inherently minimize exceptions.

Quantitative Results

Project	Files (before -> after)	Avg LOC/file (before -> after)	SRP violations resolved	Depth violations resolved
filefunc	— (compliant from start)	25.1	0	0
fullend	87 -> 1,260	244 -> 25.4	66 -> 0	148 -> 0
whyso	12 -> 99	147.8 -> 24.4	12 -> 0	23 -> 0

fullend went from 87 files to 1,260. The number of files exploded, but average LOC dropped from 244 to 25.4. All 66 SRP violations and 148 depth violations went to 0.

Theoretical Foundation

There is no original theory. It’s all existing research:

Element	Original Work
6-element structure	Toulmin (1958)
strict/defeasible/defeater	Nute (1994)
h-Categoriser	Amgoud & Ben-Naim (2013)

The originality lies in the discovery that these connect. Things that existed separately in philosophy (Toulmin), logic (Nute), and argumentation theory (Amgoud) for 60 years meet at a single point: the software rule engine.

Computing Contracts

The rule of law works not because judges are smart, but because the structure forces judgment. Rules exist, exceptions are declared, and verdicts are computed based on evidence.

toulmin moved this structure into code.

Warrant = statute
Backing = legislative intent
Strength = mandatory vs. discretionary provision
Rebuttal = exception clause
Claim = case
Ground = evidence
h-Categoriser = verdict

Declare contracts (warrants), declare exceptions (rebuttals), supply evidence (grounds), and the verdict is computed.

Not by human judgment. By formula.

Acc(a) = w(a) / (1 + Σ Acc(attackers))

Graphs Can Be Defined in YAML

Declare the graph structure in YAML instead of Go, then generate the Go code from it:

graph: filefunc
rules:
  - name: CheckOneFileOneFunc
    role: warrant
    qualifier: 1.0
  - name: TestFileException
    role: rebuttal
    qualifier: 1.0
defeats:
  - from: TestFileException
    to: CheckOneFileOneFunc

toulmin graph filefunc.yaml    # generates graph_gen.go

Just write the rule functions in Go. The graph structure is declared in YAML.
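The mapping from the YAML fields to builder calls can be sketched as a tiny generator. This is an assumption-laden illustration: the `Spec`, `Rule`, and `Defeat` types, the `generate` function, and the `r0`, `r1` variable naming are invented here, YAML parsing is left out, and the real contents of graph_gen.go may look quite different. Backing is always emitted as nil because the YAML above has no backing field.

```go
package main

import (
	"fmt"
	"strings"
)

// Rule, Defeat, and Spec mirror the YAML fields shown above.
type Rule struct {
	Name      string
	Role      string // "warrant", "rebuttal", or "defeater"
	Qualifier float64
}

type Defeat struct{ From, To string }

type Spec struct {
	Graph   string
	Rules   []Rule
	Defeats []Defeat
}

// roleMethod maps a YAML role to the builder method used in this article.
var roleMethod = map[string]string{
	"warrant":  "Warrant",
	"rebuttal": "Rebuttal",
	"defeater": "Defeater",
}

// generate emits builder calls as Go source text.
func generate(s Spec) string {
	var b strings.Builder
	fmt.Fprintf(&b, "g := toulmin.NewGraph(%q)\n", s.Graph)
	vars := map[string]string{} // rule name -> generated variable name
	for i, r := range s.Rules {
		method, ok := roleMethod[r.Role]
		if !ok {
			fmt.Fprintf(&b, "// skipped %s: unknown role %q\n", r.Name, r.Role)
			continue
		}
		v := fmt.Sprintf("r%d", i)
		vars[r.Name] = v
		fmt.Fprintf(&b, "%s := g.%s(%s, nil, %.1f)\n", v, method, r.Name, r.Qualifier)
	}
	for _, d := range s.Defeats {
		fmt.Fprintf(&b, "g.Defeat(%s, %s)\n", vars[d.From], vars[d.To])
	}
	return b.String()
}

func main() {
	spec := Spec{
		Graph: "filefunc",
		Rules: []Rule{
			{Name: "CheckOneFileOneFunc", Role: "warrant", Qualifier: 1.0},
			{Name: "TestFileException", Role: "rebuttal", Qualifier: 1.0},
		},
		Defeats: []Defeat{{From: "TestFileException", To: "CheckOneFileOneFunc"}},
	}
	fmt.Print(generate(spec))
}
```

For the filefunc spec this prints four lines: a NewGraph call, one Warrant call, one Rebuttal call, and one Defeat call, which is the same shape as the hand-written graphs earlier in the article.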

MIT License. github.com/park-jun-woo/toulmin