Triples Are Claims, Not Facts

Triples Are Claims, Not Facts Image: AI generated

Wikidata’s Silence

Wikidata has a triple like this:

(tomato, instance_of, vegetable) — rank: preferred
(tomato, instance_of, fruit)     — rank: normal

Who decided preferred? Why preferred? In what context preferred?

Wikidata is silent on these questions. An editor decides, and the system stores the decision. Nothing more.

But whether a tomato is a vegetable or a fruit is not a physical constant. Ask a chef — it’s a vegetable. Ask a botanist — it’s a fruit. Ask the U.S. Supreme Court — it’s a vegetable (1893, Nix v. Hedden). Three answers to the same question, and none of them wrong.

Triples in a knowledge graph are not facts. They are claims.

Claims Need Argumentation

To store claims, you need structure. Toulmin’s argumentation model provides that structure.

Element	Role	Tomato Example
Claim	Assertion	“A tomato is a vegetable”
Ground	Direct evidence	“Classified as vegetable in culinary taxonomy”
Backing	Source/authority	“Le Guide Culinaire (1903)”
Qualifier	Scope	“In a culinary context” (confidence 0.8)
Rebuttal	Defeat condition	“In a botanical context, it’s a fruit — ovary structure”
Warrant	Connecting logic	“Traditional ingredient classification is based on culinary use”

Instead of forcing a single truth value per triple, elevate the triple to a subject of argumentation. There’s a claim, there’s evidence, there’s a defeat condition, there’s a source. And judgment happens — not at storage time, but at query time.

This idea itself is not new. In academia, Dung’s abstract argumentation frameworks (1995), ASPIC+ (2010), and nanopublications have addressed argumentation over knowledge graphs. The difference is one thing — we deliver it as executable code, not papers. Install with go install, write rules as Go functions, run it right now.

Context Determines Truth

Storage is argumentation structure. Judgment is runtime.

ctx.Set("domain", "cooking")
results, _ := g.Evaluate(ctx)
// verdict: +0.8 → vegetable

ctx.Set("domain", "botany")
results, _ = g.Evaluate(ctx)
// verdict: -0.9 → not a vegetable (fruit)

Same graph, same argumentation structure, same code. Only the context changed. Query in a culinary context: +0.8 (vegetable). Query in a botanical context: -0.9 (fruit). The verdict follows context.

This is the decisive difference from Wikidata’s static rank. It’s not an editor deciding preferred — it’s the querier’s context that produces the judgment.

Is Pluto a Planet?

Triple: (Pluto, instance_of, planet)

Claim:
  Ground: "Solar orbit, spherical, sufficient mass"
  Backing: IAU 1930 classification resolution

Rebuttal:
  Ground: "Orbit-clearing criterion not met"
  Backing: IAU 2006 Resolution 5A

Exception to rebuttal:
  Ground: "Still taught as a planet in education curricula"
  Backing: NASA Education Resources 2024

ctx.Set("domain", "astronomy_formal")
// verdict: -0.8 (not a planet)

ctx.Set("domain", "education")
// verdict: +0.3 (can be treated as planet in context)

For someone who attended elementary school before 2006, Pluto is a planet. For the IAU, Pluto is a dwarf planet. Both have evidence, both have sources. The system’s job is not to choose one — it’s to store both and judge according to context.

When Sources Are Attacked

In academic debate, sources themselves are frequently attacked.

// Claim: Drug X is effective
claim := g.Rule(drugXEffective).
    With(&TripleSpec{
        Ground:  "Significant effect in Phase 3 clinical trial",
        Backing: "Smith et al. (2020), NEJM",
    })

// Rebuttal: Independent replication failed
counter := g.Counter(replicationFailed).
    With(&TripleSpec{
        Ground:  "Effect not confirmed in independent replication",
        Backing: "Jones et al. (2022), Lancet",
    })
counter.Attacks(claim)

// Undercutter: conflict of interest
undercutter := g.Counter(conflictOfInterest).
    With(&TripleSpec{
        Ground:  "Smith's funding came from the pharmaceutical company",
        Backing: "Retraction Watch (2023)",
    })
undercutter.Attacks(claim)

Smith’s paper was published in NEJM. An authoritative source. But when the funding source is revealed, the entire claim based on that paper weakens. The counter directly rebuts the claim; the undercutter weakens the claim’s evidential foundation. Both attack the claim, but in different ways. The h-Categoriser synthesizes the strength of these attacks to compute the final verdict.

Truth vanishes at the speed of light; only claims remain. The system manages claims, not declares truth.

Does Every Triple Need Argumentation?

No.

(water, chemical_formula, H2O)   — Settled. No argumentation needed.
(tomato, instance_of, ?)         — Contested. Attach argumentation.
(Pluto, instance_of, ?)          — Contested. Attach argumentation.
(Jerusalem, capital_of, ?)       — Contested. Attach argumentation.

The criterion is simple: if multiple objects exist for the same subject + predicate, or ranks diverge, or references conflict — it’s a contested triple. The rest stays as plain triples.

Attaching argumentation to water’s chemical formula is waste. Not attaching argumentation to Jerusalem’s capital status is a lie.

The Judgment Engine: h-Categoriser

The argumentation graph is judged by Amgoud’s h-Categoriser. It computes an acceptability score on a [-1, +1] scale for each node — the higher an attacker’s acceptability, the more the attacked node’s score drops. Iterates recursively until convergence.

Performance: even with 100,000 contested triples each carrying their own argumentation graph, a query evaluates only that triple’s graph. Independent of the total knowledge graph size.

Triple lookup:        milliseconds (existing index)
Argumentation eval:   milliseconds (h-Categoriser)
Verdict trace:        milliseconds (graph traversal)

Don’t scale the model. Scale the argumentation.

Mapping to Wikidata Ranks

Wikidata	toulmin extension
preferred rank	verdict > +0.5 (in current context)
normal rank	0 < verdict ≤ +0.5
deprecated rank	verdict ≤ 0
reference	Backing
qualifier	Qualifier + context function conditions

The difference: Wikidata’s rank is static — editors decide. Toulmin’s verdict is dynamic — context and argumentation structure decide.

The Bigger Picture

This system is not domain-specific.

AI Prefrontal Cortex (feature selection):
  triple = (feature, should_include_in, app_type)
  context = current project's app tags
  verdict = should this feature be included?

Knowledge graph general:
  triple = (subject, predicate, object)
  context = querier's domain/perspective
  verdict = is this triple true in current context?

Same engine. Same structure. Different domain. Rules are Go functions, exceptions are defeats graphs, judgment is h-Categoriser. No DSL.

Why This Matters

LLMs dissolve knowledge into weights. Ask a question, get an answer. But you cannot structurally trace whether that answer is true in what context, based on what source, and whether rebuttals exist. Hallucination comes from this structural absence.

This system cannot prevent all hallucination. LLMs produce open-ended output, and you cannot pre-register every possible claim. But for claims already registered in the argumentation graph, you can compare an LLM’s generated answer against the graph and assess credibility. “What is this claim’s Backing? Is there a Counter attacking that Backing? Is the verdict positive in the current context?”

Not a universal truth oracle. A credibility assessment system that operates on accumulated argumentation.

Not a system that stores facts, but one that manages claims. Not a system that declares truth, but one that traces judgment. This is the next step for knowledge graphs.

toulmin — Go Rule Engine — Toulmin argumentation model-based rule engine. The judgment engine behind this article.
Ratchet Pattern — How to Make Agents Go All the Way — Deterministic verification and ratchet locking.

Code: github.com/park-jun-woo/toulmin

go install github.com/park-jun-woo/toulmin@latest

The code examples in this article represent a design vision based on the toulmin library’s current API. The knowledge graph extension (TripleSpec, context-based evaluation) is under active development. The core judgment engine (h-Categoriser, defeats graph, Rule/Counter) works today.

References

Toulmin, S. (1958). The Uses of Argument. Cambridge University Press.
Dung, P.M. (1995). “On the Acceptability of Arguments and its Fundamental Role in Nonmonotonic Reasoning, Logic Programming and n-Person Games.” Artificial Intelligence, 77(2), 321–357.
Amgoud, L. & Ben-Naim, J. (2013). “Ranking-based semantics for argumentation frameworks.” SUM 2013, LNCS 8078, 134–147.
Modgil, S. & Prakken, H. (2014). “The ASPIC+ framework for structured argumentation: a tutorial.” Argument & Computation, 5(1), 31–62.
Groth, P. et al. (2010). “Anatomy of a Nanopublication.” Information Services & Use, 30(1-2), 51–56.
Nix v. Hedden, 149 U.S. 304 (1893). United States Supreme Court.