After the Parse: Post-Processing as Ontology Maintenance
Why extraction alone does not yield a trustworthy narrative graph, and how four typed, ordered transforms turn extracted candidates into a query-grade ontology.
Most AI story-graph writeups quietly assume extraction is the hard part.
It isn’t.
The problem that determines whether your graph is a research artifact or a query-trustworthy knowledge base is maintaining semantic stability after reruns, enrichment, and cross-phase mutation. A graph that gets less reliable every time you touch it is not a product. It’s a liability.
Fabula’s recent post-processing work is a response to that problem. This post describes the design, the implementation specifics, and — honestly — the failure modes we’re still managing.
Post-processing is not cleanup. It is the typed, ordered mechanism by which extracted narrative candidates become a query-trustworthy ontology.
The practical problem
A successful extraction pass (Fabula’s Phases 0-5, roughly 67 minutes per episode) can still leave you with a graph where:
Institutional relationships are implicit. The script says “Leo, the Chief of Staff.” Phase 4 synthesizes an Agent named Leo McGarry and an Organization named The White House Senior Staff. But no AFFILIATED_WITH edge connects them, because the extraction focused on events, not on institutional semantics.

Spatial structure is flat. Extraction yields Location nodes like “The Oval Office”, “The West Wing”, and “The White House” as siblings. The containment relationships (The Oval Office is PART_OF The West Wing, which is PART_OF The White House) are latent in the script text but not materialized as edges.
Salience is invisible. An Agent who appears in 20 of 22 episodes and one who appears in a single walk-on role have the same structural weight. Nothing in the raw graph distinguishes Josh Lyman from Third Reporter.
Latent integrity defects hide. An entity with status: 'deprecated' that still holds PARTICIPATED_AS edges (because the harmonizer transferred some relationships but not all) looks fine in a single-episode query. It becomes a graph-level inconsistency that compounds across seasons.
None of these are “nice-to-fix-later.” They directly determine whether graph queries return interpretable results or misleading ones.
The design response: typed, ordered transforms
Fabula’s post-process layer is explicit and ordered. The step definitions live in runner.py as an OrderedDict:
```python
ALL_STEPS = OrderedDict([
    ("integrity",    {"description": "Graph integrity tests",        "read_only": True}),
    ("affiliations", {"description": "Agent-org affiliation",        "read_only": False}),
    ("locations",    {"description": "Location hierarchy inference", "read_only": False}),
    ("gravity",      {"description": "Entity tier calculation",      "read_only": False}),
])
```

This ordering encodes assumptions. Let me walk through each step and explain what those assumptions are.
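To make the gating concrete, here is a minimal sketch of how a runner might consume this dictionary. The helper functions (run_step, save_checkpoint) are hypothetical stand-ins, not Fabula's actual internals:

```python
# A minimal sketch (not Fabula's actual runner) of consuming ALL_STEPS:
# run steps in declared order, skip checkpointed steps on resume, and
# halt before any mutating step once a gate step has failed.

def run_pipeline(all_steps, completed, run_step, save_checkpoint):
    for name, meta in all_steps.items():
        if name in completed:          # checkpoint-resume: skip finished steps
            continue
        ok = run_step(name, meta)      # steps with read_only=True only validate
        if not ok:
            # A failed integrity gate must stop the pipeline before any
            # later step encodes defects as structure.
            raise RuntimeError(f"Step '{name}' failed; halting before writes")
        save_checkpoint(name)          # record completion after each step
```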
1) Integrity as precondition gate
What it does: Runs read-only validation tests against the season graph. Entity counts, orphan detection, schema conformance, status-model consistency.
Why it runs first: Because every subsequent step writes to the graph. If you infer affiliations on top of a graph that contains deprecated entities with orphaned edges, you’re encoding defects as structure.
What this protects against:

Orphan propagation: a PARTICIPATED_AS edge pointing to a deprecated Agent that should have been transferred to its merge target. We caught exactly this: 4 deprecated Agents still holding 36 stale relationships in West Wing S1 (a sketch of one such check appears below).

Schema drift becoming encoded as new edges: if a property name is wrong (organization_uuid instead of org_uuid), downstream inference that reads that property will silently produce incorrect results.

Lifecycle inconsistencies contaminating inference: a deprecated entity that hasn't been fully merged shouldn't participate in affiliation inference. Integrity catches these before they matter.
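The full test suite is Fabula-internal, but the orphan check above can be expressed directly. A sketch with the official neo4j Python driver, assuming the status property and edge direction described in this post:

```python
# One plausible version of the orphan check, using the neo4j Python driver.
# Property names and edge direction follow the descriptions in this post;
# the real test suite is broader than this.
from neo4j import GraphDatabase

STALE_PARTICIPATIONS = """
MATCH (a:Agent {status: 'deprecated'})-[r:PARTICIPATED_AS]->(:Event)
RETURN a.canonical_name AS agent, count(r) AS stale_edges
ORDER BY stale_edges DESC
"""

def find_stale_participations(uri, auth):
    with GraphDatabase.driver(uri, auth=auth) as driver:
        records, _, _ = driver.execute_query(STALE_PARTICIPATIONS)
        return [(r["agent"], r["stale_edges"]) for r in records]
```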
2) Affiliation inference as institutional semantics
What it does: For each canonical Agent, infers AFFILIATED_WITH edges to Organizations, grounded in shared-event evidence.
How it actually works:
Community scoping via Louvain. A GDS projection links Agents and Organizations through shared Event participation. Louvain community detection clusters them. Each Agent's candidate list narrows to its own community plus direct co-participants, significantly reducing LLM calls. The reduction is logged per run and depends on graph density, but the scoping turns an O(agents × orgs) problem into something tractable (see the sketch after these steps).

Evidence gathering. For each Agent-Organization pair, the system queries shared event count, sample titles, and the Agent's observed_status across those events.

LLM adjudication via BAML. InferAgentAffiliation() returns affiliations with relationship_type, confidence, and reasoning. Only edges with confidence >= 0.7 are materialized.
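Fabula does the scoping with a Neo4j GDS projection. For illustration only, here is the same idea sketched with networkx on an in-memory co-participation graph; the input shapes and function signature are assumptions:

```python
# Illustrative stand-in for the GDS projection + Louvain scoping step.
# The input shapes (agents, orgs, event_participants) are assumptions.
import networkx as nx

def candidate_orgs(agents, orgs, event_participants):
    """Narrow each Agent's Organization candidates to its Louvain community
    plus direct co-participants, instead of all agents x all orgs."""
    g = nx.Graph()
    for participants in event_participants:   # one iterable per Event
        participants = list(participants)
        for i, a in enumerate(participants):
            for b in participants[i + 1:]:
                g.add_edge(a, b)              # shared-event participation link
    communities = nx.community.louvain_communities(g, seed=42)
    community_of = {node: comm for comm in communities for node in comm}
    scoped = {}
    for agent in agents:
        if agent not in g:
            scoped[agent] = []                # no shared events: no candidates
            continue
        scope = set(community_of[agent]) | set(g.neighbors(agent))
        scoped[agent] = [o for o in orgs if o in scope]
    return scoped
```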
The resulting edge:
```cypher
(agent:Agent)-[:AFFILIATED_WITH {
  confidence: 0.85,
  reasoning: "McGarry serves as Chief of Staff...",
  inferred_by: 'llm_affiliation_inference',
  relationship_type: 'leader',
  created_at: datetime()
}]->(org:Organization)
```

(Note: the pipeline's Phase 4 extraction also creates AFFILIATED_WITH edges with a slightly different property schema: inference_reason instead of reasoning, and inferred_by: 'affiliation_handler'. Post-processing enriches beyond what extraction provides, adding community-scoped, evidence-grounded affiliations that extraction missed.)
You can now ask “which Organizations influenced this conflict cluster?” and get results that reflect institutional structure, not just name co-occurrence.
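As a hedged example of that kind of question: a query along these lines becomes possible once the edges exist. The event_type filter is hypothetical (this post does not specify how conflict Events are tagged); the AFFILIATED_WITH traversal is the part post-processing enables.

```python
# Hypothetical "organizational influence" query. The event_type property
# is an assumption; the AFFILIATED_WITH traversal comes from this step.
ORG_INFLUENCE = """
MATCH (org:Organization)<-[:AFFILIATED_WITH]-(a:Agent)
      -[:PARTICIPATED_AS]->(e:Event {event_type: 'conflict'})
RETURN org.canonical_name AS organization,
       count(DISTINCT a) AS affiliated_agents,
       count(DISTINCT e) AS conflict_events
ORDER BY conflict_events DESC
"""
```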
3) Location hierarchy as topological correction
What it does: Infers PART_OF containment edges between Location nodes.
Five-step pipeline:
1. Materialize existing hierarchy: part_of_location_uuid properties become PART_OF edges (confidence: 1.0).
2. Gather evidence: scene headings, participating Agents, scene sequence numbers.
3. Cluster by name similarity: UnionFind with SequenceMatcher ratio >= 0.65 and substring containment (a sketch of this step follows the list).
4. LLM inference per cluster: InferLocationHierarchy() returns containment decisions with confidence and reasoning.
5. Create PART_OF edges with full provenance.
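Step 3 is the most mechanical part, so here is a runnable sketch of it. The union-find implementation and the case-folding are stand-ins; the 0.65 ratio and the substring-containment rule are the ones described above:

```python
# A runnable sketch of step 3, name-similarity clustering. The union-find
# is a minimal stand-in; case-folding is an assumption.
from difflib import SequenceMatcher

def cluster_locations(names, threshold=0.65):
    parent = {n: n for n in names}

    def find(x):                       # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            la, lb = a.lower(), b.lower()
            similar = SequenceMatcher(None, la, lb).ratio() >= threshold
            contained = la in lb or lb in la   # "Oval Office" in "The Oval Office"
            if similar or contained:
                union(a, b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return list(clusters.values())
```

Each resulting cluster then goes to InferLocationHierarchy() as a unit, which keeps the LLM's decision space small.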
Consequence: Spatial queries stop being flat:
```cypher
MATCH (child:Location)-[:PART_OF*1..3]->(parent:Location {canonical_name: 'The White House'})
MATCH (child)-[:IN_EVENT]->(e:Event)
RETURN child.canonical_name, count(e) AS events_here
ORDER BY events_here DESC
```

4) Gravity as salience materialization
What it does: Calculates episode_count and assigns tier labels: asteroid (transient), planet (recurring), anchor (load-bearing).
Three-phase migration:
1. Initialize tier='asteroid' and episode_count=0 on all entities.
2. Count distinct Episodes per entity via participation chains.
3. Assign tiers based on fixed thresholds: anchor at >= 5 episodes, planet at >= 2, asteroid below that (the classification step is sketched below).
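The classification step itself is small enough to show directly; a sketch using the thresholds named above:

```python
# The final mapping from episode count to tier, using the fixed thresholds
# this post names. Counting distinct Episodes happens upstream in the graph.
ANCHOR_MIN = 5   # episodes
PLANET_MIN = 2

def assign_tier(episode_count: int) -> str:
    if episode_count >= ANCHOR_MIN:
        return "anchor"      # load-bearing
    if episode_count >= PLANET_MIN:
        return "planet"      # recurring
    return "asteroid"        # transient
```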
Consequence: Exploration can be constrained by narrative significance — in Doctor Who’s Genesis of the Daleks (6 episodes), anchor-tier surfaces The Doctor, Davros, Harry Sullivan, Nyder, Sarah Jane Smith, and Sevrin:
```cypher
MATCH (a:Agent {tier: 'anchor'})-[:PARTICIPATED_AS]->(e:Event)
RETURN a.canonical_name, a.episode_count, count(e) AS event_count
ORDER BY a.episode_count DESC
```

Tradeoffs (explicit)
Cost
More pipeline steps and stricter run discipline.
Each step has its own thresholds, modes, and failure conditions.
Checkpoint-resume doesn’t roll back partial writes within a step.
Benefit
Reproducibility: atomic checkpoints, resume on failure.

Safer reruns: additive mode (default) skips existing affiliations; replace mode re-evaluates (both modes are sketched below).

Mutation provenance: every inferred edge carries inferred_by, confidence, and reasoning.

Export confidence: HuggingFace parquet exports carry their own provenance.
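One plausible mapping of those two modes onto Cypher, sketched as parameterized statements; the uuid-based matching is an assumption, and the provenance properties follow the edge example shown earlier:

```python
# Illustrative mapping of additive vs. replace modes onto Cypher.
# The uuid-based matching is an assumption.

# Additive (default): MERGE creates the edge only if it is absent, so an
# existing affiliation is skipped and its properties are left untouched.
ADDITIVE_WRITE = """
MATCH (a:Agent {uuid: $agent_uuid}), (o:Organization {uuid: $org_uuid})
MERGE (a)-[r:AFFILIATED_WITH]->(o)
ON CREATE SET r.confidence = $confidence,
              r.reasoning = $reasoning,
              r.inferred_by = 'llm_affiliation_inference',
              r.created_at = datetime()
"""

# Replace: clear previously inferred edges first, then re-materialize them
# from the fresh adjudication pass using the same write.
REPLACE_CLEAR = """
MATCH (:Agent)-[r:AFFILIATED_WITH {inferred_by: 'llm_affiliation_inference'}]->(:Organization)
DELETE r
"""
```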
This is an engineering tradeoff, not a stylistic preference.
Failure modes we’re still managing
Affiliation confidence calibration. The 0.7 threshold is empirically chosen, not theoretically grounded.
Location clustering sensitivity. The 0.65 threshold works for English-language screenplays; untested on non-English scripts.
Gravity tier boundaries are fixed, not series-adaptive. The current thresholds (anchor >= 5, planet >= 2) are constants. A future improvement would auto-derive thresholds from series metadata; one possible shape is sketched below.
Checkpoint atomicity. Partial writes within a step are mitigated by idempotent Cypher MERGE operations, but full rollback is not guaranteed.
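One possible shape for that series-adaptive improvement, with placeholder fractions rather than tuned values:

```python
# Scale the tier cutoffs by season length instead of hard-coding 5 and 2.
# The fractions are arbitrary placeholders, not tuned values.
def adaptive_thresholds(total_episodes: int) -> tuple[int, int]:
    planet_min = max(2, round(total_episodes * 0.09))               # 2 for a 22-episode season
    anchor_min = max(planet_min + 1, round(total_episodes * 0.22))  # 5 for a 22-episode season
    return anchor_min, planet_min
```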
Why this matters beyond Fabula
Narrative AI systems over-optimize extraction demos and under-invest in ontology lifecycle. The result is high local recall with low global reliability.
Fabula’s post-processing direction is a bet that durable narrative intelligence requires both phases:
Recover candidate structure from text.
Maintain semantic structure under iteration.
That second phase is where most systems quietly fail — not because the problem is unsolvable, but because it’s less photogenic than an extraction demo. The graph that matters is the one you can query next month, not the one that looks good in a screenshot today.