Teaching a Knowledge Graph to Learn from “No”
In Star Trek: The Next Generation, the crew is constantly tapping away at PADDs—Personal Access Display Devices. These handheld computers first appeared in TNG, designed by Rick Sternbach in 1987. In one scene, Captain Picard reviews a security report on his PADD in the Ready Room. Moments later, Commander Riker checks the ship’s status on what appears to be an identical device on the bridge. For a human viewer, this is unremarkable. But for a computational system trying to build a coherent model of the show’s universe, it raises a critical question.
How does an AI know these aren’t the same object? And more importantly: how do we stop it from asking the same question every single time a PADD appears on screen?
This is the core challenge of modeling a story-world—understanding that generic objects can have specific, distinct identities, and teaching our systems to remember those distinctions. It’s not just about parsing what’s on screen; it’s about building a memory that actually learns.
The Narrative Challenge: A Groundhog Day of Dumb Questions
A naive system—whether using simple text matching or even basic semantic similarity—would see “PADD” and “PADD” and repeatedly flag them as potential duplicates. Every time a new episode is processed, the system would dutifully compare Picard’s PADD from episode one with Data’s PADD from episode five, forcing an expensive decision-making process over and over again.
This cycle of redundant analysis manifests as four kinds of waste:
Computational waste. An LLM performs a detailed, resource-intensive comparison to determine the two PADDs are different—then we simply throw that analysis away.
Financial waste. Each redundant comparison is another API call, costs accumulating across a series like compound interest on stupidity.
Time waste. Human and system time is burned re-evaluating the same obvious distinctions, making each analysis pass slower than the last.
Knowledge waste. The most valuable insight—why the PADDs are different (different owners, different locations, different narrative functions)—evaporates the moment the decision is made.
This isn’t just a Star Trek problem. A system analyzing a police procedural would face the same issue distinguishing the “Security Guard” at the main gate from a “Security Guard” escorting a prisoner. They’re different people in different locations performing different functions, but a simple system sees only identical strings.
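To make the failure mode concrete, here is a minimal sketch of the naive approach: exact name matching flags every generic "PADD" pair as a potential duplicate, on every single pass. The entity records and field names here are illustrative, not the actual system's schema.

```python
def naive_duplicate_flags(entities):
    """Return every pair of entities whose names match exactly."""
    flags = []
    for i, a in enumerate(entities):
        for b in entities[i + 1:]:
            if a["name"] == b["name"]:
                flags.append((a["uuid"], b["uuid"]))
    return flags

episode_entities = [
    {"uuid": "object_padd_001", "name": "PADD"},
    {"uuid": "object_padd_002", "name": "PADD"},
    {"uuid": "char_picard_001", "name": "Jean-Luc Picard"},
]

# The same PADD pair is flagged again on every analysis pass,
# forcing the expensive LLM comparison to be repeated each time.
print(naive_duplicate_flags(episode_entities))  # [('object_padd_001', 'object_padd_002')]
```

Nothing in this loop can ever learn that the pair was already resolved, which is exactly the gap the rest of this article addresses.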
The fundamental problem is that traditional entity resolution systems are designed to learn from “yes.” When they merge two entities, they create new knowledge—a synthesis. But they are completely deaf to “no.” A negative decision (to keep entities separate) becomes a dead end, leading nowhere. The system never gets any smarter about the crucial distinctions that define its world.
The Technical Solution: Learning from “No” with Contrastive Sharpening
The solution is to treat the LLM’s “no” not as an endpoint, but as actionable intelligence.
When the model concludes two PADDs are “not the same,” it has to justify that decision. Our approach—Contrastive Entity Sharpening—captures that justification and uses it to permanently refine the knowledge graph. We shift from a simple boolean decision to a learning opportunity: “not the same because...”
This creates a self-improving knowledge graph that becomes more accurate and efficient with each analysis pass. An entity that was once generic, like “PADD,” becomes specific, like “Picard’s PADD,” and the system never has to ask if it’s the same as “Data’s PADD” again.
The distinction is sharpened. The knowledge is retained.
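The "never has to ask again" property can be sketched as a simple negative-decision cache, consulted before any LLM call. This is a hypothetical illustration of the idea, not the system's actual implementation; class and method names are invented.

```python
class DistinctionCache:
    """Remembers pairs of entities already judged distinct (KEEP_SEPARATE)."""

    def __init__(self):
        # Frozensets make the pair order-independent: (a, b) == (b, a).
        self._separate = set()

    def mark_separate(self, uuid_a, uuid_b):
        self._separate.add(frozenset((uuid_a, uuid_b)))

    def known_separate(self, uuid_a, uuid_b):
        return frozenset((uuid_a, uuid_b)) in self._separate

cache = DistinctionCache()
cache.mark_separate("object_padd_001", "object_padd_002")

# Later passes check the cache before ever invoking the LLM.
print(cache.known_separate("object_padd_002", "object_padd_001"))  # True
```

In practice the same effect also falls out of the name refinement itself: once the entities are named "Picard's PADD" and "Data's PADD," they stop colliding in the first place.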
Capturing the “Why”: The Schema
To make the LLM’s reasoning machine-readable, we need to enforce a strict output structure. This is where BAML (Boundary AI Markup Language) comes in—a domain-specific language that allows us to define the exact data structure we expect from the LLM, ensuring its analytical output is immediately usable by our system.
```baml
// file: /fabula/baml_src/graph_refinement.baml
class EntityDistinctionClarification {
  entity_uuid string
  suggested_name_refinement string?
  suggested_description_enhancement string?
  key_distinguishing_features string[]
}
```
From a storyteller’s perspective, each field serves a clear narrative purpose:
entity_uuid: “This is the unique ID of the object we’re talking about.” It points to the specific entity in our graph that needs to be updated.
suggested_name_refinement: “This is how we give the object a proper name.” It’s the mechanism for turning a generic “PADD” into a specific “Picard’s PADD.”
suggested_description_enhancement: “This is where we add crucial context.” This field allows the system to note details like, “Used by Captain Picard to review security protocols in his Ready Room.”
key_distinguishing_features: “This is the explicit reason—the smoking gun—for why it’s different.” This captures the core logic, such as “Different owner (Picard vs. Data).”
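On the application side, this schema might surface as an ordinary dataclass. The field names below follow the BAML class above; the dataclass itself is an illustrative stand-in for whatever types BAML generates, not the project's real code.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class EntityDistinctionClarification:
    entity_uuid: str
    # Optional fields mirror the `string?` types in the BAML schema:
    # when the LLM suggests no change, they stay None.
    suggested_name_refinement: Optional[str] = None
    suggested_description_enhancement: Optional[str] = None
    key_distinguishing_features: List[str] = field(default_factory=list)

clarification = EntityDistinctionClarification(
    entity_uuid="object_padd_001",
    suggested_name_refinement="Picard's PADD",
    key_distinguishing_features=["Different owner (Picard vs. Data)"],
)
```

The optional fields matter: "no suggestion" is a meaningful answer, distinct from an empty string, and the downstream update logic can treat it as "leave this field alone."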
A Walkthrough: Sharpening Picard’s PADD
Let’s walk through how this works in practice.
Before. Our knowledge graph has two generic entities created from a script: object_padd_001 with the name “PADD” and object_padd_002 also named “PADD”. During the harmonization process, the system flags these as potential duplicates because their names and initial descriptions are nearly identical.
The Decision. The LLM is tasked with comparing them. It analyzes their context—who uses them, where they appear—and correctly determines they are distinct objects. It returns a decision to KEEP_SEPARATE.
The Output. Crucially, along with the decision, the LLM generates an EntityDistinctionClarification object:
```json
{
  "entity_uuid": "object_padd_001",
  "suggested_name_refinement": "Picard's PADD",
  "suggested_description_enhancement": "A PADD used by Captain Picard in his Ready Room.",
  "key_distinguishing_features": ["Owned by Jean-Luc Picard"]
}
```
After. The system’s _perform_distinction_sharpening function receives this object. It updates the entity in the Neo4j graph database, changing its name to the more specific “Picard’s PADD.” Importantly, it uses an additive approach, appending to the existing description rather than replacing it. This ensures that knowledge accumulated over multiple episodes is preserved and enriched, not overwritten.
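The additive update can be sketched independently of any database. The function and field names below are illustrative, assumed for this example, not the actual _perform_distinction_sharpening code: the refined name replaces the generic one, while the enhanced description is appended to whatever is already there.

```python
def apply_sharpening(entity, clarification):
    """Return an updated copy of the entity, enriching rather than overwriting."""
    updated = dict(entity)
    if clarification.get("suggested_name_refinement"):
        updated["name"] = clarification["suggested_name_refinement"]
    enhancement = clarification.get("suggested_description_enhancement")
    if enhancement and enhancement not in updated.get("description", ""):
        # Additive: append, never replace, so prior knowledge survives.
        merged = updated.get("description", "").rstrip(". ") + ". " + enhancement
        updated["description"] = merged.lstrip(". ")
    return updated

entity = {
    "uuid": "object_padd_001",
    "name": "PADD",
    "description": "A handheld display device.",
}
clarification = {
    "suggested_name_refinement": "Picard's PADD",
    "suggested_description_enhancement":
        "A PADD used by Captain Picard in his Ready Room.",
}
print(apply_sharpening(entity, clarification))
```

The membership check also makes the operation idempotent: re-applying the same clarification on a later pass will not duplicate the sentence.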
Finally, it updates the entity’s vector embedding in ChromaDB. This ensures that a semantic search for “Picard’s personal devices” will now correctly find this specific PADD—a query that would have failed before.
The Safety Net: When Not to Sharpen
This automated process has two key safety features to prevent it from making mistakes.
Confidence gating. Sharpening only occurs when the LLM is highly confident in its decision to keep entities separate (e.g., above a 0.7 confidence threshold). This prevents the system from acting on uncertain guesses and modifying entities based on flimsy evidence.
Skipping the obvious. The system is smart enough to know when not to sharpen. If it compares “Jean-Luc Picard” to “William Riker,” it knows these are already distinct, well-defined entities. In this case, the LLM will provide the distinguishing features for analytical purposes but will suggest no name or description changes, avoiding useless refinements like changing “Jean-Luc Picard” to “Jean-Luc Picard (Not Riker).”
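Both safety checks can be combined into a single guard. The 0.7 threshold comes from the text above; everything else, including the function name and decision labels, is an assumed sketch rather than the system's real interface.

```python
CONFIDENCE_THRESHOLD = 0.7  # from the article; tune for your own pipeline

def should_sharpen(decision, confidence, clarification):
    """Gate the sharpening step on both confidence and usefulness."""
    if decision != "KEEP_SEPARATE":
        return False
    if confidence < CONFIDENCE_THRESHOLD:
        return False  # confidence gating: don't act on flimsy evidence
    # Skipping the obvious: if the LLM suggested no name or description
    # change, the entities are already well-defined and need no refinement.
    return bool(clarification.get("suggested_name_refinement")
                or clarification.get("suggested_description_enhancement"))

# Generic PADDs, confident decision, proposed name: sharpen.
print(should_sharpen("KEEP_SEPARATE", 0.92,
                     {"suggested_name_refinement": "Picard's PADD"}))  # True
# Picard vs. Riker: distinct and confident, but nothing to refine: skip.
print(should_sharpen("KEEP_SEPARATE", 0.99, {}))  # False
```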
The Implication: An AI That Finally Remembers the Plot
By teaching the AI to learn from “no,” we transform it from a forgetful data processor into an intelligent archivist that builds an increasingly sophisticated model of the narrative universe.
Show the Receipts
A case study on Season 1 of Star Trek: The Next Generation demonstrates the dramatic efficiency and quality gains:
| Metric | Result After One Season |
| --- | --- |
| Duplicate Flags | 66% reduction |
| Harmonization Time | 62% faster |
| False Positive Rate | 60% reduction |
| API Savings | 43% fewer calls |
These numbers mean the system gets smarter, faster, and cheaper to run with every episode it analyzes.
For writers, researchers, or fans, this unlocks powerful new capabilities. Instead of just tracking props, we can now ask complex questions like, “Show me every scene where a specific PADD appears, who touched it, and how its description evolved across the series.” The knowledge graph evolves, becoming more detailed and interconnected, much like a person’s understanding deepens as they watch a show.
This approach doesn’t just remember static facts; it captures an entity’s evolution. A character’s description can be incrementally enhanced from “Acting Ensign” in one episode, to “Ensign” later, and finally to “Ensign Wesley Crusher” as his identity solidifies in the narrative. The system doesn’t just remember the plot—it understands how characters and objects change within the plot.
It’s the difference between having a simple cast list and possessing a deep, interconnected model of the entire fictional universe.
Conclusion
The solution to our AI’s forgetfulness wasn’t just to give it more memory, but to teach it how to learn from disagreement.
The core innovation is simple but powerful: Don’t waste the work the LLM already did. Use negative decisions to make positive improvements.
Every “no” becomes a lesson. Every distinction becomes sharper. And with each iteration, the system remembers not just what happened in the story, but why things are the way they are.