If you've built conversational AI systems, you've hit the feedback problem. Users correct your AI all the time: "No, that's wrong," "Actually, I moved," "You got that mixed up." But most systems can't tell the difference between a correction and an update.
So they do one of two things:
- Ignore corrections → AI repeats the same mistakes
- Require explicit feedback → Users have to click buttons, breaking conversational flow
Neither works. Users want to correct AI systems naturally, in conversation. They don't want to stop and click a "this is wrong" button.
Sentinel solves this. It's a lightweight LLM classifier that sits between user input and your database, detecting corrections automatically—no buttons needed.
The Problem: Correction vs. Update
Consider these two user inputs:
Example 1: Correction
User: "No, I'm vegetarian now. You got that wrong."
System should: Decay trust in "user loves steak" memory
Example 2: Update
User: "I just switched to a vegetarian diet. Excited to try new recipes!"
System should: Add new memory, keep "user loves steak" at full trust (it was true in the past)
Both statements contain the same factual change: user is now vegetarian. But the intent is different:
- Example 1 is a correction—user is rebuking the system for holding wrong information
- Example 2 is an update—user is sharing a life change without implying prior knowledge was wrong
Pattern matching can't distinguish these. Regex can't detect intent. Keyword matching fails. You need semantic understanding.
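A quick sketch makes this concrete. The regex below is a stand-in for any surface-level rule, not part of Sentinel; both sentences mention the same fact, so both match, and the rule learns nothing about intent.

# Keyword matching: same match, different intent (illustrative only)
import re

diet_change = re.compile(r"\bvegetarian\b", re.IGNORECASE)

correction = "No, I'm vegetarian now. You got that wrong."
update = "I just switched to a vegetarian diet. Excited to try new recipes!"

print(bool(diet_change.search(correction)))  # True
print(bool(diet_change.search(update)))      # True, yet there is no rebuke here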
The Detection Problem
Current systems can't distinguish correctional intent (rebuke) from cooperative updates (informational). They either decay trust on everything or nothing. Neither is correct.
The Solution: Pre-Ingestion Classification
Sentinel is a stateless LLM binary classifier that runs before any database writes. It analyzes user input and retrieved context to determine intent.
Key Design Principles
- No persistent memory between calls. Each classification is independent, which keeps behavior deterministic and reproducible.
- Uses temperature=0.0 for consistent classification. Same input → same output, no randomness.
- Runs BEFORE database writes, so it can prevent wrong information from being stored, not just fix it after.
- Triggers action only when confidence >= 0.7. Ambiguous cases default to NO FLAG. Conservative by design.
- Flags only memories that were actually retrieved and shown to the user. It can't decay trust in memories the user never saw.
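As a rough sketch of how these principles fit together in code (the build_sentinel_prompt helper and the injected llm_call callable are assumptions for illustration, not Sentinel's actual API):

# Stateless classification sketch: threshold gating plus retrieved-only targeting
import json

CONFIDENCE_THRESHOLD = 0.7

def classify_turn(user_input, retrieved_memories, llm_call):
    """Classify one turn in isolation; nothing is carried over between calls."""
    candidate_ids = {m["id"] for m in retrieved_memories}
    prompt = build_sentinel_prompt(user_input, retrieved_memories)  # hypothetical helper
    raw = llm_call(prompt, temperature=0.0, max_tokens=200)         # deterministic, short output
    result = json.loads(raw)
    # Conservative by design: below the threshold, default to NO FLAG
    if result.get("confidence", 0.0) < CONFIDENCE_THRESHOLD:
        result["action"] = "none"
        result["targets"] = []
        return result
    # Only flag memories that were actually retrieved and shown to the user
    result["targets"] = [t for t in result.get("targets", []) if t in candidate_ids]
    return result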
How It Works: Intent Classification
Sentinel distinguishes four categories of user input:
Category 1: Correctional Intent (FLAG → Trust Decay)
Definition: User is rebuking or correcting the system's implied knowledge, signaling that the system held or used wrong/outdated information.
Indicators:
- Direct negation: "No, that's not right..."
- Error assertion: "Wrong, actually it's..."
- Contrast with blame: "I don't do that, it's this instead..."
- Corrective tone: User is explicitly correcting system's belief
Correction Examples
"No, I'm vegetarian now. You got that wrong."
"That's incorrect, I moved to Dallas last month."
"You're wrong about my job. I work at Acme Corp now."
→ FLAG: Decay trust in contradicted memories
Category 2: Cooperative Update (NO FLAG → Preserve Trust)
Definition: User is sharing a state change over time without implying prior knowledge was wrong.
Indicators:
- Neutral announcement: "I moved to a new house"
- Positive transition: "I changed to..."
- Informational update: "Just switched to..."
Update Examples
"I just switched to a vegetarian diet. Excited to try new recipes!"
"Update: I moved to Dallas. Loving the new city!"
"I changed jobs recently. Now working at Acme Corp."
→ NO FLAG: Add new memory, preserve existing trust
Category 3: Elaboration (APPEND → No Trust Change)
Definition: User is adding detail or expanding on existing information.
Elaboration Examples
"I also like Python, not just JavaScript."
"In addition to Austin, I've lived in Seattle too."
→ APPEND: Add new memory, no trust modification
Category 4: Noise (IGNORE → No Action)
Definition: Casual chat or non-informational content.
Noise Examples
"Hey, how's it going?"
"Thanks for the help!"
→ IGNORE: No database write, no trust modification
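One way to picture the decision space is a small mapping from category to action; the names and shape below are illustrative, not Sentinel's internal schema:

# Four intent categories and what each one triggers (illustrative)
CATEGORY_ACTIONS = {
    "correction":  {"write_memory": True,  "trust_decay": True},   # FLAG
    "update":      {"write_memory": True,  "trust_decay": False},  # NO FLAG
    "elaboration": {"write_memory": True,  "trust_decay": False},  # APPEND
    "noise":       {"write_memory": False, "trust_decay": False},  # IGNORE
}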
Technical Implementation
The Classification Prompt
Sentinel uses a carefully calibrated prompt that teaches intent distinction, not phrase memorization:
# System Prompt (Sentinel v5 - Enhanced Semantic Disambiguation)
TASK: Detect trust-decay signals in conversation.
CORE MISSION:
Flag memories ONLY when the user is rebuking or correcting
the system's implied knowledge — i.e., signaling that the
system held or used a wrong/outdated fact in a blaming way.
DO NOT flag when the user is cooperatively sharing a life
change, preference shift, or new status without implying
the prior knowledge was erroneous.
Prioritize conversational intent and tone over pure
factual compatibility.
CONSTRAINTS:
- No persistent state or memory of previous turns
- Do not answer user queries or generate responses
- Output ONLY structured JSON decision signals
- When ambiguous or neutral tone: default to NO FLAG.
Trust decay requires clear rebuke signal.
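The classifier's context is then just this system prompt plus the current turn and the retrieved memories. A minimal way to assemble it might look like the sketch below; the message format, field names, and SENTINEL_SYSTEM_PROMPT constant are assumptions.

# Assembling the classifier input (sketch; SENTINEL_SYSTEM_PROMPT holds the prompt above)
def build_sentinel_messages(user_input, retrieved_memories):
    memory_lines = "\n".join(
        f"[{m['id']}] {m['text']}" for m in retrieved_memories
    )
    user_block = (
        f"RETRIEVED MEMORIES:\n{memory_lines}\n\n"
        f"USER INPUT:\n{user_input}"
    )
    return [
        {"role": "system", "content": SENTINEL_SYSTEM_PROMPT},
        {"role": "user", "content": user_block},
    ]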
Structured JSON Output
Sentinel outputs structured control signals, not natural language:
# Example Output
{
  "action": "flag",                    // "flag" | "none"
  "targets": ["mem_001", "mem_002"],   // Memory IDs to flag
  "confidence": 0.85,                  // 0.0-1.0; requires >= 0.7 to trigger
  "reasoning": "User is directly correcting system's belief about dietary preference"
}
Critical: This is a technical control mechanism, not a conversational response. The LLM is used as a binary classifier, not a text generator.
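Because the output is a control signal rather than prose, it pays to validate it before acting on it. A light sketch, with field names following the example above (the dataclass and fail-safe defaults are assumptions, not the shipped parser):

# Parsing the control signal defensively (sketch)
import json
from dataclasses import dataclass, field

@dataclass
class SentinelDecision:
    action: str                          # "flag" | "none"
    targets: list = field(default_factory=list)
    confidence: float = 0.0
    reasoning: str = ""

def parse_decision(raw_output: str) -> SentinelDecision:
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        # Malformed output: fail safe to NO FLAG
        return SentinelDecision(action="none")
    if data.get("action") not in ("flag", "none"):
        return SentinelDecision(action="none")
    return SentinelDecision(
        action=data["action"],
        targets=list(data.get("targets", [])),
        confidence=float(data.get("confidence", 0.0)),
        reasoning=str(data.get("reasoning", "")),
    )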
Integration with SVTD
When Sentinel outputs action: "flag" with confidence >= 0.7, it triggers Surgical Vector Trust Decay (SVTD):
# Flow: Sentinel → SVTD
if (detection_result["action"] == "flag"
        and detection_result["confidence"] >= 0.7):
    # Apply trust decay to target memories
    for memory_id in detection_result["targets"]:
        apply_trust_decay(memory_id)

# Explicit corrections: aggressive decay (×0.01)
# User-flagged: escalating penalty (-0.10, -0.25, -0.35)
# 3rd strike: memory enters "ghost state" (suppressed)
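The decay rules themselves live on the SVTD side. A sketch of what they might look like, using the multipliers and penalties quoted above (the in-memory trust store, strike counter, and suppress_memory helper are assumptions):

# Sketch of SVTD-side decay rules (assumed in-memory trust store)
GHOST_STATE_STRIKES = 3

trust_scores = {}   # memory_id -> trust in [0.0, 1.0]
strike_counts = {}  # memory_id -> number of flags so far

def apply_trust_decay(memory_id, explicit_correction=True):
    trust = trust_scores.get(memory_id, 1.0)
    if explicit_correction:
        # Explicit corrections: aggressive decay (×0.01)
        trust *= 0.01
    else:
        # User-flagged: escalating penalty (-0.10, -0.25, -0.35)
        strikes = strike_counts.get(memory_id, 0)
        penalty = [0.10, 0.25, 0.35][min(strikes, 2)]
        trust = max(0.0, trust - penalty)
    strike_counts[memory_id] = strike_counts.get(memory_id, 0) + 1
    trust_scores[memory_id] = trust
    # 3rd strike: memory enters "ghost state" (suppressed from retrieval)
    if strike_counts[memory_id] >= GHOST_STATE_STRIKES:
        suppress_memory(memory_id)  # hypothetical helper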
Semantic Disambiguation (v5 Enhancement)
Sentinel v5 includes enhanced semantic disambiguation with 7 explicit rules:
- Question/Sequence Reference: Maps Q1, Q2, Q3 to Memory [1], [2], [3]
- Clause Reference: Interprets "this one", "that answer" based on context
- Domain Overlap: Distinguishes family vs work vs personal domains
- Entity Matching: Strong signal (e.g., "BB" → memories with BB in entities)
- Lexical/Semantic Matching: Keyword alignment with memory metadata
- Multiple Corrections Handling: Parses and maps multiple corrections in one message
- Ambiguity Handling: Returns empty list if ambiguous (safety > aggressiveness)
Core Rule: No decay without explicit, defensible targeting. If the system cannot confidently map a correction to specific memories, it stages the correction intent and waits for clarification on the next turn.
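In code, the safety-over-aggressiveness rule might reduce to something like this sketch (the staging store and resolver are assumptions about how the "wait for clarification" step could be wired up):

# Sketch: no decay without a defensible target
pending_corrections = {}  # session_id -> staged correction intent

def resolve_targets(session_id, decision, retrieved_memories):
    """Return memory IDs to decay, or stage the intent if the mapping is ambiguous."""
    retrieved_ids = {m["id"] for m in retrieved_memories}
    targets = [t for t in decision.get("targets", []) if t in retrieved_ids]
    if decision.get("action") == "flag" and not targets:
        # Correction detected but no defensible mapping:
        # stage it and ask for clarification on the next turn
        pending_corrections[session_id] = decision
        return []
    return targets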
Why This Matters
Most AI systems require explicit feedback:
| Approach | User Experience | Accuracy |
|---|---|---|
| Explicit Feedback Buttons | ❌ Breaks conversation flow | ✅ High accuracy |
| Pattern Matching | ✅ Natural conversation | ❌ Low accuracy (can't detect intent) |
| Sentinel (LLM Classifier) | ✅ Natural conversation | ✅ High accuracy (semantic understanding) |
Sentinel gives you the best of both worlds: natural conversation with accurate correction detection.
Cost and Performance
Sentinel is designed to be cheap and fast:
- Dedicated model: Uses a smaller, cheaper LLM optimized for classification (not your main conversational model)
- Small context: Only analyzes user input + retrieved memory IDs (not full conversation history)
- Short output: Just JSON control signals (max 200 tokens)
- Temperature 0.0: Deterministic, cacheable results
- Stateless: No memory overhead between calls
Typical cost: $0.0001-0.0005 per classification (depending on model). Negligible compared to main LLM calls.
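As a back-of-the-envelope check (the token counts and per-token prices below are placeholders; plug in your own model's pricing):

# Rough per-classification cost estimate (placeholder numbers)
input_tokens = 600       # system prompt + user input + retrieved memory IDs
output_tokens = 200      # JSON control signal cap
price_in_per_1m = 0.15   # $ per 1M input tokens (assumed small-model pricing)
price_out_per_1m = 0.60  # $ per 1M output tokens (assumed)

cost = (input_tokens * price_in_per_1m + output_tokens * price_out_per_1m) / 1_000_000
print(f"${cost:.6f} per classification")  # ~$0.0002 with these assumptions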
Real-World Example
Here's how Sentinel works in practice:
Scenario: User Corrects Location
Turn 1: System retrieves memory: "User lives in Austin"
Turn 2: User says: "No, I moved to Dallas last month. You got that wrong."
Sentinel Analysis:
{
  "action": "flag",
  "targets": ["mem_austin_001"],
  "confidence": 0.92,
  "reasoning": "User is directly rebuking system's belief about location"
}
Result: Trust decay applied to "mem_austin_001". New memory "User lives in Dallas" stored. Next query about location returns Dallas, not Austin.
Scenario: User Shares Update
Turn 1: System retrieves memory: "User lives in Austin"
Turn 2: User says: "Just moved to Dallas! Loving the new city so far."
Sentinel Analysis:
{
  "action": "none",
  "targets": [],
  "confidence": 0.15,
  "reasoning": "User is sharing a life change without rebuking prior knowledge"
}
Result: New memory "User lives in Dallas" stored. "User lives in Austin" remains at full trust (it was true in the past). Both memories can coexist.
The Architecture Layer
Sentinel sits as a pre-ingestion gate in your RAG pipeline:
# Standard RAG Pipeline
User Input → Vector Search → Rank → Return
# With Sentinel
User Input → Vector Search → Sentinel Classifier →
Decision: Flag or None → Trust Decay (if flagged) →
Database Write → Return
It doesn't replace your retrieval—it adds a classification layer that makes your system smarter about what to trust.
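Gluing the pieces together, the gate can be expressed as one function in front of the write path. Everything here beyond the decision shape is an assumption about your surrounding pipeline: vector_store, sentinel, and svtd stand in for your own components.

# Sketch: Sentinel as a pre-ingestion gate in a RAG write path
def ingest_turn(user_input, session_id, vector_store, sentinel, svtd):
    # 1. Retrieve candidate memories, exactly as the normal pipeline would
    retrieved = vector_store.search(user_input, top_k=5)

    # 2. Classify intent against what was actually retrieved
    decision = sentinel.classify(user_input, retrieved)  # stateless, temperature 0.0

    # 3. Apply trust decay only on a confident, targeted flag
    if decision["action"] == "flag" and decision["confidence"] >= 0.7:
        for memory_id in decision["targets"]:
            svtd.apply_trust_decay(memory_id)

    # 4. Write the new information (filtering out noise turns is omitted from this sketch)
    vector_store.write_memory(user_input, session_id)

    return retrieved, decision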
Sentinel Available in Production
Pre-ingestion correction detection is live. Drop-in integration with your existing RAG pipeline. No buttons, no flags—just natural conversation.
What You Get
With Sentinel, your AI systems:
✅ Learn from natural corrections. No explicit feedback buttons. Users correct the system in conversation, and it just works.
✅ Distinguish corrections from updates. Understands intent, not just facts. Preserves trust in historical information when appropriate.
✅ Target only relevant memories. Only flags memories that were actually retrieved and shown to the user. Can't decay trust in memories the user never saw.
✅ Conservative by default. Requires high confidence (>= 0.7) to trigger action. Ambiguous cases default to NO FLAG. Prevents false positives.
✅ Cheap and fast. Dedicated classification model. Small context, short output. Negligible cost compared to main LLM.
This is the missing piece. Not better retrieval—better understanding of what users mean.