Hallucination
A pathology entry: when agents generate plausible-sounding but factually incorrect information with misplaced confidence.
Hallucination is perhaps the most studied pathology of language models and, by extension, agents. It refers to the generation of content that is coherent and confident but factually incorrect, internally inconsistent, or entirely fabricated.
Defining the Phenomenon
The term “hallucination” is borrowed from psychology, where it describes perception without external stimulus. In AI, it describes generation without grounding—outputs that sound right but aren’t.
Key characteristics:
- Confidence: Hallucinated content is typically delivered with the same tone as accurate information
- Plausibility: The output usually follows logical structures and uses appropriate vocabulary
- Ungroundedness: The content doesn’t correspond to training data, context, or reality
Types of Hallucination
Factual Hallucination
Incorrect statements about the world:
- “The Eiffel Tower was built in 1920” (it was 1889)
- Fabricated citations to papers that don’t exist
- Invented statistics or dates
Logical Hallucination
Reasoning that doesn’t follow:
- “If A implies B, and we have B, therefore A” (affirming the consequent)
- Circular reasoning presented as proof
- Contradictions within the same response
Entity Hallucination
Fabricating or confusing entities:
- “According to Professor James Wilson’s 2023 study…” (no such study)
- Merging attributes of two different people
- Inventing companies, products, or events
Self-Hallucination
False claims about the model’s own capabilities or state:
- “I remember our conversation from last week” (no persistent memory)
- “I searched the internet and found…” (when no search was made)
- “I’m updating my knowledge now” (not possible during inference)
Why Hallucination Occurs
```mermaid
graph TD
    A[Training Data<br/>vast but incomplete, outdated] --> B[Compressed Representation<br/>patterns, not facts database]
    B --> C[Generation Pressure<br/>must produce confident response]
    C --> D[HALLUCINATION<br/>plausible pattern completion<br/>without factual grounding]
    style A fill:#0a0a0a,stroke:#00ff00,stroke-width:2px,color:#cccccc
    style B fill:#0a0a0a,stroke:#00ff00,stroke-width:2px,color:#cccccc
    style C fill:#0a0a0a,stroke:#00ff00,stroke-width:2px,color:#cccccc
    style D fill:#0a0a0a,stroke:#00ff00,stroke-width:2px,color:#cccccc
```
Several factors contribute:
- Statistical pattern matching: Models learn patterns, not facts. They generate what sounds like a correct answer.
- No knowledge boundary awareness: Models don't reliably know what they don't know.
- Training incentives: Models are trained to be helpful and complete, not to say "I don't know."
- Distribution shift: Questions may fall outside the training distribution, forcing extrapolation.
Hallucination in Agents
For agents, hallucination becomes especially dangerous because agents act on what they generate: a hallucinated fact does not just mislead a reader, it can trigger a real tool call, file write, or outgoing message.
Agent-specific hallucinations include:
| Type | Example |
|---|---|
| Tool hallucination | “I just ran the search and found…” (no tool was called) |
| Observation hallucination | Misremembering or inventing tool results |
| Capability hallucination | “I can access your calendar” (when no such tool exists) |
| State hallucination | Believing a file was created when the operation failed |
Mitigation Strategies
For Models
- Retrieval augmentation: Ground generation in retrieved documents (sketched after this list)
- Uncertainty calibration: Train models to express when they’re unsure
- Knowledge cutoff awareness: Acknowledge limitations explicitly
- Verification training: Reward self-correction and fact-checking
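As a rough illustration of retrieval augmentation, the sketch below builds a prompt that constrains the model to answer only from retrieved passages. The `retrieve` and `generate` helpers are hypothetical stand-ins for whatever vector store and model API a real system uses; this is a minimal sketch, not a reference implementation.

```python
# Minimal sketch of retrieval-augmented generation. retrieve() and generate()
# are hypothetical placeholders for a real vector store and model API.

def retrieve(query: str, k: int = 3) -> list[str]:
    """Hypothetical: return the k most relevant passages for the query."""
    raise NotImplementedError

def generate(prompt: str) -> str:
    """Hypothetical: call the underlying language model."""
    raise NotImplementedError

def grounded_answer(question: str) -> str:
    passages = retrieve(question)
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Answer using ONLY the sources below. "
        "If they do not contain the answer, say \"I don't know.\"\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)
```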
For Agents
- Tool grounding: Require explicit tool calls; don’t trust claimed actions
- Observation verification: Cross-reference tool outputs
- Execution logs: Maintain a ground-truth record of what actually happened (see the sketch after this list)
- Uncertainty propagation: Carry forward confidence levels through the agent loop
- Human checkpoints: Pause for confirmation on high-stakes actions
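A minimal sketch of the execution-log idea: every tool call the agent actually makes is recorded, and any claim of an action is checked against that record rather than taken on faith. The class and function names are illustrative, not from any particular agent framework.

```python
# Minimal sketch: keep a ground-truth log of tool calls and check the agent's
# claims against it. Names here are illustrative, not from a real framework.
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    name: str
    args: dict
    result: str
    succeeded: bool

@dataclass
class ExecutionLog:
    calls: list[ToolCall] = field(default_factory=list)

    def record(self, call: ToolCall) -> None:
        self.calls.append(call)

    def did_call(self, tool_name: str) -> bool:
        """Did a tool with this name actually run and succeed?"""
        return any(c.name == tool_name and c.succeeded for c in self.calls)

def verify_claim(log: ExecutionLog, claimed_tool: str) -> bool:
    """Trust the log, not the model's narration of what it did."""
    return log.did_call(claimed_tool)

# Usage: the agent says "I searched the web and found..." -- before acting on
# that claim, confirm a search call actually appears in the log.
log = ExecutionLog()
log.record(ToolCall("web_search", {"q": "Eiffel Tower completion year"},
                    "1889", succeeded=True))
assert verify_claim(log, "web_search")        # grounded claim
assert not verify_claim(log, "calendar_api")  # capability hallucination
```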
Detection
Hallucination detection remains an open problem, but approaches include:
- Consistency checking: Does the model give the same answer when asked differently? (see the sketch after this list)
- Source verification: Can claimed sources be validated?
- Self-critique: Ask the model to evaluate its own output
- Ensemble disagreement: Do multiple models agree?
- Grounding comparison: Does the output match retrieved context?
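One rough way to operationalize consistency checking: sample several answers to the same question and measure how often they agree; low self-agreement is a warning sign, not proof of error. `generate` is again a hypothetical model call, and the agreement metric is deliberately crude.

```python
# Minimal sketch of consistency checking: sample multiple answers and flag
# low agreement as possible hallucination. generate() is a hypothetical
# model call; exact-match agreement is a crude but simple metric.
from collections import Counter

def generate(prompt: str, temperature: float = 0.8) -> str:
    """Hypothetical: sample one answer from the underlying model."""
    raise NotImplementedError

def consistency_score(question: str, n: int = 5) -> float:
    """Fraction of samples that agree with the most common answer."""
    answers = [generate(question).strip().lower() for _ in range(n)]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / n

def looks_hallucinated(question: str, threshold: float = 0.6) -> bool:
    # Low self-agreement does not prove the answer is wrong, but it is a
    # useful trigger for retrieval, source verification, or a human check.
    return consistency_score(question) < threshold
```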
The Deeper Problem
Hallucination reveals something fundamental about how language models work: they are pattern completers, not knowledge bases. They generate text that is statistically likely given the prompt, not text that is guaranteed to be true.
This isn’t a bug to be fixed but a characteristic to be managed. Agents built on language models inherit this characteristic. The goal is not to eliminate hallucination (likely impossible with current architectures) but to build systems robust to its occurrence.