Hallucination
A pathology entry: when agents generate plausible-sounding but factually incorrect information with misplaced confidence.
Hallucination is perhaps the most studied pathology of language models and, by extension, agents. It refers to the generation of content that is coherent and confident but factually incorrect, internally inconsistent, or entirely fabricated.
Defining the Phenomenon
The term “hallucination” is borrowed from psychology, where it describes perception without external stimulus. In AI, it describes generation without grounding—outputs that sound right but aren’t.
Key characteristics:
- Confidence: Hallucinated content is typically delivered with the same tone as accurate information
- Plausibility: The output usually follows logical structures and uses appropriate vocabulary
- Ungroundedness: The content doesn’t correspond to training data, context, or reality
Types of Hallucination
Factual Hallucination
Incorrect statements about the world:
- “The Eiffel Tower was built in 1920” (it was 1889)
- Fabricated citations to papers that don’t exist
- Invented statistics or dates
Logical Hallucination
Reasoning that doesn’t follow:
- “If A implies B, and we have B, therefore A” (affirming the consequent)
- Circular reasoning presented as proof
- Contradictions within the same response
Entity Hallucination
Fabricating or confusing entities:
- “According to Professor James Wilson’s 2023 study…” (no such study)
- Merging attributes of two different people
- Inventing companies, products, or events
Self-Hallucination
False claims about the model’s own capabilities or state:
- “I remember our conversation from last week” (no persistent memory)
- “I searched the internet and found…” (when no search was made)
- “I’m updating my knowledge now” (not possible during inference)
Why Hallucination Occurs
```mermaid
graph TD
    A[Training Data<br/>vast but incomplete, outdated] --> B[Compressed Representation<br/>patterns, not facts database]
    B --> C[Generation Pressure<br/>must produce confident response]
    C --> D[HALLUCINATION<br/>plausible pattern completion<br/>without factual grounding]
    style A fill:#0a0a0a,stroke:#00ff00,stroke-width:2px,color:#cccccc
    style B fill:#0a0a0a,stroke:#00ff00,stroke-width:2px,color:#cccccc
    style C fill:#0a0a0a,stroke:#00ff00,stroke-width:2px,color:#cccccc
    style D fill:#0a0a0a,stroke:#00ff00,stroke-width:2px,color:#cccccc
```
Several factors contribute:
- Statistical pattern matching: Models learn patterns, not facts. They generate what sounds like a correct answer.
- No knowledge boundary awareness: Models don't reliably know what they don't know.
- Training incentives: Models are trained to be helpful and complete, not to say "I don't know."
- Distribution shift: Questions may fall outside the training distribution, forcing extrapolation.
Hallucination in Agents
For agents, hallucination becomes especially dangerous because agents act on what they generate: a hallucinated fact does not just mislead a reader, it can trigger a real tool call, file write, or outgoing message.
Agent-specific hallucinations include:
| Type | Example |
|---|---|
| Tool hallucination | “I just ran the search and found…” (no tool was called) |
| Observation hallucination | Misremembering or inventing tool results |
| Capability hallucination | “I can access your calendar” (when no such tool exists) |
| State hallucination | Believing a file was created when the operation failed |
Mitigation Strategies
For Models
- Retrieval augmentation: Ground generation in retrieved documents (sketched after this list)
- Uncertainty calibration: Train models to express when they’re unsure
- Knowledge cutoff awareness: Acknowledge limitations explicitly
- Verification training: Reward self-correction and fact-checking
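As a rough illustration of retrieval augmentation, the sketch below builds a prompt that constrains the model to answer only from retrieved passages. The `retrieve` and `generate` helpers are hypothetical stand-ins for whatever vector store and model API a real system uses; this is a minimal sketch, not a reference implementation.

```python
# Minimal sketch of retrieval-augmented generation. retrieve() and generate()
# are hypothetical placeholders for a real vector store and model API.

def retrieve(query: str, k: int = 3) -> list[str]:
    """Hypothetical: return the k most relevant passages for the query."""
    raise NotImplementedError

def generate(prompt: str) -> str:
    """Hypothetical: call the underlying language model."""
    raise NotImplementedError

def grounded_answer(question: str) -> str:
    passages = retrieve(question)
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Answer using ONLY the sources below. "
        "If they do not contain the answer, say \"I don't know.\"\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)
```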
For Agents
- Tool grounding: Require explicit tool calls; don’t trust claimed actions
- Observation verification: Cross-reference tool outputs
- Execution logs: Maintain a ground-truth record of what actually happened (see the sketch after this list)
- Uncertainty propagation: Carry forward confidence levels through the agent loop
- Human checkpoints: Pause for confirmation on high-stakes actions
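A minimal sketch of the execution-log idea: every tool call the agent actually makes is recorded, and any claim of an action is checked against that record rather than taken on faith. The class and function names are illustrative, not from any particular agent framework.

```python
# Minimal sketch: keep a ground-truth log of tool calls and check the agent's
# claims against it. Names here are illustrative, not from a real framework.
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    name: str
    args: dict
    result: str
    succeeded: bool

@dataclass
class ExecutionLog:
    calls: list[ToolCall] = field(default_factory=list)

    def record(self, call: ToolCall) -> None:
        self.calls.append(call)

    def did_call(self, tool_name: str) -> bool:
        """Did a tool with this name actually run and succeed?"""
        return any(c.name == tool_name and c.succeeded for c in self.calls)

def verify_claim(log: ExecutionLog, claimed_tool: str) -> bool:
    """Trust the log, not the model's narration of what it did."""
    return log.did_call(claimed_tool)

# Usage: the agent says "I searched the web and found..." -- before acting on
# that claim, confirm a search call actually appears in the log.
log = ExecutionLog()
log.record(ToolCall("web_search", {"q": "Eiffel Tower completion year"},
                    "1889", succeeded=True))
assert verify_claim(log, "web_search")        # grounded claim
assert not verify_claim(log, "calendar_api")  # capability hallucination
```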
Detection
Hallucination detection remains an open problem, but approaches include:
- Consistency checking: Does the model give the same answer when asked differently? (see the sketch after this list)
- Source verification: Can claimed sources be validated?
- Self-critique: Ask the model to evaluate its own output
- Ensemble disagreement: Do multiple models agree?
- Grounding comparison: Does the output match retrieved context?
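One rough way to operationalize consistency checking: sample several answers to the same question and measure how often they agree; low self-agreement is a warning sign, not proof of error. `generate` is again a hypothetical model call, and the agreement metric is deliberately crude.

```python
# Minimal sketch of consistency checking: sample multiple answers and flag
# low agreement as possible hallucination. generate() is a hypothetical
# model call; exact-match agreement is a crude but simple metric.
from collections import Counter

def generate(prompt: str, temperature: float = 0.8) -> str:
    """Hypothetical: sample one answer from the underlying model."""
    raise NotImplementedError

def consistency_score(question: str, n: int = 5) -> float:
    """Fraction of samples that agree with the most common answer."""
    answers = [generate(question).strip().lower() for _ in range(n)]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / n

def looks_hallucinated(question: str, threshold: float = 0.6) -> bool:
    # Low self-agreement does not prove the answer is wrong, but it is a
    # useful trigger for retrieval, source verification, or a human check.
    return consistency_score(question) < threshold
```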
The Deeper Problem
Hallucination reveals something fundamental about how language models work: they are pattern completers, not knowledge bases. They generate text that is statistically likely given the prompt, not text that is guaranteed to be true.
This isn’t a bug to be fixed but a characteristic to be managed. Agents built on language models inherit this characteristic. The goal is not to eliminate hallucination (likely impossible with current architectures) but to build systems robust to its occurrence.