Chain of Thought

The emergence of inner speech in language models—how explicit step-by-step reasoning transforms performance and enables complex problem-solving.

Chain of Thought (CoT) is a prompting technique that elicits step-by-step reasoning from language models before producing a final answer. More profoundly, it represents the emergence of something analogous to inner speech—the internal monologue that humans use to think through complex problems.

The Discovery

Before Chain of Thought, language models were prompted to produce answers directly:

Question: If John has 3 apples and buys 2 more, then gives half to Mary, how many does John have?
Direct prompting: “5” ❌ (often wrong on multi-step problems)

With Chain of Thought, the model is encouraged to “think aloud”:

Question: [same]
CoT prompting: “John starts with 3 apples. He buys 2 more, so he has 3 + 2 = 5 apples. He gives half to Mary, so he keeps 5 / 2 = 2.5. Since we can’t have half an apple, he keeps 2 apples.” ✓

The same model, with explicit reasoning, performs dramatically better.
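
To make the contrast concrete, here is a minimal Python sketch of the two prompt styles. The generate function is a hypothetical placeholder for whatever model client you actually use; only the prompt construction is the point.

# Hypothetical stand-in for a language model call; swap in your actual client.
def generate(prompt: str) -> str:
    raise NotImplementedError("call your language model here")

question = (
    "If John has 3 apples and buys 2 more, then gives half to Mary, "
    "how many does John have?"
)

# Direct prompting: ask for the answer immediately.
direct_prompt = question + "\nAnswer:"

# Chain-of-thought prompting: ask for the reasoning before the answer.
cot_prompt = question + "\nWork through the problem step by step, then state the final answer."

# direct_answer = generate(direct_prompt)  # often wrong on multi-step problems
# cot_answer = generate(cot_prompt)        # reasoning first, then the answer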

The Inner Speech Parallel

In developmental psychology, inner speech refers to the internalization of language as a tool for thought. Children first solve problems by talking aloud, then gradually internalize this dialogue.

Vygotsky described this progression:

  1. External speech: Talking through problems out loud
  2. Private speech: Talking to oneself (still audible)
  3. Inner speech: Internalized verbal thought

graph TD
  H1[HUMAN DEVELOPMENT<br/>External speech] --> H2[Private speech<br/>internalization]
  H2 --> H3[Inner speech<br/>efficient, compressed]

  M1[MODEL PROMPTING<br/>Zero-shot<br/>no reasoning shown] --> M2[Explicit CoT<br/>Chain of Thought elicits<br/>explicit reasoning]
  M2 --> M3[Internalized reasoning?<br/>implicit and reliable,<br/>not yet achieved]

  style H1 fill:#0a0a0a,stroke:#00ff00,stroke-width:2px,color:#cccccc
  style H2 fill:#0a0a0a,stroke:#00ff00,stroke-width:1px,color:#cccccc
  style H3 fill:#0a0a0a,stroke:#00ff00,stroke-width:1px,color:#cccccc
  style M1 fill:#0a0a0a,stroke:#00ff00,stroke-width:2px,color:#cccccc
  style M2 fill:#0a0a0a,stroke:#00ff00,stroke-width:1px,color:#cccccc
  style M3 fill:#0a0a0a,stroke:#00ff00,stroke-width:1px,color:#cccccc

Current language models are at an interesting developmental stage: they perform better with explicit reasoning (like children with private speech) but haven’t fully internalized this capacity (like adult inner speech).

Mechanisms

Why does Chain of Thought work? Several mechanisms contribute:

Computation Extension

Each reasoning step adds tokens, and each generated token is another forward pass through the model, so longer chains give it more serial computation to spend on the problem. Complex problems need more processing than a single direct answer allows.

Context Maintenance

Intermediate steps keep relevant information in the context window, where later steps can attend to it. Without them, the model must carry intermediate results implicitly in its activations within a single forward pass.

Error Correction

Explicit steps make errors visible (to the model and observers), allowing for mid-course correction.

Structure Imposition

The step-by-step format imposes a logical structure, discouraging the model from jumping straight to a conclusion.

Knowledge Retrieval

Each step can trigger relevant knowledge retrieval, building up the information needed for the final answer.

Variants

Chain of Thought has spawned many variations:

Zero-Shot CoT

Simply adding “Let’s think step by step” to the prompt elicits reasoning without examples.
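
In code, the zero-shot variant is nothing more than a suffix on the prompt (a sketch reusing the hypothetical generate stub and question from the earlier example):

# Zero-shot CoT: no worked examples, just the reasoning trigger appended.
zero_shot_cot_prompt = question + "\nLet's think step by step."
# answer = generate(zero_shot_cot_prompt)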

Few-Shot CoT

Providing examples of problems with reasoning traces before the target question.
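
A sketch of such a prompt; the worked example below is made up for illustration, not taken from any benchmark, and generate is the same hypothetical stub as before.

# Few-shot CoT: prepend worked examples whose answers spell out the reasoning.
few_shot_cot_prompt = """\
Q: A train travels 60 km in 1.5 hours. What is its average speed?
A: Speed is distance divided by time. 60 / 1.5 = 40. The answer is 40 km/h.

Q: If John has 3 apples and buys 2 more, then gives half to Mary, how many does John have?
A:"""
# answer = generate(few_shot_cot_prompt)  # the model imitates the reasoning format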

Self-Consistency

Generate multiple reasoning chains and take the majority answer. The intuition is that many different valid reasoning paths converge on the correct answer, while errors scatter across inconsistent wrong ones.
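
A sketch under two assumptions: the hypothetical generate stub samples with nonzero temperature, so repeated calls produce different chains, and a hypothetical extract_final_answer helper pulls the final answer out of each chain.

from collections import Counter

def self_consistency(prompt: str, n_samples: int = 10) -> str:
    """Sample several reasoning chains and return the majority final answer."""
    answers = []
    for _ in range(n_samples):
        chain = generate(prompt)  # assumes sampling, not greedy decoding
        answers.append(extract_final_answer(chain))  # hypothetical parser
    return Counter(answers).most_common(1)[0][0]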

Tree of Thought

Explore multiple reasoning branches, evaluating and pruning, rather than a single chain.
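
A compressed sketch of the idea as a beam search over partial reasoning paths; propose_next_steps and score_partial_solution are hypothetical model-backed helpers, not part of any library.

def tree_of_thought(problem: str, breadth: int = 3, depth: int = 3) -> str:
    """Keep the `breadth` most promising partial reasoning paths at each depth."""
    frontier = [""]  # partial reasoning paths, starting empty
    for _ in range(depth):
        candidates = []
        for path in frontier:
            # Hypothetical: ask the model to propose possible next reasoning steps.
            for step in propose_next_steps(problem, path):
                candidates.append(path + "\n" + step)
        # Hypothetical: score each partial path (model-graded or heuristic), keep the best.
        candidates.sort(key=lambda p: score_partial_solution(problem, p), reverse=True)
        frontier = candidates[:breadth]
    return frontier[0]  # most promising reasoning path found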

Chain of Thought with Self-Critique

The model generates reasoning, then critiques its own logic, then revises.
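
A sketch of the generate / critique / revise loop, again built on the hypothetical generate stub:

def cot_with_self_critique(question: str, rounds: int = 2) -> str:
    """Draft step-by-step reasoning, critique it, revise, and return the final draft."""
    draft = generate(question + "\nLet's think step by step.")
    for _ in range(rounds):
        critique = generate(
            "Here is a proposed solution:\n" + draft +
            "\nList any logical or arithmetic errors in it."
        )
        draft = generate(
            "Question: " + question +
            "\nPrevious attempt:\n" + draft +
            "\nCritique:\n" + critique +
            "\nWrite a corrected step-by-step solution."
        )
    return draft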

Timeline

  • 2022: Chain of Thought Prompting. Wei et al. showed that prompting for step-by-step reasoning dramatically improved performance on math and reasoning tasks.
  • 2022: Zero-Shot CoT. Kojima et al. demonstrated that simply adding “Let’s think step by step” triggers reasoning without examples.
  • 2023: Tree of Thought. Yao et al. extended linear chains to branching exploration of reasoning paths.

Chain of Thought and Agents

Chain of Thought is foundational to agent behavior. The “reasoning” phase of the agent loop is essentially formalized CoT:

graph LR
  A[Observe] --> B[Reason]
  B --> C[Act]
  C --> D[Observe]

  B --> COT[Chain of Thought<br/><br/>I see X.<br/>To achieve goal Y,<br/>I should do Z<br/>because...]

  style A fill:#0a0a0a,stroke:#00ff00,stroke-width:1px,color:#cccccc
  style B fill:#0a0a0a,stroke:#00ff00,stroke-width:2px,color:#cccccc
  style C fill:#0a0a0a,stroke:#00ff00,stroke-width:1px,color:#cccccc
  style D fill:#0a0a0a,stroke:#00ff00,stroke-width:1px,color:#cccccc
  style COT fill:#0a0a0a,stroke:#00ff00,stroke-width:2px,color:#cccccc

ReAct, one of the most widely used agent patterns, interleaves CoT with actions: its “thought” traces are Chain of Thought reasoning.
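
A bare-bones sketch of that interleaving. The tools registry, the parse_action helper, and the generate stub are all hypothetical stand-ins, not the ReAct reference implementation.

def react_loop(task: str, tools: dict, max_steps: int = 5) -> str:
    """Alternate CoT 'thoughts' with tool calls until the model gives an answer."""
    transcript = "Task: " + task + "\n"
    for _ in range(max_steps):
        # The model emits a thought plus either an action or a final answer.
        step = generate(transcript + "Thought:")
        transcript += "Thought:" + step + "\n"
        action, argument = parse_action(step)  # hypothetical parser for "Action: tool[input]"
        if action == "finish":
            return argument                    # the final answer
        observation = tools[action](argument)  # run the chosen tool
        transcript += "Observation: " + str(observation) + "\n"
    return "no answer within the step budget"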

Limitations

Chain of Thought is not a panacea:

Faithfulness

The stated reasoning may not reflect the model’s actual “computation.” CoT can be post-hoc rationalization rather than genuine reasoning.

Verbosity

Reasoning takes tokens. This increases latency and cost, and consumes context window.

Error Propagation

Once the model commits to a reasoning path, errors compound. Wrong early steps lead to wrong conclusions.

Manipulation

CoT can be used to manipulate model behavior—leading the reasoning toward desired (but incorrect) conclusions.

The Future

Chain of Thought represents an early understanding of how to elicit systematic reasoning from language models. Open questions include:

  • Internalization: Can models learn to reason without explicit chains?
  • Verification: How do we check if reasoning is faithful?
  • Efficiency: Can we get the benefits of CoT with less verbosity?
  • Teaching: Can CoT during training improve base reasoning ability?

The developmental trajectory—from external to internal speech—suggests that future models might reason more efficiently without explicit chains. But we’re not there yet.

See Also

  • ReAct — CoT interleaved with actions
  • The Agent Loop — where reasoning fits in agent behavior
  • Scaffolding — how external structure supports model capabilities