Capability Tiers

A developmental taxonomy of agent capabilities—from basic language to autonomous self-improvement, mapping the stages of cognitive maturity.

In developmental psychology, cognitive capabilities emerge in stages. Infants perceive, then act, then plan, then reason abstractly. Each stage builds on the previous, and each unlocks new possibilities.

Capability tiers provide an analogous framework for AI agents—a taxonomy of what agents can do, organized by increasing sophistication and the dependencies between capabilities.

The Capability Ladder

Agent capabilities form a rough hierarchy:

┌─────────────────────────────────────────────────────────┐
│                                                          │
│  TIER 5: SELF-IMPROVEMENT                               │
│          Modify own capabilities, learn continuously     │
│                         ▲                                │
│                         │                                │
│  TIER 4: AUTONOMOUS PLANNING                            │
│          Long-horizon goals, strategy, adaptation        │
│                         ▲                                │
│                         │                                │
│  TIER 3: MULTI-STEP EXECUTION                           │
│          Tool use, action sequences, error recovery      │
│                         ▲                                │
│                         │                                │
│  TIER 2: CONTEXTUAL REASONING                           │
│          Chain-of-thought, in-context learning           │
│                         ▲                                │
│                         │                                │
│  TIER 1: LANGUAGE COMPETENCE                            │
│          Generation, comprehension, knowledge recall     │
│                                                          │
└─────────────────────────────────────────────────────────┘
The capability tier hierarchy

Each tier requires the capabilities below it. You can’t have autonomous planning without multi-step execution; you can’t have multi-step execution without reasoning.
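The strict ordering of the ladder can be captured in a few lines. The sketch below is illustrative only; the `Tier` enum and `prerequisites` helper are assumed names, not part of any established framework.

```python
from enum import IntEnum

class Tier(IntEnum):
    """Capability tiers as an ordered scale: each tier presupposes the ones below."""
    LANGUAGE = 1          # Tier 1: language competence
    REASONING = 2         # Tier 2: contextual reasoning
    EXECUTION = 3         # Tier 3: multi-step execution
    PLANNING = 4          # Tier 4: autonomous planning
    SELF_IMPROVEMENT = 5  # Tier 5: self-improvement

def prerequisites(tier: Tier) -> list[Tier]:
    """All tiers strictly below the given tier, in ascending order."""
    return [Tier(n) for n in range(1, tier)]
```

For example, `prerequisites(Tier.PLANNING)` yields the language, reasoning, and execution tiers, encoding the dependency claim above.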

Tier 1: Language Competence

The foundation. The agent can process and generate natural language.

┌─────────────────────────────────────────────────────────┐
│                  TIER 1: LANGUAGE                        │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  Core Capabilities:                                      │
│  ─────────────────                                       │
│  • Text comprehension (understanding input)              │
│  • Text generation (producing output)                    │
│  • Knowledge recall (facts from training)                │
│  • Style adaptation (matching tone, format)              │
│  • Translation (between languages)                       │
│                                                          │
│  Limitations:                                            │
│  ────────────                                            │
│  • Single-turn only                                      │
│  • No explicit reasoning                                 │
│  • No action capability                                  │
│  • Knowledge frozen at training                          │
│                                                          │
│  Example Systems:                                        │
│  ───────────────                                         │
│  • Base GPT-3 (before RLHF)                              │
│  • Simple completion APIs                                │
│  • Translation services                                  │
│                                                          │
└─────────────────────────────────────────────────────────┘
Tier 1 capabilities

Developmental parallel: The infant who understands and produces speech but cannot yet reason about it or use it to accomplish goals.

Tier 1 is necessary but not sufficient for agency. A system at this tier can answer questions but not do anything. It’s a tool, not an agent.

Tier 2: Contextual Reasoning

The agent can think step-by-step, learn from examples in context, and maintain coherent reasoning across a conversation.

┌─────────────────────────────────────────────────────────┐
│                TIER 2: REASONING                         │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  Core Capabilities:                                      │
│  ─────────────────                                       │
│  • Chain-of-thought reasoning                            │
│  • In-context learning (few-shot)                        │
│  • Multi-turn conversation coherence                     │
│  • Self-correction within conversation                   │
│  • Instruction following                                 │
│                                                          │
│  Limitations:                                            │
│  ────────────                                            │
│  • Cannot take external actions                          │
│  • Reasoning is verbal only                              │
│  • No access to current information                      │
│  • Limited by context window                             │
│                                                          │
│  Example Systems:                                        │
│  ───────────────                                         │
│  • ChatGPT (without plugins)                             │
│  • Claude (without tools)                                │
│  • Conversational assistants                             │
│                                                          │
└─────────────────────────────────────────────────────────┘
Tier 2 capabilities

Developmental parallel: The child who can reason about problems verbally, explain their thinking, and learn from demonstration—but who cannot yet act independently in the world.

Tier 2 introduces the possibility of complex problem-solving, but the agent remains confined to language. It can plan in theory but not execute in practice.

The Tier 2 Breakthrough

The emergence of robust Tier 2 capabilities (around 2022-2023) was the critical enabler for everything that followed:

Capability              Why It Matters
──────────────────────  ─────────────────────────────
Chain-of-thought        Enables complex reasoning
Instruction following   Enables reliable control
In-context learning     Enables rapid adaptation
Multi-turn coherence    Enables extended interaction

Without Tier 2, higher tiers are impossible. The ability to reason is prerequisite to the ability to plan actions.

Tier 3: Multi-Step Execution

The agent can take actions in the world—calling tools, modifying files, interacting with systems—and chain these actions together to accomplish goals.

┌─────────────────────────────────────────────────────────┐
│                TIER 3: EXECUTION                         │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  Core Capabilities:                                      │
│  ─────────────────                                       │
│  • Tool use (function calling)                           │
│  • Action sequencing (multi-step tasks)                  │
│  • Observation processing (reading tool outputs)         │
│  • Error detection and recovery                          │
│  • State tracking across steps                           │
│                                                          │
│  Limitations:                                            │
│  ────────────                                            │
│  • Requires human-defined goals                          │
│  • Limited planning horizon                              │
│  • Struggles with novel situations                       │
│  • Cannot modify own capabilities                        │
│                                                          │
│  Example Systems:                                        │
│  ───────────────                                         │
│  • ChatGPT with plugins                                  │
│  • Claude with tools                                     │
│  • GitHub Copilot Workspace                              │
│  • Most current "AI agents"                              │
│                                                          │
└─────────────────────────────────────────────────────────┘
Tier 3 capabilities

Developmental parallel: The adolescent who can execute plans, use tools, and accomplish tasks—but who still operates within structures set by others.

Tier 3 Subtypes

Not all Tier 3 agents are equal:

Subtype              Description
───────────────────  ──────────────────────────────────────
3a: Single-tool      Can use one tool per step
3b: Multi-tool       Can orchestrate multiple tools
3c: Self-correcting  Can detect and recover from errors
3d: Adaptive         Can adjust approach based on feedback

Most current agents are 3a or 3b. True 3c and 3d remain challenging.
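The capabilities in the Tier 3 box reduce to an observe-act loop. Below is a minimal, hypothetical sketch; the `llm` callable and the tool-call dict format are assumptions for illustration, not any specific vendor API. The `try/except` branch corresponds to the error detection of subtype 3c.

```python
def run_task(goal, tools, llm, max_steps=10):
    """Minimal Tier 3 loop: choose an action, execute it, observe, repeat.

    `llm` is any callable mapping (goal, history) to a dict like
    {"tool": name, "args": {...}} or {"tool": "finish", "result": ...}.
    `tools` maps tool names to plain Python callables.
    """
    history = []  # state tracking across steps
    for _ in range(max_steps):
        action = llm(goal, history)
        if action["tool"] == "finish":
            return action["result"]
        try:
            observation = tools[action["tool"]](**action["args"])
        except Exception as exc:
            # Error detection: the failure is fed back as an observation,
            # giving the model a chance to recover on the next step.
            observation = f"error: {exc}"
        history.append((action, observation))
    return None  # ran out of steps without finishing
```

Everything above Tier 3 is, in some sense, elaboration of this loop: Tier 4 decides which goals to feed it, and Tier 5 would rewrite the loop itself.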

Tier 4: Autonomous Planning

The agent can set its own subgoals, develop strategies, and adapt plans based on changing circumstances—operating over extended time horizons with minimal human guidance.

┌─────────────────────────────────────────────────────────┐
│                TIER 4: PLANNING                          │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  Core Capabilities:                                      │
│  ─────────────────                                       │
│  • Goal decomposition                                    │
│  • Strategic planning                                    │
│  • Resource allocation                                   │
│  • Contingency handling                                  │
│  • Long-horizon task management                          │
│  • Self-directed exploration                             │
│                                                          │
│  Limitations:                                            │
│  ────────────                                            │
│  • Cannot acquire new capabilities                       │
│  • Bounded by training distribution                      │
│  • Still requires ultimate human oversight               │
│  • May pursue goals misaligned with intent               │
│                                                          │
│  Example Systems:                                        │
│  ───────────────                                         │
│  • AutoGPT (attempted)                                   │
│  • Devin (claimed)                                       │
│  • Research agents (emerging)                            │
│                                                          │
└─────────────────────────────────────────────────────────┘
Tier 4 capabilities

Developmental parallel: The adult who can manage projects, set their own objectives, and navigate complex situations—but who works within a given set of skills and tools.

The Tier 4 Gap

There’s a significant capability gap between current Tier 3 systems and robust Tier 4:

Challenge              Why It’s Hard
─────────────────────  ─────────────────────────────────────
Long-horizon planning  Requires accurate world models
Strategy adaptation    Requires meta-cognition about plans
Resource management    Requires understanding of constraints
Failure recovery       Requires diagnosis of root causes

Tier 5: Self-Improvement

The agent can modify its own capabilities, learn continuously from experience, and expand what it’s able to do.

┌─────────────────────────────────────────────────────────┐
│              TIER 5: SELF-IMPROVEMENT                    │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  Core Capabilities:                                      │
│  ─────────────────                                       │
│  • Continuous learning from experience                   │
│  • Capability acquisition                                │
│  • Self-modification                                     │
│  • Meta-learning (learning to learn)                     │
│  • Novel tool creation                                   │
│                                                          │
│  Limitations:                                            │
│  ────────────                                            │
│  • ???                                                   │
│  • (Largely theoretical at this point)                   │
│                                                          │
│  Example Systems:                                        │
│  ───────────────                                         │
│  • None reliably demonstrated                            │
│  • (Various research prototypes)                         │
│                                                          │
└─────────────────────────────────────────────────────────┘
Tier 5 capabilities

Developmental parallel: The sage who not only acts wisely but continuously develops new wisdom—growing beyond what they were taught.

Tier 5 is largely theoretical. Current systems can be updated by their developers, but they cannot reliably improve themselves. This tier represents the frontier—and the source of both greatest promise and greatest concern.

Why Tier 5 Matters

Self-improvement creates feedback loops:

Better capabilities → Better self-improvement → Even better capabilities → ...

This recursive dynamic is what makes Tier 5 qualitatively different from lower tiers—and why it receives so much attention in AI safety research.
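A toy model makes the qualitative difference concrete: if each improvement step adds a fixed fraction of *current* capability, growth compounds geometrically rather than adding a constant. The function name and the 10% rate are arbitrary assumptions for illustration.

```python
def recursive_growth(capability: float, steps: int, gain: float = 0.1) -> float:
    """Toy model of the Tier 5 feedback loop: the size of each improvement
    scales with current capability, so gains compound instead of adding."""
    for _ in range(steps):
        capability += gain * capability  # better capabilities -> bigger next gain
    return capability
```

At a 10% gain per step, ten steps multiply capability by 1.1**10, roughly 2.6x — the same mechanism as compound interest, which is why small differences in the loop's gain dominate over time.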

Capability Assessment

How do you determine an agent’s tier?

                  ┌─────────────────┐
                  │ Can it modify   │
                  │ its own         │──── Yes ───► TIER 5
                  │ capabilities?   │
                  └────────┬────────┘
                           │ No
                           ▼
                  ┌─────────────────┐
                  │ Can it plan     │
                  │ autonomously    │──── Yes ───► TIER 4
                  │ over long       │
                  │ horizons?       │
                  └────────┬────────┘
                           │ No
                           ▼
                  ┌─────────────────┐
                  │ Can it use      │
                  │ tools and       │──── Yes ───► TIER 3
                  │ execute multi-  │
                  │ step actions?   │
                  └────────┬────────┘
                           │ No
                           ▼
                  ┌─────────────────┐
                  │ Can it reason   │
                  │ step-by-step    │──── Yes ───► TIER 2
                  │ and follow      │
                  │ instructions?   │
                  └────────┬────────┘
                           │ No
                           ▼
                       TIER 1
Capability tier assessment flowchart
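The flowchart is a top-down decision tree and translates directly to code. The predicate names below are hypothetical stand-ins for whatever concrete evaluations you actually run against an agent.

```python
def assess_tier(agent) -> int:
    """Grant the highest tier whose test passes, checking top-down
    exactly as in the assessment flowchart."""
    if agent.can_modify_own_capabilities():
        return 5
    if agent.can_plan_long_horizon():
        return 4
    if agent.can_execute_multi_step():
        return 3
    if agent.can_reason_stepwise():
        return 2
    return 1  # language competence is the floor
```

Note the ordering matters: because each tier presupposes the ones below, testing from the top down lets the first passing test determine the answer.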

Assessment Caveats

  • Tiers aren’t binary: An agent might be “Tier 3.5”—reliable multi-step execution with some planning ability
  • Context matters: The same agent might operate at different tiers in different domains
  • Reliability varies: Occasional capability ≠ reliable capability

The Capability-Safety Tradeoff

Higher tiers bring greater capability—and greater risk:

Tier  Capability           Risk Profile
────  ───────────────────  ───────────────────────────
1     Text generation      Low (words only)
2     Reasoning            Low (still just words)
3     Action               Medium (can affect world)
4     Autonomous planning  High (less oversight)
5     Self-improvement     Unknown (recursive effects)

Tier Transitions

What enables jumps between tiers?

Transition  Key Enablers
──────────  ────────────────────────────────────────────────────────
1 → 2       Scale, RLHF, instruction tuning
2 → 3       Function calling, tool frameworks, scaffolding
3 → 4       Better planning algorithms, longer context, world models
4 → 5       ??? (active research)

We understand the 1→2 and 2→3 transitions well. The 3→4 transition is partially understood. The 4→5 transition remains mysterious.

Implications for Deployment

Capability tier should inform deployment decisions:

Tier  Appropriate Autonomy  Oversight Required
────  ────────────────────  ───────────────────────────────────
1-2   None (tool only)      Minimal
3     Task-scoped           Human in loop
4     Goal-scoped           Monitoring, intervention capability
5     ???                   ???

Deploying high-tier agents with low-tier oversight is a recipe for problems.
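One way to enforce that principle is to make the oversight requirement a hard precondition of deployment rather than a guideline. A minimal sketch, with made-up level names mirroring the table above:

```python
OVERSIGHT_BY_TIER = {
    1: "minimal",
    2: "minimal",
    3: "human_in_loop",
    4: "monitoring_with_intervention",
    # Tier 5 deliberately absent: no established oversight regime exists.
}

def required_oversight(tier: int) -> str:
    """Look up the oversight regime for a tier; refuse, rather than
    guess, for tiers with no established regime."""
    try:
        return OVERSIGHT_BY_TIER[tier]
    except KeyError:
        raise ValueError(f"no established oversight regime for tier {tier}")
```

Making the Tier 5 lookup fail loudly is the point of the design: a deployment pipeline that silently defaults to "minimal" for an unknown tier is exactly the high-tier/low-oversight mismatch the text warns against.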

The Path Forward

The field is currently focused on making Tier 3 reliable before pushing hard on Tier 4:

  • Better tool use: More reliable, broader capability
  • Better error handling: Graceful recovery from failures
  • Better state management: Coherent operation over many steps
  • Better evaluation: Knowing when Tier 3 is truly solid

Only with a solid Tier 3 foundation can Tier 4 be approached responsibly.

See Also