Memory Architectures

Memory architecture determines what an agent can remember, retrieve, and reuse. Without a dedicated memory layer, an agent is limited to what is already in its context: the current prompt, tool results, and conversation history. With memory, it can maintain state across steps, recall prior interactions, draw on stored knowledge, and stay consistent from one session to the next.

Memory is not one thing. Different tasks need different memory types.

Common categories include:

Working memory: temporary state for the current task.
Episodic memory: records of past interactions or events.
Semantic memory: durable facts, concepts, and documents.
Procedural memory: reusable procedures, workflows, and skills.

Good memory design is selective. Storing everything is rarely the right answer.

Working memory

Working memory exists inside the current task. It tracks what the agent is doing now.

Example:

{
  "goal": "Compare three API gateway options",
  "completed_steps": ["identified candidates", "collected pricing"],
  "open_questions": ["confirm enterprise support options"],
  "current_summary": "Option A is cheapest; Option B has the strongest enterprise controls."
}

Working memory is useful for ReAct loops, plan execution, and long tool traces. It can be stored in the prompt, in application state, or in a framework state object.

LangGraph-style systems often make state explicit so each node can read and update it.
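To make the idea of explicit state concrete, here is a minimal, framework-agnostic sketch in plain Python. It does not use LangGraph's API. The `WorkingMemory` class and the `record_step` and `resolve_question` functions are hypothetical names chosen for illustration, and the fields mirror the JSON example above.

```python
from dataclasses import dataclass, field


@dataclass
class WorkingMemory:
    """Temporary state for one task; discarded when the task ends."""
    goal: str
    completed_steps: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    current_summary: str = ""


def record_step(state: WorkingMemory, step: str, summary: str) -> WorkingMemory:
    """Hypothetical node: returns a new state rather than mutating the old one."""
    return WorkingMemory(
        goal=state.goal,
        completed_steps=[*state.completed_steps, step],
        open_questions=list(state.open_questions),
        current_summary=summary,
    )


def resolve_question(state: WorkingMemory, question: str) -> WorkingMemory:
    """Hypothetical node: drops a question once it has been answered."""
    remaining = [q for q in state.open_questions if q != question]
    return WorkingMemory(
        goal=state.goal,
        completed_steps=list(state.completed_steps),
        open_questions=remaining,
        current_summary=state.current_summary,
    )


state = WorkingMemory(
    goal="Compare three API gateway options",
    open_questions=["confirm enterprise support options"],
)
state = record_step(state, "identified candidates", "Three candidates shortlisted.")
state = record_step(state, "collected pricing", "Option A is cheapest.")
state = resolve_question(state, "confirm enterprise support options")
```

Each function takes the state in and hands a new state back, which keeps every update visible and easy to log. A real framework would add routing between nodes and persistence on top of this.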

Episodic memory

Episodic memory stores what happened. For an agent, that may include previous user requests, completed tasks, tool calls, decisions, or feedback.

Example:

{
  "event_type": "support_interaction",
  "date": "2026-06-04",
  "summary": "User asked about delayed order ORD-7711. Agent created refund request draft.",
  "outcome": "approved by user"
}

Episodic memory helps with continuity. A support agent can avoid asking the same question twice. A coding agent can remember that a previous approach failed.

However, episodic memory can create privacy and staleness risks. Users should understand what is stored, and sensitive details should be minimized.
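One way to handle both risks is to minimize sensitive detail at write time and to ignore old episodes at read time. The sketch below assumes an in-memory list, a simple email regex as the only redaction rule, and a 365-day default cutoff. The `EpisodicStore` class, its methods, and the email address in the usage lines are all hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import date

# Assumption: email addresses are the only sensitive detail we redact here.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass(frozen=True)
class Episode:
    event_type: str
    day: date
    summary: str
    outcome: str


class EpisodicStore:
    """In-memory list of episodes; a real system would use a durable store."""

    def __init__(self) -> None:
        self._episodes: list[Episode] = []

    def record(self, event_type: str, day: date, summary: str, outcome: str) -> Episode:
        # Minimize sensitive detail before anything is stored.
        cleaned = EMAIL.sub("[email removed]", summary)
        episode = Episode(event_type, day, cleaned, outcome)
        self._episodes.append(episode)
        return episode

    def recall(self, keyword: str, today: date, max_age_days: int = 365) -> list[Episode]:
        # Skip episodes older than the cutoff to limit staleness.
        return [
            e for e in self._episodes
            if keyword.lower() in e.summary.lower()
            and (today - e.day).days <= max_age_days
        ]


store = EpisodicStore()
store.record(
    "support_interaction",
    date(2026, 6, 4),
    "User (jo@example.com) asked about delayed order ORD-7711. "
    "Agent created refund request draft.",
    "approved by user",
)
recent = store.recall("ORD-7711", today=date(2026, 7, 1))
expired = store.recall("ORD-7711", today=date(2028, 1, 1))
```

Redacting on write means the sensitive value never reaches storage, which is simpler to reason about than filtering on every read.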

Semantic memory

Semantic memory stores knowledge. This often takes the form of documents, embeddings, knowledge bases, or structured facts.

A retrieval-augmented generation (RAG) system is a common semantic memory architecture:

User question
  ↓
Embed query
  ↓
Retrieve relevant chunks
  ↓
Insert chunks into context
  ↓
Generate grounded answer

Semantic memory is ideal for documentation assistants, policy bots, research tools, and enterprise knowledge systems.

The main challenge is retrieval quality. The agent can only use what it retrieves. Poor chunking, stale documents, weak metadata, or irrelevant results can degrade answers.
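The pipeline above can be sketched end to end with a toy embedding. This example assumes word counts stand in for a learned embedding model and a Python list stands in for a vector store. The function names and the three sample chunks are made up for illustration. The final generation step is left out because it depends on whichever model you call.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. Real systems use a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return up to k chunks with non-zero similarity, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:k] if score > 0.0]


def build_prompt(query: str, chunks: list[str]) -> str:
    """Insert retrieved chunks into the context ahead of the question."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Made-up knowledge base chunks.
CHUNKS = [
    "Refunds are issued within 14 days of an approved request.",
    "API keys rotate every 90 days.",
    "Support hours are 9am to 5pm on weekdays.",
]

question = "How long do refunds take?"
top = retrieve(question, CHUNKS, k=1)
prompt = build_prompt(question, top)
```

Note that `retrieve` returns nothing when no chunk overlaps with the query. Passing an empty context to the model, or declining to answer, is usually safer than padding the prompt with irrelevant chunks.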

Procedural memory

Procedural memory stores how to do things. This may be represented as playbooks, task templates, workflows, or learned routines.

Example:

{
  "procedure": "triage_failed_deployment",
  "steps": [
    "Check deployment status",
    "Read latest error logs",
    "Compare failing commit to previous successful commit",
    "Run rollback decision policy",
    "Escalate if customer impact is high"
  ]
}

Procedural memory is useful for operations agents, coding agents, and business process automation. It should be versioned and reviewed because a bad procedure can cause repeated failures.
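Versioning and review can be enforced in code rather than left to convention. The sketch below assumes a simple rule: the agent may only run the newest version of a procedure that a reviewer has approved. `ProcedureRegistry` and its methods are hypothetical names, and storage is an in-memory dictionary.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Procedure:
    name: str
    version: int
    steps: tuple[str, ...]
    reviewed: bool = False


class ProcedureRegistry:
    """Keeps every version of each procedure and tracks which were reviewed."""

    def __init__(self) -> None:
        self._versions: dict[str, list[Procedure]] = {}

    def publish(self, name: str, steps: list[str]) -> Procedure:
        # New versions start unreviewed and are numbered from 1.
        history = self._versions.setdefault(name, [])
        proc = Procedure(name, len(history) + 1, tuple(steps))
        history.append(proc)
        return proc

    def approve(self, name: str, version: int) -> None:
        history = self._versions[name]
        history[version - 1] = replace(history[version - 1], reviewed=True)

    def latest_approved(self, name: str) -> Optional[Procedure]:
        # Walk from newest to oldest and stop at the first reviewed version.
        for proc in reversed(self._versions.get(name, [])):
            if proc.reviewed:
                return proc
        return None


registry = ProcedureRegistry()
registry.publish(
    "triage_failed_deployment",
    ["Check deployment status", "Read latest error logs"],
)
registry.approve("triage_failed_deployment", 1)
registry.publish(
    "triage_failed_deployment",
    ["Check deployment status", "Read latest error logs", "Escalate if customer impact is high"],
)
current = registry.latest_approved("triage_failed_deployment")
```

Here the second version exists but has not been approved, so the agent keeps using version 1 until someone reviews the change.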

Memory storage options

Memory can be stored in several places:

Prompt context: simple, immediate, limited by token window
Database: durable, structured, queryable
Vector store: good for semantic retrieval
File store: useful for documents and artifacts
Cache: fast, temporary, task-specific
State graph: explicit workflow state

The storage choice depends on access patterns. If the agent needs exact lookup, use a database. If it needs semantic similarity, use a vector store. If it needs temporary task state, use workflow state or cache.
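As a rough illustration of two of these access patterns, the sketch below uses the standard library's `sqlite3` module for exact lookup and a small time-to-live cache for temporary task state. The `orders` table, the `delayed` status value, and the `TaskCache` class are assumptions made for the example. The optional `now` argument exists only to make expiry easy to demonstrate.

```python
import sqlite3
import time
from typing import Optional

# Exact lookup: a database keyed by a known identifier.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO orders VALUES (?, ?)", ("ORD-7711", "delayed"))


def order_status(order_id: str) -> Optional[str]:
    row = conn.execute(
        "SELECT status FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()
    return row[0] if row else None


class TaskCache:
    """Temporary task state: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float) -> None:
        self._ttl = ttl_seconds
        self._items: dict[str, tuple[float, str]] = {}

    def put(self, key: str, value: str, now: Optional[float] = None) -> None:
        stamp = time.monotonic() if now is None else now
        self._items[key] = (stamp, value)

    def get(self, key: str, now: Optional[float] = None) -> Optional[str]:
        current = time.monotonic() if now is None else now
        entry = self._items.get(key)
        if entry is None or current - entry[0] > self._ttl:
            # Expired or missing: drop the entry and report a miss.
            self._items.pop(key, None)
            return None
        return entry[1]


cache = TaskCache(ttl_seconds=60)
cache.put("draft_reply", "Refund request drafted.", now=0.0)
fresh = cache.get("draft_reply", now=10.0)
stale = cache.get("draft_reply", now=100.0)
```

The database answers "what is the status of this exact order" reliably, while the cache holds scratch data that should disappear on its own once the task is over.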

Memory retrieval policy

Long-term memory is only useful if the agent retrieves the right memory at the right time. A memory retrieval policy should define:

What triggers retrieval?
Which memory stores are searched?
How many results are included?
How is freshness handled?
How are conflicts resolved?
What memory is excluded for privacy or safety?

Example policy:

For company policy questions, retrieve only approved policy documents modified within the last 18 months unless the user explicitly asks for archive history.

This prevents outdated memories from polluting answers.
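The example policy can be expressed as a filter that runs before any ranking. This sketch makes a few assumptions: "18 months" means calendar months, the day of month is clamped to 28 to avoid invalid dates, an archive request lifts the freshness rule but never the approval rule, and newer documents win when results are capped. `PolicyDoc`, `months_before`, and `select_policy_docs` are hypothetical names, and the sample documents are made up.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PolicyDoc:
    title: str
    approved: bool
    modified: date


def months_before(today: date, months: int) -> date:
    """Date `months` calendar months earlier; day clamped to 28 for simplicity."""
    total = today.year * 12 + (today.month - 1) - months
    year, month_index = divmod(total, 12)
    return date(year, month_index + 1, min(today.day, 28))


def select_policy_docs(
    docs: list[PolicyDoc],
    today: date,
    include_archive: bool = False,
    max_results: int = 3,
) -> list[PolicyDoc]:
    cutoff = months_before(today, 18)
    eligible = [
        d for d in docs
        # Approval is always required; freshness is waived for archive requests.
        if d.approved and (include_archive or d.modified >= cutoff)
    ]
    # Newest first, so a cap on results drops the oldest documents.
    eligible.sort(key=lambda d: d.modified, reverse=True)
    return eligible[:max_results]


DOCS = [
    PolicyDoc("Travel policy v3", approved=True, modified=date(2026, 1, 10)),
    PolicyDoc("Travel policy v1", approved=True, modified=date(2023, 3, 1)),
    PolicyDoc("Draft expense policy", approved=False, modified=date(2026, 5, 1)),
]

today = date(2026, 6, 4)
default_titles = [d.title for d in select_policy_docs(DOCS, today)]
archive_titles = [d.title for d in select_policy_docs(DOCS, today, include_archive=True)]
```

Keeping the policy in one function makes it testable and auditable, which is harder when the same rules are scattered across prompts.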

Choosing a memory architecture

Use working memory for multi-step tasks. Use semantic memory for large knowledge bases. Use episodic memory when continuity across interactions matters. Use procedural memory when the agent repeats workflows.

Avoid persistent memory when:

The information is sensitive and unnecessary.
The task is one-off.
The memory is likely to become stale quickly.
You cannot give users inspection or deletion controls.
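These checks can run as a gate before anything is written to long-term storage. The sketch below is one possible encoding of the list above. The `MemoryCandidate` fields and the `persistence_blockers` function are hypothetical, and in practice the flags would come from classifiers or product rules rather than being set by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryCandidate:
    sensitive: bool
    necessary: bool
    one_off_task: bool
    stale_quickly: bool
    user_can_inspect_and_delete: bool


def persistence_blockers(c: MemoryCandidate) -> list[str]:
    """Return the reasons not to persist; an empty list means it is safe to store."""
    reasons = []
    if c.sensitive and not c.necessary:
        reasons.append("sensitive and unnecessary")
    if c.one_off_task:
        reasons.append("one-off task")
    if c.stale_quickly:
        reasons.append("likely to become stale quickly")
    if not c.user_can_inspect_and_delete:
        reasons.append("no inspection or deletion controls")
    return reasons


preference = MemoryCandidate(
    sensitive=False,
    necessary=True,
    one_off_task=False,
    stale_quickly=False,
    user_can_inspect_and_delete=True,
)
card_number = MemoryCandidate(
    sensitive=True,
    necessary=False,
    one_off_task=True,
    stale_quickly=False,
    user_can_inspect_and_delete=True,
)

ok_to_store = persistence_blockers(preference) == []
blocked_for = persistence_blockers(card_number)
```

Returning reasons instead of a bare boolean makes it easier to log why a memory was rejected and to review the rule later.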

Practical takeaway

Memory makes agents more capable, but also more complex. The best memory architecture is not the largest one. It is the one that stores the right information, retrieves it at the right time, and keeps it fresh, inspectable, and safe.

For production agents, memory should be designed like a database-backed feature, not like a vague human trait.
