
Memory Compression and Summarization

AGAI 203 · Memory Management and Evaluation

Learn how summarization, state extraction, and hierarchical compression help agents extend effective context without flooding prompts.

Key terms

compression = preserve signal, reduce tokens
summary ≠ source of truth
state extraction = structured memory
summary drift = compression error over time

Learning objectives

Distinguish summarization from structured state extraction.
Apply rolling and hierarchical summarization strategies.
Compress verbose tool results into useful observations.
Identify and mitigate summary drift.

Memory compression is the process of reducing large amounts of information into smaller representations that preserve what matters. Agents need compression because conversations, documents, tool results, and task traces can quickly exceed the context window.

Compression is not just making text shorter. Good compression preserves the information needed for future decisions.

Examples of compression include:

Summarizing old conversation turns
Extracting structured task state
Condensing tool logs into observations
Creating document summaries
Building hierarchical summaries over many chunks
Storing decisions and open questions separately

Summarization versus state extraction

A summary is prose. State extraction is structured.

Summary:

The user is building a local RAG prototype using Chroma and wants to evaluate retrieval quality before moving to a managed vector database.

State extraction:

{
  "project": "local RAG prototype",
  "current_vector_store": "Chroma",
  "possible_future_store": "managed vector database",
  "current_goal": "evaluate retrieval quality",
  "constraints": ["local development", "low cost"]
}

Structured state is often easier for applications to update, validate, and selectively insert into prompts. Prose summaries are easier for models to read. Many systems use both.
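To make the update, validate, and selective-insert advantages concrete, here is a minimal sketch in Python, assuming the state fields from the JSON example above. The `TaskState` class and its `update` and `to_prompt` methods are hypothetical names for illustration, not a standard API.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class TaskState:
    """Hypothetical task-state record mirroring the JSON example above."""
    project: str
    current_vector_store: str
    current_goal: str
    possible_future_store: Optional[str] = None
    constraints: List[str] = field(default_factory=list)

    def update(self, **changes) -> None:
        # Validate keys so a faulty extraction cannot invent new fields.
        allowed = {f.name for f in fields(self)}
        for key, value in changes.items():
            if key not in allowed:
                raise KeyError(f"unknown state field: {key}")
            setattr(self, key, value)

    def to_prompt(self, names: List[str]) -> str:
        # Insert only the fields relevant to the current step.
        return "\n".join(f"{name}: {getattr(self, name)}" for name in names)


state = TaskState(
    project="local RAG prototype",
    current_vector_store="Chroma",
    current_goal="evaluate retrieval quality",
    possible_future_store="managed vector database",
    constraints=["local development", "low cost"],
)
# Illustrative update: only the changed field is touched.
state.update(current_goal="compare retrieval quality across chunk sizes")
print(state.to_prompt(["current_goal", "constraints"]))
```

Because each field is addressed by name, an application can change one value without regenerating a whole summary, and can reject extractions that produce unexpected keys.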

Rolling task summaries

Rolling summaries are useful for long conversations. After every few turns, the system updates a compact summary.

Example prompt:

Update the task summary using the new conversation turns.
Preserve:
- current goal
- user constraints
- decisions already made
- unresolved questions
- important tool results
Remove:
- small talk
- repeated explanations
- resolved side topics

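A Python sketch, assuming `model` is any client object with a `generate(prompt)` method that returns text, and `new_turns` is a list of strings:

def compress_conversation(model, previous_summary, new_turns, max_words=200):
    # Build the prompt line by line so no stray indentation reaches the model.
    parts = [
        "Previous summary:",
        previous_summary or "(none yet)",
        "",
        "New turns:",
        *new_turns,
        "",
        f"Produce an updated summary under {max_words} words.",
        "Preserve the current goal, user constraints, decisions already made,",
        "unresolved questions, and important tool results.",
        "Keep only task-relevant information.",
    ]
    return model.generate("\n".join(parts))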

The summary should be treated as a memory artifact that can be wrong. For important workflows, use structured state and logs in addition to generated summaries.
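The loop that decides when to summarize can be sketched separately from the prompt. This is a minimal sketch, assuming a summarizer with the shape `(previous_summary, turns) -> new_summary`; the `RollingMemory` class and its `every_n_turns` and `keep_recent` parameters are hypothetical. The full raw log is kept alongside the summary so the summary is never the only record.

```python
from typing import Callable, List

# Assumed summarizer shape: (previous_summary, turns_to_fold) -> new summary.
Summarizer = Callable[[str, List[str]], str]


class RollingMemory:
    """Hypothetical rolling-summary buffer, not a library API."""

    def __init__(self, summarize: Summarizer, every_n_turns: int = 8, keep_recent: int = 2):
        if not 0 <= keep_recent < every_n_turns:
            raise ValueError("keep_recent must be >= 0 and smaller than every_n_turns")
        self.summarize = summarize
        self.every_n_turns = every_n_turns
        self.keep_recent = keep_recent
        self.summary = ""
        self.pending: List[str] = []  # turns not yet folded into the summary
        self.raw_log: List[str] = []  # full history, kept as the source of truth

    def add_turn(self, turn: str) -> None:
        self.raw_log.append(turn)
        self.pending.append(turn)
        if len(self.pending) >= self.every_n_turns:
            # Fold the older pending turns into the summary; keep the newest verbatim.
            cut = len(self.pending) - self.keep_recent
            self.summary = self.summarize(self.summary, self.pending[:cut])
            self.pending = self.pending[cut:]

    def context(self) -> str:
        # Prompt context: compact summary followed by recent raw turns.
        return "\n".join(["Summary: " + self.summary, *self.pending])


def stub_summarize(previous: str, turns: List[str]) -> str:
    # Stand-in for a real model call; it only counts folded turns.
    note = f"{len(turns)} turns"
    return f"{previous} | {note}" if previous else note


memory = RollingMemory(stub_summarize, every_n_turns=4, keep_recent=1)
for i in range(5):
    memory.add_turn(f"user: message {i}")
print(memory.context())
```

With a real model, the stub could be replaced by `lambda prev, turns: compress_conversation(model, prev, turns)`. Keeping the newest turns verbatim means the model always sees the latest wording exactly, while older turns survive only in compressed form and in `raw_log`.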

Hierarchical summarization

For long documents or many episodes, hierarchical summarization can help.

Chunks → chunk summaries → section summaries → document summary → corpus summary

This is useful when the system needs both high-level understanding and the ability to drill down into details.

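Example (a sketch; `summarize` is passed in and stands for any function that turns text into a shorter summary, such as a model call, and chunk summaries are grouped in fixed-size batches as a stand-in for real section boundaries):

def hierarchical_summarize(chunks, summarize, group_size=5):
    # Level 1: one summary per chunk
    chunk_summaries = [summarize(chunk) for chunk in chunks]
    # Level 2: summarize each batch of chunk summaries into a section summary
    section_summaries = [
        summarize("\n".join(chunk_summaries[i:i + group_size]))
        for i in range(0, len(chunk_summaries), group_size)
    ]
    # Level 3: summarize the section summaries into one document summary
    return summarize("\n".join(section_summaries))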

The risk is information loss. Each summarization layer may drop details. Preserve source links or chunk IDs so the system can retrieve original text when needed.
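One way to keep those links is to store every summary as a node that remembers which chunk IDs it covers. A minimal sketch, assuming chunks arrive as `(chunk_id, text)` pairs and `summarize` is any text-to-summary function; `SummaryNode` and `build_tree` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SummaryNode:
    summary: str
    source_ids: List[str]  # IDs of the original chunks this summary covers
    children: List["SummaryNode"] = field(default_factory=list)


def build_tree(chunks: List[Tuple[str, str]], summarize: Callable[[str], str],
               group_size: int = 4) -> SummaryNode:
    if not chunks or group_size < 2:
        raise ValueError("need at least one chunk and group_size >= 2")
    # Leaves: one summary per chunk, tagged with that chunk's ID.
    level = [SummaryNode(summarize(text), [chunk_id]) for chunk_id, text in chunks]
    # Merge groups of nodes level by level until one root remains.
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), group_size):
            group = level[i:i + group_size]
            merged = summarize("\n".join(node.summary for node in group))
            ids = [cid for node in group for cid in node.source_ids]
            next_level.append(SummaryNode(merged, ids, group))
        level = next_level
    return level[0]


# Usage with a stand-in summarizer (truncation) and six hypothetical chunks.
chunks = [(f"c{i}", f"text of chunk {i}") for i in range(6)]
root = build_tree(chunks, summarize=lambda text: text[:40], group_size=4)
print(root.source_ids)
```

To drill down, the system can follow `children` toward the relevant branch and then fetch the original text for that node's `source_ids`, rather than trusting the lossy summary.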

Compression for tool results

Agents often receive verbose tool results: logs, search results, stack traces, database records, or document extracts. These should be compressed into observations.

Raw log excerpt:

... hundreds of lines ...
ERROR tests/auth/login.test.ts:42 Expected 200 received 400
... hundreds of lines ...

Compressed observation:

{
  "tool": "run_tests",
  "success": false,
  "failed_test": "tests/auth/login.test.ts:42",
  "error_summary": "Expected login response status 200 but received 400.",
  "next_relevant_files": ["src/auth/login.ts", "tests/auth/login.test.ts"]
}

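The observation keeps only what the agent needs for its next step (which test failed, what the error was, and where to look next) instead of hundreds of lines of log.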
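A minimal sketch of this compression step in Python, assuming error lines follow the `ERROR <path>:<line> <message>` shape shown in the excerpt and that every failure produces such a line. The regex, `compress_test_log`, and the `tests/` to `src/` path convention in `guess_source_file` are hypothetical; real test runners and projects differ.

```python
import re
from typing import Optional

# Assumed error-line format, matching the excerpt: "ERROR <path>:<line> <message>"
ERROR_LINE = re.compile(r"^ERROR (?P<path>\S+?):(?P<line>\d+) (?P<message>.+)$")


def guess_source_file(test_path: str) -> Optional[str]:
    # Hypothetical project convention: tests/<x>.test.ts -> src/<x>.ts
    if test_path.startswith("tests/") and ".test." in test_path:
        return "src/" + test_path[len("tests/"):].replace(".test.", ".")
    return None


def compress_test_log(raw_log: str) -> dict:
    """Reduce a verbose test log to a small observation dict."""
    for line in raw_log.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match:
            path = match["path"]
            source = guess_source_file(path)
            return {
                "tool": "run_tests",
                "success": False,
                "failed_test": f"{path}:{match['line']}",
                "error_summary": match["message"],
                "next_relevant_files": [f for f in (source, path) if f],
            }
    # Assumes every failure prints an ERROR line; none found means success.
    return {"tool": "run_tests", "success": True}


raw = "setup...\nERROR tests/auth/login.test.ts:42 Expected 200 received 400\nteardown..."
observation = compress_test_log(raw)
print(observation)
```

This rule-based version copies the error message verbatim; producing a rephrased `error_summary` like the one in the example observation would need a model call on top of it.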

What not to compress

Some information should not be summarized away:

Exact code snippets involved in a bug
Legal or policy language where wording matters
Numerical values used in calculations
User approvals or denials
Security-relevant details
Source citations
Error messages needed for debugging

For these cases, store references to original artifacts and retrieve them when needed.
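A minimal sketch of this reference pattern, assuming an in-memory store: exact artifacts are saved verbatim under a content hash, and the memory itself holds only the short reference. `ArtifactStore`, `put`, and `get` are hypothetical names; a real system would persist artifacts to disk or a database.

```python
import hashlib


class ArtifactStore:
    """Hypothetical store that keeps exact text and hands out short references."""

    def __init__(self):
        self._items = {}

    def put(self, kind: str, text: str) -> str:
        # Content-addressed ID: identical text always gets the same reference.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        ref = f"{kind}:{digest}"
        self._items[ref] = text
        return ref

    def get(self, ref: str) -> str:
        # Return the original wording, never a summary of it.
        return self._items[ref]


store = ArtifactStore()
error_ref = store.put("error", "Expected 200 received 400")
# Memory keeps a compressed note plus a pointer to the exact text.
memory_note = {"observation": "login test fails with a 400 status", "source_ref": error_ref}
print(memory_note)
```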

Summary drift

Summary drift occurs when repeated summarization gradually changes meaning. A user’s constraint may be softened, a decision may be misremembered, or uncertainty may become false certainty.

Example drift:

Original:

We might use Pinecone later if Chroma is too limited.

Bad summary:

The user decided to use Pinecone later.

To reduce drift:

Keep structured state for decisions.
Store source references.
Use conservative wording.
Re-summarize from original logs periodically.
Allow users to correct memory.
Avoid summarizing uncertain statements as facts.
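Storing decisions as structured records with an explicit status makes the Pinecone example above harder to distort. A minimal sketch, assuming a hypothetical `Decision` record whose `status` is either `"considering"` or `"decided"`; the `render` function derives its wording from the stored status instead of letting a summarizer infer it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    subject: str
    status: str  # assumed values: "considering" or "decided"
    condition: Optional[str] = None
    source_ref: Optional[str] = None  # pointer back to the original turn


def render(decision: Decision) -> str:
    # Wording comes from the stored status, so "might" never becomes "decided".
    if decision.status == "decided":
        text = f"Decided: {decision.subject}."
    elif decision.status == "considering":
        text = f"Considering (not decided): {decision.subject}"
        if decision.condition:
            text += f", if {decision.condition}"
        text += "."
    else:
        raise ValueError(f"unknown status: {decision.status}")
    if decision.source_ref:
        text += f" [source: {decision.source_ref}]"
    return text


pinecone = Decision(
    subject="use Pinecone later",
    status="considering",
    condition="Chroma is too limited",
    source_ref="turn-12",  # hypothetical turn ID
)
print(render(pinecone))
```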

Memory compression policy

A compression policy defines what gets summarized, when, and how.

Example:

{
  "summarize_after_turns": 8,
  "max_summary_words": 200,
  "preserve_exact": ["user_constraints", "approvals", "IDs", "deadlines"],
  "store_raw_logs": true,
  "include_source_refs": true
}

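Policies make compression predictable: summaries are triggered at known points, stay within a known length, and the policy states which fields must be kept exactly.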
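A policy like this can be loaded and enforced in code. A minimal sketch, assuming the JSON keys above; `CompressionPolicy`, `should_summarize`, and `check_summary` are hypothetical names, and the exact-preservation check only verifies that known values appear verbatim in the summary text.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CompressionPolicy:
    summarize_after_turns: int
    max_summary_words: int
    preserve_exact: List[str]
    store_raw_logs: bool
    include_source_refs: bool

    def should_summarize(self, turns_since_last_summary: int) -> bool:
        return turns_since_last_summary >= self.summarize_after_turns

    def check_summary(self, summary: str, exact_values: Dict[str, List[str]]) -> List[str]:
        # Return policy violations; an empty list means the summary passes.
        problems = []
        if len(summary.split()) > self.max_summary_words:
            problems.append("summary exceeds max_summary_words")
        for category in self.preserve_exact:
            for value in exact_values.get(category, []):
                if value not in summary:
                    problems.append(f"missing exact {category}: {value!r}")
        return problems


policy = CompressionPolicy(**json.loads("""
{
  "summarize_after_turns": 8,
  "max_summary_words": 200,
  "preserve_exact": ["user_constraints", "approvals", "IDs", "deadlines"],
  "store_raw_logs": true,
  "include_source_refs": true
}
"""))

# Hypothetical values the summary is required to keep verbatim.
issues = policy.check_summary(
    "Goal: evaluate retrieval. Deadline: Friday.",
    {"deadlines": ["Friday"], "IDs": ["ticket-481"]},
)
print(issues)
```

A failed check can trigger a retry or a fallback to the raw log rather than silently accepting a summary that dropped an exact value.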

Practical takeaway

Compression extends effective context, but it can introduce errors. Use summaries for continuity, structured state for decisions, observations for tool results, and references for exact source material.

A good memory system does not merely shorten text. It preserves the information needed to act correctly later.
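Putting these pieces together, a prompt context can be assembled with each kind of memory in its own labeled section. A minimal sketch with hypothetical inputs; `build_context`, the section labels, and their order are illustrative choices, not a standard format.

```python
import json
from typing import Dict, List


def build_context(summary: str, state: Dict, observations: List[Dict], refs: List[str]) -> str:
    """Assemble prompt context from summaries, state, observations, and references."""
    sections = [
        "Conversation summary (may be imperfect):\n" + summary,
        "Task state:\n" + json.dumps(state, indent=2, sort_keys=True),
        "Recent tool observations:\n"
        + "\n".join(json.dumps(o, sort_keys=True) for o in observations),
        # Only references go in; exact text is fetched when needed.
        "Source references (retrieve exact text when needed):\n" + "\n".join(refs),
    ]
    return "\n\n".join(sections)


context = build_context(
    summary="User is evaluating retrieval quality for a local Chroma prototype.",
    state={"current_goal": "evaluate retrieval quality", "current_vector_store": "Chroma"},
    observations=[{"tool": "run_tests", "success": False}],
    refs=["error:3f2a9c1d0b7e"],  # hypothetical reference ID
)
print(context)
```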
