[Figure: diagram showing different memory types in an AI agent system]

In-Context Memory Strategies

AGAI 203 · Context Window and Memory Foundations

Learn practical strategies for managing conversation history, task summaries, tool results, and active constraints inside the model context.

Key terms

  • in-context memory = selected prompt state
  • summary compresses history
  • relevance beats recency
  • tool output → concise observation

Learning objectives

  • Apply strategies for managing long conversation history.
  • Create rolling summaries that preserve task-relevant information.
  • Compress tool outputs into concise observations.
  • Track active constraints separately from raw chat history.

In-context memory is the information placed directly into the prompt. It is the fastest and most direct form of memory because the model can use it immediately, with no retrieval step. It is also the most limited and expensive: it is bounded by the context window, and every token in it is processed on each call. Good in-context memory design is therefore about selection.

The central question is:

What must the model see right now to make the next correct decision?

For agents, the answer is rarely “everything.” Instead, the prompt should contain the current goal, active constraints, relevant recent messages, important tool results, and a compact state summary.

Conversation history management

A simple chatbot may append every message to the prompt. This works for short conversations but fails for long ones.

Problems with full-history prompting:

  • Old details consume tokens.
  • Stale constraints may conflict with newer instructions.
  • The model may focus on irrelevant early messages.
  • Sensitive details may be repeated unnecessarily.
  • Long prompts increase cost and latency.
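To make the cost point concrete, here is a small sketch that compares the total prompt tokens sent over a conversation when every turn resends the full history versus only a fixed window of recent turns. The figures of 200 tokens per turn, 50 turns, and a 4-turn window are illustrative assumptions, not measurements.

```python
def total_prompt_tokens(num_turns, tokens_per_turn, window=None):
    """Sum the prompt tokens sent across a whole conversation.

    window=None resends the full history on every turn; otherwise only
    the last `window` turns are included in each prompt.
    """
    total = 0
    for turn in range(1, num_turns + 1):
        turns_in_prompt = turn if window is None else min(turn, window)
        total += turns_in_prompt * tokens_per_turn
    return total


# Illustrative numbers only: 50 turns at an assumed 200 tokens per turn.
full = total_prompt_tokens(50, 200)
windowed = total_prompt_tokens(50, 200, window=4)
print(full, windowed)
```

Under these assumptions the full-history approach sends 255,000 prompt tokens in total, while the windowed approach sends 38,800. Full-history cost grows roughly with the square of the number of turns; windowed cost grows linearly.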

A better approach is history selection.

Example policy:

Include:
- The latest user request
- The last 4 conversation turns
- Any unresolved constraints
- Any explicit user preferences relevant to this task
- A summary of older task-relevant context

Exclude:
- Resolved side discussions
- Old tool logs unless needed
- Repeated confirmations
- Irrelevant personal details
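The policy above could be sketched in Python roughly as follows. The `Turn` class, its `is_side_topic` flag, and the `select_history` function are hypothetical names introduced for illustration; in practice the flag would have to come from some upstream heuristic or classifier.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str                    # "user" or "assistant"
    text: str
    is_side_topic: bool = False  # assumed metadata: a resolved side discussion


def select_history(turns, older_summary, open_constraints, preferences, recent_n=4):
    """Pick what goes into the prompt, following the include/exclude policy."""
    # The latest user request is always included.
    latest_request = next((t.text for t in reversed(turns) if t.role == "user"), None)
    # Keep the last few turns, but drop resolved side discussions.
    recent = [t for t in turns[-recent_n:] if not t.is_side_topic]
    return {
        "latest_request": latest_request,
        "recent_turns": recent,
        "open_constraints": list(open_constraints),
        "preferences": list(preferences),
        "older_summary": older_summary,
    }


turns = [
    Turn("user", "Help me build a RAG assistant."),
    Turn("assistant", "Sure. Which vector store do you want to use?"),
    Turn("user", "By the way, what does RAG stand for?", is_side_topic=True),
    Turn("assistant", "Retrieval-augmented generation.", is_side_topic=True),
    Turn("user", "Let's use Chroma."),
    Turn("assistant", "Noted."),
    Turn("user", "How do I evaluate retrieval quality?"),
]
context = select_history(
    turns,
    older_summary="User is building a RAG assistant.",
    open_constraints=["Use Python examples."],
    preferences=["Prefers local tooling."],
)
print([t.text for t in context["recent_turns"]])
```

Here the last four turns include one side-topic answer, which is filtered out, so three recent turns remain alongside the summary and constraints.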

Rolling summaries

A rolling summary compresses earlier conversation into a concise state representation.

Example:

Conversation summary:
The user is building a RAG assistant for internal product documentation. They prefer Python examples. They have chosen Chroma for local prototyping but may move to Pinecone later. Current question: how to evaluate retrieval quality.

This summary can replace dozens of older turns. It should be updated when important new information appears.

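Sketch in Python (`model` is any object with a `generate(prompt)` method; that interface is a placeholder, not a specific library):

import textwrap

def update_summary(model, old_summary, new_messages):
    """Fold new messages into the rolling summary and return the new summary."""
    # new_messages is assumed to be a list of strings, one per message.
    transcript = "\n".join(new_messages)
    # Build the template first, then fill it in, so multi-line values
    # do not interfere with dedenting.
    template = textwrap.dedent("""\
        Update the conversation summary using the new messages.
        Keep stable task goals, decisions, constraints, and unresolved questions.
        Remove resolved details and small talk.

        Old summary:
        {old_summary}

        New messages:
        {new_messages}
        """)
    prompt = template.format(old_summary=old_summary, new_messages=transcript)
    return model.generate(prompt)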

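Summaries should be checked periodically against the turns they replace. An error in the summary propagates to every later turn that relies on it, so a bad summary can distort the rest of the conversation.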
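One way to decide when to update the summary is to keep a fixed number of recent turns verbatim and fold anything older into the summary. The sketch below assumes that approach; `roll_history` and `fake_summarize` are hypothetical names, and the stand-in summarizer simply concatenates text where a real system would call a model.

```python
def roll_history(summary, turns, summarize, keep_recent=4):
    """Fold turns older than the last `keep_recent` into the summary.

    Returns (new_summary, remaining_turns). `summarize` is any callable
    taking (old_summary, overflow_turns) and returning a string.
    """
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    if len(turns) <= keep_recent:
        return summary, turns  # nothing has aged out yet
    overflow, recent = turns[:-keep_recent], turns[-keep_recent:]
    return summarize(summary, overflow), recent


# Stand-in summarizer for demonstration only; a real one would call a model.
def fake_summarize(old_summary, overflow):
    return (old_summary + " " + " ".join(overflow)).strip()


summary, remaining = roll_history("", ["a", "b", "c", "d", "e", "f"], fake_summarize)
print(summary)    # a b
print(remaining)  # ['c', 'd', 'e', 'f']
```

Passing the summarizer in as a callable keeps the windowing logic testable without a model, which also makes it easier to spot-check summaries against the turns they replaced.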

Tool result management

Tool results can be large. A web search result, stack trace, PDF extraction, or database query may contain far more information than the model needs.

Instead of inserting raw output, transform it into a useful observation.

Raw result:

5000 lines of test output...

Better observation:

{
  "command": "npm test -- login",
  "success": false,
  "summary": "One login test failed.",
  "relevant_error": "Expected status 200 but received 400 at tests/login.test.ts:42",
  "likely_next_step": "Inspect login request payload and route handler."
}

This gives the model actionable information without flooding context.
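A minimal sketch of this kind of compression is shown below. It assumes that failure lines can be found with a simple keyword pattern, which will not hold for every test runner; `compress_test_output` is a hypothetical helper, and the `likely_next_step` field from the example above is omitted because producing it would need a model or a hand-written rule.

```python
import re

# Assumed keyword pattern; real test runners format failures differently.
ERROR_PATTERN = re.compile(r"\b(error|fail(ed)?|expected)\b", re.IGNORECASE)


def compress_test_output(command, raw_output, exit_code, max_errors=3):
    """Reduce raw test output to a small observation dict."""
    lines = raw_output.splitlines()
    error_lines = [ln.strip() for ln in lines if ERROR_PATTERN.search(ln)]
    return {
        "command": command,
        "success": exit_code == 0,
        "total_lines": len(lines),
        "relevant_errors": error_lines[:max_errors],  # cap what reaches the prompt
    }


# Synthetic output standing in for a long test log.
raw = "\n".join(
    ["FAIL tests/login.test.ts",
     "  Expected status 200 but received 400 at tests/login.test.ts:42"]
    + [f"PASS tests/other_{i}.test.ts" for i in range(4998)]
)
observation = compress_test_output("npm test -- login", raw, exit_code=1)
print(observation)
```

The 5000-line log is reduced to two error lines plus a few fields of metadata, and the raw output can still be stored outside the prompt in case the agent needs it later.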

Active constraint tracking

Agents often fail when they lose track of constraints such as:

Use Python, not JavaScript.
Do not modify production data.
Return JSON only.
Assume the user is using Windows.

Active constraints should be stored separately from conversation text and inserted into context when relevant.

Example:

{
  "active_constraints": [
    "Use Python examples unless the user asks otherwise.",
    "Do not recommend paid services unless free options are insufficient.",
    "Return machine-readable JSON for extraction tasks."
  ]
}

This is more reliable than hoping the model remembers a constraint from 20 turns ago.
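One possible way to hold constraints outside the transcript is a small keyed store that is rendered into the prompt on each call. The `ConstraintStore` class below is a hypothetical sketch; keying constraints by topic is an assumption made here so that a newer instruction replaces a stale one instead of sitting beside it.

```python
class ConstraintStore:
    """Holds active constraints separately from the chat transcript."""

    def __init__(self):
        self._constraints = {}  # topic key -> constraint text

    def set(self, key, text):
        # A newer instruction for the same key replaces the older one.
        self._constraints[key] = text

    def clear(self, key):
        self._constraints.pop(key, None)

    def render(self):
        """Return a prompt section, or an empty string if nothing is active."""
        if not self._constraints:
            return ""
        lines = [f"- {text}" for text in self._constraints.values()]
        return "Active constraints:\n" + "\n".join(lines)


store = ConstraintStore()
store.set("language", "Use Python, not JavaScript.")
store.set("output", "Return JSON only.")
store.set("safety", "Do not modify production data.")
store.clear("output")  # the user later lifted the JSON-only requirement
print(store.render())
```

Because the store is separate from the message list, the constraints survive summarization and truncation of the conversation history.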

Recency versus relevance

Recent messages are often relevant, but not always. A message from 30 turns ago may contain an important requirement. A message from one turn ago may be small talk.

A context manager should consider both recency and relevance.

Simple scoring approach:

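def score_message(message, current_task):
    """Heuristic relevance score; higher means more worth keeping in context.

    The attributes on `message` are illustrative, not a real library API.
    """
    score = 0
    if message.is_recent:  # recency is a useful but weak signal
        score += 2
    if message.contains_user_constraint:  # constraints matter regardless of age
        score += 5
    if message.semantic_similarity(current_task) > 0.75:  # topical match
        score += 4
    if message.is_resolved_side_topic:  # settled tangents are penalized
        score -= 3
    return score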

This can be implemented with heuristics, embeddings, metadata, or a model-based selector.
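As a self-contained illustration of the heuristic route, the sketch below scores a few messages and keeps the top ones. The `Message` fields, the `word_overlap` function, and the 0.3 threshold are assumptions made for this example: word overlap is a crude stand-in for embedding similarity, so its threshold is not comparable to a cosine-similarity cutoff such as 0.75.

```python
from dataclasses import dataclass


@dataclass
class Message:
    text: str
    turns_ago: int
    contains_user_constraint: bool = False
    is_resolved_side_topic: bool = False


def word_overlap(a, b):
    """Jaccard overlap of lowercase word sets; a crude stand-in for embeddings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def score_message(message, current_task, recent_within=4, sim_threshold=0.3):
    score = 0
    if message.turns_ago <= recent_within:
        score += 2
    if message.contains_user_constraint:
        score += 5
    if word_overlap(message.text, current_task) > sim_threshold:
        score += 4
    if message.is_resolved_side_topic:
        score -= 3
    return score


def select_messages(messages, current_task, top_k=2):
    """Return the top_k highest-scoring messages, best first."""
    ranked = sorted(messages, key=lambda m: score_message(m, current_task), reverse=True)
    return ranked[:top_k]


task = "evaluate retrieval quality"
messages = [
    Message("Do not modify production data.", turns_ago=30, contains_user_constraint=True),
    Message("Thanks, that was helpful!", turns_ago=1),
    Message("How should I evaluate retrieval quality", turns_ago=2),
    Message("What IDE theme do you use?", turns_ago=3, is_resolved_side_topic=True),
]
selected = select_messages(messages, task)
print([m.text for m in selected])
```

In this example the 30-turn-old constraint outranks the most recent message, which is small talk: the old constraint scores 5, the on-topic question scores 6, and the recent thank-you scores only 2.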

Prompt structure for in-context memory

A clean structure helps the model use memory correctly.

Example:

System instructions:
You are a technical assistant...

Current task state:
Goal: Build a basic RAG pipeline.
User constraints: Python, local development, low cost.
Completed decisions: Use Chroma for prototype.
Open question: How to evaluate retrieval quality.

Relevant history summary:
...

Retrieved context:
...

User request:
Explain how to measure retrieval precision and recall.

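Separating sections with clear labels reduces ambiguity about which text is an instruction, which is stored state, and which is the current request.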
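A structure like this can be assembled by a small helper. The `build_prompt` function below is a hypothetical sketch; the section titles follow the example above, and skipping empty sections is a design choice assumed here to avoid spending tokens on blank headings.

```python
def build_prompt(system, task_state, history_summary, retrieved, user_request):
    """Assemble labeled prompt sections; empty sections are skipped."""
    state_lines = "\n".join(f"{key}: {value}" for key, value in task_state.items())
    sections = [
        ("System instructions", system),
        ("Current task state", state_lines),
        ("Relevant history summary", history_summary),
        ("Retrieved context", retrieved),
        ("User request", user_request),
    ]
    return "\n\n".join(f"{title}:\n{body}" for title, body in sections if body)


prompt = build_prompt(
    system="You are a technical assistant.",
    task_state={
        "Goal": "Build a basic RAG pipeline.",
        "User constraints": "Python, local development, low cost.",
        "Completed decisions": "Use Chroma for prototype.",
        "Open question": "How to evaluate retrieval quality.",
    },
    history_summary="The user is building a RAG assistant for internal product documentation.",
    retrieved="",  # nothing retrieved for this call, so the section is omitted
    user_request="Explain how to measure retrieval precision and recall.",
)
print(prompt)
```

Keeping the section order fixed across calls also makes prompts easier to diff and debug when the agent behaves unexpectedly.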

Practical takeaway

In-context memory is precious. Use it for what the model needs now: current task state, recent relevant conversation, active constraints, and concise observations.

Do not rely on raw conversation history as your memory strategy. Use summaries, state objects, relevance selection, and tool-output compression to keep the model focused.
