Overview
AgentUse automatically manages conversation context so that long-running agents stay within token limits while important information is preserved.
Automatic Compaction
When conversations approach token limits, AgentUse automatically:
- Detects when context usage exceeds the threshold
- Compacts older messages into a concise summary
- Preserves recent messages for continuity
- Continues the conversation seamlessly
Compaction happens transparently - your agent continues working without interruption.
How It Works
Token Tracking
AgentUse tracks token usage throughout the conversation and compacts when approaching limits. The system uses character-based estimation (approximately 4 characters per token) and refines it with actual usage data reported by the AI models.
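As a rough illustration of that estimate, here is a minimal sketch; the function name and shape are assumptions for this page, not AgentUse's actual API:

```ts
// Illustrative sketch only; the function name is an assumption, not AgentUse's API.
// Character-based estimate: roughly 4 characters per token. AgentUse refines
// this with actual usage reported by the model once responses arrive.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Example: check a conversation against a 32,000-token limit at an 80% threshold.
const history = ["First user message", "A long assistant reply..."];
const used = history.reduce((n, msg) => n + estimateTokens(msg), 0);
const shouldCompact = used > 32_000 * 0.8;
console.log({ used, shouldCompact });
```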
Compaction Strategy
AgentUse uses a summarization-based approach:
- Older messages are summarized using the same model as the agent
- Recent messages are preserved intact for continuity
- The summary captures key decisions, tool results, and progress
- System prompts and tool definitions are always preserved
Configuration
Environment Variables
Control context management globally using environment variables.
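The variable names below are the ones referenced in the Troubleshooting section; the defaults shown are assumptions inferred from values mentioned elsewhere on this page (80% threshold, last 3 messages), not confirmed internals:

```ts
// Sketch of how the documented variables might be read. Names come from the
// Troubleshooting section; defaults are assumptions based on this page.
const contextConfig = {
  // CONTEXT_COMPACTION=false disables compaction entirely
  enabled: process.env.CONTEXT_COMPACTION !== "false",
  // Fraction of the model's context limit that triggers compaction
  threshold: Number(process.env.COMPACTION_THRESHOLD ?? "0.8"),
  // Number of recent messages preserved verbatim (last 3 by default)
  keepRecent: Number(process.env.COMPACTION_KEEP_RECENT ?? "3"),
};
```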
What Gets Preserved
Always Preserved
- System prompts
- Recent messages (last 3 by default, configurable)
- Tool definitions remain available
What Gets Compacted
- Older conversation history
- Previous tool calls and results
- Assistant responses from earlier in the conversation
The summary generated from this material retains:
- Key decisions and outcomes
- Important tool results and errors
- Current state and progress
- Critical information needed for continuation
Compaction Details
Compaction Process
When compaction is triggered (see the sketch after this list):
- Older messages are separated from recent ones
- A summarization request is sent to the same model
- The summary is created with a specialized system prompt
- Recent messages are kept intact for continuity
- The conversation continues with the compacted context
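A minimal sketch of those steps, assuming a generic chat-message shape and a summarize helper standing in for the model call (both hypothetical, not AgentUse's actual code):

```ts
// Hypothetical sketch of the compaction flow described above.
type Message = { role: "system" | "user" | "assistant"; content: string };

async function compact(
  messages: Message[],
  keepRecent: number,
  summarize: (older: Message[]) => Promise<string>, // stand-in for the model call
): Promise<Message[]> {
  const system = messages.filter((m) => m.role === "system"); // always preserved
  const rest = messages.filter((m) => m.role !== "system");
  const older = rest.slice(0, -keepRecent); // to be summarized
  const recent = rest.slice(-keepRecent);   // kept intact for continuity
  if (older.length === 0) return messages;  // nothing to compact yet
  const summary = await summarize(older);   // same model as the agent
  return [
    ...system,
    { role: "user", content: `Summary of earlier conversation:\n${summary}` },
    ...recent,
  ];
}
```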
Compaction Prompt
The summaries are produced with a specialized system prompt that directs the model to capture key decisions, important tool results and errors, and the current state of progress.
Monitoring
Token Usage
AgentUse displays token usage at the end of each run.
Compaction Events
When compaction occurs, you’ll see a notice in the run output.
Context Limits by Model
AgentUse automatically detects context limits for different models (see the sketch after this list):
- Anthropic models: Retrieved from models.dev API
- OpenAI models: Retrieved from models.dev API
- Unknown models: Default to 32,000 tokens (conservative)
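A hedged sketch of that lookup, assuming models.dev serves an aggregate JSON index with a nested provider/model shape; verify the real schema before relying on this, and note the 24-hour cache mentioned under Technical Details is omitted for brevity:

```ts
// Hedged sketch: assumes a response shaped like
// { [provider]: { models: { [id]: { limit: { context } } } } } — an assumption.
const DEFAULT_CONTEXT_LIMIT = 32_000; // conservative fallback for unknown models

async function contextLimit(provider: string, model: string): Promise<number> {
  try {
    const res = await fetch("https://models.dev/api.json");
    const data: any = await res.json();
    return data?.[provider]?.models?.[model]?.limit?.context ?? DEFAULT_CONTEXT_LIMIT;
  } catch {
    return DEFAULT_CONTEXT_LIMIT; // network or schema errors: stay conservative
  }
}
```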
Performance Considerations
Compaction Overhead
Compaction requires an additional API call to summarize context, adding 1-3 seconds to processing time.
Token Usage
Compaction itself uses tokens (up to 2,000 for the summary), but saves significantly more in long conversations.
Context Quality
Summaries preserve key information, but some nuance may be lost. Recent messages remain fully intact.
Best Practices
1. Set Appropriate Thresholds: tune COMPACTION_THRESHOLD so compaction triggers well before the model’s limit.
2. Adjust Recent Messages: raise COMPACTION_KEEP_RECENT when your agent needs more verbatim history, or lower it to free more context.
3. Disable if Not Needed: set CONTEXT_COMPACTION=false for short-lived agents that never approach their model’s limit.
Technical Details
Token Estimation
- Uses character-based estimation: ~4 characters per token
- Updates with actual token usage from AI models
- Tracks both input and output tokens
Model Integration
- Context limits are fetched from models.dev API
- Falls back to a conservative 32,000-token limit for unknown models
- Caches model information for 24 hours
Error Handling
- If compaction fails, creates a fallback summary
- Continues execution even if compaction encounters errors
- Logs compaction failures for debugging (see the sketch below)
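The bullets above suggest a wrap-and-recover pattern along these lines; this is an assumption about the shape, not the actual implementation (safeCompact and the fallback format are hypothetical):

```ts
// Hypothetical shape of the fail-safe described above.
type Message = { role: string; content: string };

async function safeCompact(
  messages: Message[],
  compact: (m: Message[]) => Promise<Message[]>, // real summarization path
): Promise<Message[]> {
  try {
    return await compact(messages);
  } catch (err) {
    console.error("compaction failed, using fallback summary", err); // logged for debugging
    // Fallback: replace older messages with a bare placeholder so the run continues.
    const recent = messages.slice(-3); // matches the documented default of 3
    return [
      { role: "user", content: `[${messages.length - recent.length} earlier messages omitted]` },
      ...recent,
    ];
  }
}
```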
Troubleshooting
Context Lost After Compaction
- Increase COMPACTION_KEEP_RECENT for more preserved messages
- Lower COMPACTION_THRESHOLD to compact earlier with more context
- The same model is used for summarization, so quality should be consistent
Hitting Token Limits
- Lower COMPACTION_THRESHOLD to compact earlier
- Reduce COMPACTION_KEEP_RECENT for more aggressive compaction
- Consider the model’s actual context limit vs your usage
Compaction Not Working
- Check that CONTEXT_COMPACTION is not set to ‘false’
- Verify the model supports the context limits being used
- Check logs for compaction errors
Example: Long-Running Agent
For an agent working through a lengthy analysis, AgentUse will (see the sketch after this list):
- Track token usage throughout the analysis
- Compact context when approaching 80% of the model’s limit
- Preserve the last 5 messages for continuity
- Continue analysis without interruption
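A self-contained simulation of that behavior, tying the earlier sketches together; every name here is an illustrative assumption, not AgentUse's actual API:

```ts
// Hypothetical end-to-end simulation of the behavior listed above.
type Message = { role: string; content: string };

const estimateTokens = (msgs: Message[]) =>
  Math.ceil(msgs.reduce((chars, m) => chars + m.content.length, 0) / 4);

async function analyze(
  steps: Message[],
  modelLimit: number,
  summarize: (older: Message[]) => Promise<string>, // stand-in for the model call
): Promise<Message[]> {
  const THRESHOLD = 0.8;  // compact at 80% of the model's limit
  const KEEP_RECENT = 5;  // preserve the last 5 messages
  let history: Message[] = [];
  for (const step of steps) {
    history.push(step); // token usage tracked throughout the analysis
    if (estimateTokens(history) > modelLimit * THRESHOLD && history.length > KEEP_RECENT) {
      const older = history.slice(0, -KEEP_RECENT);
      const summary = await summarize(older); // same model summarizes older turns
      history = [{ role: "user", content: summary }, ...history.slice(-KEEP_RECENT)];
    } // the analysis itself continues without interruption
  }
  return history;
}
```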