Overview
AgentUse automatically manages conversation context so that long-running agents stay within token limits while important information is preserved.
Automatic Compaction
When conversations approach token limits, AgentUse automatically:
- Detects when context usage exceeds the threshold
- Compacts older messages into a concise summary
- Preserves recent messages for continuity
- Continues the conversation seamlessly
Compaction happens transparently - your agent continues working without interruption.
How It Works
Token Tracking
AgentUse tracks token usage throughout the conversation and compacts when approaching limits. The system uses character-based estimation (approximately 4 characters per token) and refines it with actual usage data reported by the AI models.
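As a rough illustration of that estimate, here is a minimal sketch; the function name and shape are assumptions for this page, not AgentUse's actual API:

```ts
// Illustrative sketch only; the function name is an assumption, not AgentUse's API.
// Character-based estimate: roughly 4 characters per token. AgentUse refines
// this with actual usage reported by the model once responses arrive.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Example: check a conversation against a 32,000-token limit at an 80% threshold.
const history = ["First user message", "A long assistant reply..."];
const used = history.reduce((n, msg) => n + estimateTokens(msg), 0);
const shouldCompact = used > 32_000 * 0.8;
console.log({ used, shouldCompact });
```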
Compaction Strategy
AgentUse uses a summarization-based approach:
- Older messages are summarized using the same model as the agent
- Recent messages are preserved intact for continuity
- The summary captures key decisions, tool results, and progress
- System prompts and tool definitions are always preserved
Configuration
Environment Variables
Control context management globally using environment variables.
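The variable names below are the ones referenced in the Troubleshooting section; the defaults shown are assumptions inferred from values mentioned elsewhere on this page (80% threshold, last 3 messages), not confirmed internals:

```ts
// Sketch of how the documented variables might be read. Names come from the
// Troubleshooting section; defaults are assumptions based on this page.
const contextConfig = {
  // CONTEXT_COMPACTION=false disables compaction entirely
  enabled: process.env.CONTEXT_COMPACTION !== "false",
  // Fraction of the model's context limit that triggers compaction
  threshold: Number(process.env.COMPACTION_THRESHOLD ?? "0.8"),
  // Number of recent messages preserved verbatim (last 3 by default)
  keepRecent: Number(process.env.COMPACTION_KEEP_RECENT ?? "3"),
};
```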
What Gets Preserved
Always Preserved
- System prompts
- Recent messages (last 3 by default, configurable)
- Tool definitions remain available
What Gets Compacted
- Older conversation history
- Previous tool calls and results
- Assistant responses from earlier in the conversation
The summary generated from this material retains:
- Key decisions and outcomes
- Important tool results and errors
- Current state and progress
- Critical information needed for continuation
Compaction Details
Compaction Process
When compaction is triggered (see the sketch after this list):
- Older messages are separated from recent ones
- A summarization request is sent to the same model
- The summary is created with a specialized system prompt
- Recent messages are kept intact for continuity
- The conversation continues with the compacted context
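A minimal sketch of those steps, assuming a generic chat-message shape and a summarize helper standing in for the model call (both hypothetical, not AgentUse's actual code):

```ts
// Hypothetical sketch of the compaction flow described above.
type Message = { role: "system" | "user" | "assistant"; content: string };

async function compact(
  messages: Message[],
  keepRecent: number,
  summarize: (older: Message[]) => Promise<string>, // stand-in for the model call
): Promise<Message[]> {
  const system = messages.filter((m) => m.role === "system"); // always preserved
  const rest = messages.filter((m) => m.role !== "system");
  const older = rest.slice(0, -keepRecent); // to be summarized
  const recent = rest.slice(-keepRecent);   // kept intact for continuity
  if (older.length === 0) return messages;  // nothing to compact yet
  const summary = await summarize(older);   // same model as the agent
  return [
    ...system,
    { role: "user", content: `Summary of earlier conversation:\n${summary}` },
    ...recent,
  ];
}
```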
Compaction Prompt
The summaries are produced with a specialized system prompt that directs the model to capture key decisions, important tool results and errors, and the current state of progress.
Monitoring
Token Usage
AgentUse displays token usage at the end of each run.
Compaction Events
When compaction occurs, you’ll see a notice in the run output.
Context Limits by Model
AgentUse automatically detects context limits for different models (see the sketch after this list):
- Anthropic models: Retrieved from models.dev API
- OpenAI models: Retrieved from models.dev API
- Unknown models: Default to 32,000 tokens (conservative)
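A hedged sketch of that lookup, assuming models.dev serves an aggregate JSON index with a nested provider/model shape; verify the real schema before relying on this, and note the 24-hour cache mentioned under Technical Details is omitted for brevity:

```ts
// Hedged sketch: assumes a response shaped like
// { [provider]: { models: { [id]: { limit: { context } } } } } — an assumption.
const DEFAULT_CONTEXT_LIMIT = 32_000; // conservative fallback for unknown models

async function contextLimit(provider: string, model: string): Promise<number> {
  try {
    const res = await fetch("https://models.dev/api.json");
    const data: any = await res.json();
    return data?.[provider]?.models?.[model]?.limit?.context ?? DEFAULT_CONTEXT_LIMIT;
  } catch {
    return DEFAULT_CONTEXT_LIMIT; // network or schema errors: stay conservative
  }
}
```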
Performance Considerations
Compaction Overhead
Compaction requires an additional API call to summarize context, adding 1-3 seconds to processing time.
Token Usage
Compaction itself uses tokens (up to 2,000 for the summary), but saves significantly more in long conversations.
Context Quality
Summaries preserve key information, but some nuance may be lost. Recent messages remain fully intact.
Best Practices
1. Set Appropriate Thresholds: tune COMPACTION_THRESHOLD so compaction triggers well before the model’s limit.
2. Adjust Recent Messages: raise COMPACTION_KEEP_RECENT when your agent needs more verbatim history, or lower it to free more context.
3. Disable if Not Needed: set CONTEXT_COMPACTION=false for short-lived agents that never approach their model’s limit.
Technical Details
Token Estimation
- Uses character-based estimation: ~4 characters per token
- Updates with actual token usage from AI models
- Tracks both input and output tokens
Model Integration
- Context limits are fetched from models.dev API
- Falls back to a conservative 32,000-token limit for unknown models
- Caches model information for 24 hours
Error Handling
- If compaction fails, creates a fallback summary
- Continues execution even if compaction encounters errors
- Logs compaction failures for debugging (see the sketch below)
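The bullets above suggest a wrap-and-recover pattern along these lines; this is an assumption about the shape, not the actual implementation (safeCompact and the fallback format are hypothetical):

```ts
// Hypothetical shape of the fail-safe described above.
type Message = { role: string; content: string };

async function safeCompact(
  messages: Message[],
  compact: (m: Message[]) => Promise<Message[]>, // real summarization path
): Promise<Message[]> {
  try {
    return await compact(messages);
  } catch (err) {
    console.error("compaction failed, using fallback summary", err); // logged for debugging
    // Fallback: replace older messages with a bare placeholder so the run continues.
    const recent = messages.slice(-3); // matches the documented default of 3
    return [
      { role: "user", content: `[${messages.length - recent.length} earlier messages omitted]` },
      ...recent,
    ];
  }
}
```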
Troubleshooting
Context Lost After Compaction
- Increase COMPACTION_KEEP_RECENT for more preserved messages
- Lower COMPACTION_THRESHOLD to compact earlier with more context
- The same model is used for summarization, so quality should be consistent
Hitting Token Limits
- Lower COMPACTION_THRESHOLD to compact earlier
- Reduce COMPACTION_KEEP_RECENT for more aggressive compaction
- Consider the model’s actual context limit vs your usage
Compaction Not Working
- Check that CONTEXT_COMPACTION is not set to ‘false’
- Verify the model supports the context limits being used
- Check logs for compaction errors
Example: Long-Running Agent
For an agent working through a lengthy analysis, AgentUse will (see the sketch after this list):
- Track token usage throughout the analysis
- Compact context when approaching 80% of the model’s limit
- Preserve the last 5 messages for continuity
- Continue analysis without interruption
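A self-contained simulation of that behavior, tying the earlier sketches together; every name here is an illustrative assumption, not AgentUse's actual API:

```ts
// Hypothetical end-to-end simulation of the behavior listed above.
type Message = { role: string; content: string };

const estimateTokens = (msgs: Message[]) =>
  Math.ceil(msgs.reduce((chars, m) => chars + m.content.length, 0) / 4);

async function analyze(
  steps: Message[],
  modelLimit: number,
  summarize: (older: Message[]) => Promise<string>, // stand-in for the model call
): Promise<Message[]> {
  const THRESHOLD = 0.8;  // compact at 80% of the model's limit
  const KEEP_RECENT = 5;  // preserve the last 5 messages
  let history: Message[] = [];
  for (const step of steps) {
    history.push(step); // token usage tracked throughout the analysis
    if (estimateTokens(history) > modelLimit * THRESHOLD && history.length > KEEP_RECENT) {
      const older = history.slice(0, -KEEP_RECENT);
      const summary = await summarize(older); // same model summarizes older turns
      history = [{ role: "user", content: summary }, ...history.slice(-KEEP_RECENT)];
    } // the analysis itself continues without interruption
  }
  return history;
}
```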