Context engineering for AI agents
Agents get worse the longer a conversation runs. SR2 manages the full lifecycle of your LLM's context window, so your agent stays sharp on turn 30.
Every turn adds tokens. System prompts, tool results, conversation history — it all compounds. By turn 20, your agent is spending 10x what it should and losing coherence as critical context gets pushed out.
Most agent stacks leave this unmanaged. SR2 handles it automatically, through config, not code.
SR2 compiles your context through three composable layers before every LLM call. Each layer is config-driven and independently tunable.
Core layer: system prompts, tool definitions, and static context. Optimized for KV-cache prefix reuse across turns.
Memory layer: persistent facts, summaries, and retrieved knowledge. Survives compaction and carries forward across sessions.
Conversation layer: three-zone message management with active, buffer, and archive zones. Compacts and summarizes automatically.
Config-driven context management that works with any LLM provider.
Active, buffer, and archive zones with configurable boundaries. Messages flow through zones as conversation grows, with automatic compaction at each boundary.
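The zone flow above can be sketched in a few lines. This is a hypothetical illustration, not SR2's implementation: the zone sizes and the `place_zones` helper are invented for the example.

```python
# Hypothetical sketch of three-zone message flow. Boundaries are
# illustrative; SR2 makes them configurable.
ACTIVE_MAX = 4   # newest turns kept verbatim
BUFFER_MAX = 2   # turns awaiting compaction

def place_zones(messages):
    """Split a message list into archive / buffer / active zones."""
    active = messages[-ACTIVE_MAX:]
    rest = messages[:-ACTIVE_MAX] if len(messages) > ACTIVE_MAX else []
    buffer = rest[-BUFFER_MAX:]
    archive = rest[:-BUFFER_MAX] if len(rest) > BUFFER_MAX else []
    return archive, buffer, active

messages = [f"turn-{i}" for i in range(10)]
archive, buffer, active = place_zones(messages)
# archive: turn-0..turn-3, buffer: turn-4..turn-5, active: turn-6..turn-9
```

As the conversation grows, each new turn pushes the oldest active turn into the buffer, and buffered turns into the archive, where compaction applies.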
100% prefix hit rate through stable context ordering. Core and memory layers maintain consistent prefixes, so your provider's KV-cache stays warm across turns.
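A toy illustration of why stable ordering matters for prefix caching: if core and memory content always come first and in the same order, consecutive compiled contexts share a long common prefix the provider can cache. The lists and helper below are invented for the example.

```python
# Hypothetical illustration of prefix stability, not SR2 internals.
def common_prefix_len(a, b):
    """Count how many leading items two message lists share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

core = ["system prompt", "tool defs"]        # stable across turns
memory = ["fact: user prefers CSV"]          # stable across turns
turn1 = core + memory + ["user: hi"]
turn2 = core + memory + ["user: hi", "assistant: hello", "user: next task"]
# turn2 extends turn1, so the entire earlier context is a cached prefix
```

Because new messages only append, the provider never has to re-process the earlier prefix.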
Automatic message compaction preserves key information while reducing token count. LLM-powered summarization captures what matters from archived messages.
When context pressure hits critical, SR2 progressively drops lower-priority content while preserving essential context. Your agent stays functional, never crashes.
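Progressive degradation can be sketched as a priority-ordered budget fill: keep the most essential content first and drop whatever no longer fits. The priority scheme and the 4-chars-per-token estimate below are illustrative assumptions, not SR2's actual policy.

```python
# Hypothetical sketch of dropping lower-priority content under pressure.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough heuristic, not a real tokenizer

def degrade(items, budget):
    """items: list of (priority, text); higher priority = more essential.

    Greedily keeps the highest-priority items that fit the token budget.
    """
    result, used = [], 0
    for priority, text in sorted(items, key=lambda it: it[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost <= budget:
            result.append((priority, text))
            used += cost
    return result

items = [
    (3, "system prompt" * 10),      # essential, always kept
    (2, "recent user message" * 10),
    (1, "old tool output" * 200),   # first to go under pressure
]
kept = degrade(items, budget=100)
```

Under a tight budget the bulky low-priority item is dropped while the essential context survives, so the agent degrades gracefully instead of failing.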
Extract and persist facts across conversation turns and sessions. Semantic retrieval surfaces relevant memories when they matter most.
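The retrieval idea can be shown with a dependency-free stand-in. SR2's retrieval is semantic; this sketch scores memories by keyword overlap instead, purely to keep the example self-contained, and all names in it are invented.

```python
# Hypothetical sketch of memory retrieval via keyword overlap.
def score(query: str, memory: str) -> int:
    """Count shared words between query and memory (a crude relevance proxy)."""
    return len(set(query.lower().split()) & set(memory.lower().split()))

def retrieve(query, memories, top_k=2):
    """Return the top_k most relevant memories for this query."""
    return sorted(memories, key=lambda m: score(query, m), reverse=True)[:top_k]

memories = [
    "user prefers CSV output",
    "dataset lives in s3://bucket/data",
    "user's name is Alex",
]
hits = retrieve("export the dataset as CSV output", memories)
```

Only the memories relevant to the current task are injected, so persistent knowledge doesn't bloat every turn.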
Track tool call lifecycles and manage their context impact. Completed tool results can be compacted while preserving their outcomes.
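Tool-result compaction can be sketched as follows: once a call completes, its bulky raw result is replaced with a one-line outcome. The record shape and function below are illustrative, not SR2's internal schema.

```python
# Hypothetical sketch of compacting a completed tool call's result.
def compact_tool_result(record):
    if record["status"] != "completed":
        return record  # in-flight calls keep their full context
    return {
        "tool": record["tool"],
        "status": "completed",
        "result": f"[compacted] {record['outcome']}",
    }

record = {
    "tool": "run_query",
    "status": "completed",
    "outcome": "returned 1,204 rows",
    "result": "row1,row2,..." * 500,  # bulky raw output
}
compacted = compact_tool_result(record)
```

The outcome stays visible to the model while the raw payload stops consuming tokens on every subsequent turn.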
Built-in observability with token usage, cache hit rates, compaction events, and context pressure gauges. Know exactly what's happening inside your context window.
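The kind of counters such observability implies can be sketched like this; the field names are illustrative assumptions, not SR2's metrics API.

```python
# Hypothetical sketch of context-engine metrics.
from dataclasses import dataclass

@dataclass
class ContextMetrics:
    tokens_in: int = 0
    cache_hits: int = 0
    cache_lookups: int = 0
    compactions: int = 0

    @property
    def cache_hit_rate(self) -> float:
        """Fraction of lookups served from a warm prefix cache."""
        return self.cache_hits / self.cache_lookups if self.cache_lookups else 0.0

m = ContextMetrics()
m.cache_lookups += 4
m.cache_hits += 3
# m.cache_hit_rate is now 0.75
```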
Define your context strategy in YAML or dict. No subclassing, no framework lock-in. Change behavior by changing config, not code.
Fully open source. Use it in production, fork it, extend it. No vendor lock-in, no usage fees, no strings attached.
Real numbers from 688 tests across 71 files.
Install SR2 and start managing your agent's context window.
```python
from sr2 import ContextEngine

# Define your context strategy in config
config = {
    "model": "claude-sonnet-4-20250514",
    "context_window": 200_000,
    "conversation": {
        "active_turns": 10,
        "buffer_turns": 5,
        "compaction": "summarize",
    },
}

# Create the engine
engine = ContextEngine(config)

# Add messages as your agent runs
engine.add_message("user", "Analyze this dataset...")

# Compile context for the LLM call
context = engine.compile()
# → Optimized, compacted, cache-friendly messages
```