SR2

Context engineering for AI agents

Agents get worse the longer conversations go. SR2 manages the full lifecycle of your LLM's context window — so your agent stays sharp on turn 30.

74%
Token Reduction
139μs
Compile Overhead
100%
KV-Cache Hit Rate
688
Tests Passing
$ pip install sr2

Context windows fill up. Agents break down.

Every turn adds tokens. System prompts, tool results, conversation history — it all compounds. By turn 20, your agent is spending 10x what it should and losing coherence as critical context gets pushed out.

Most agent stacks leave this unmanaged. SR2 handles it — automatically, through config, not code.

Token usage over 30 turns: Naive 847K tokens vs. SR2 220K tokens (74% reduction)

Three-layer pipeline

SR2 compiles your context through three composable layers before every LLM call. Each layer is config-driven and independently tunable.

Layer 1

Core

System prompts, tool definitions, and static context. Optimized for KV-cache prefix reuse across turns.

Layer 2

Memory

Persistent facts, summaries, and retrieved knowledge. Survives compaction and carries forward across sessions.

Layer 3

Conversation

Three-zone message management with active, buffer, and archive zones. Compacts and summarizes automatically.
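In code, the compile order can be sketched roughly like this (a simplified illustration, not SR2's internals; the `ThreeLayerContext` class and its fields are assumptions for the sketch):

```python
from dataclasses import dataclass, field

@dataclass
class ThreeLayerContext:
    """Simplified three-layer context: stable layers first, volatile last."""
    core: list[dict] = field(default_factory=list)          # system prompt, tools
    memory: list[dict] = field(default_factory=list)        # persistent facts
    conversation: list[dict] = field(default_factory=list)  # recent messages

    def compile(self) -> list[dict]:
        # Core and memory come first and change rarely, so the token
        # prefix stays identical across turns -- the property that lets
        # a provider's KV-cache serve repeated prefixes.
        return self.core + self.memory + self.conversation

ctx = ThreeLayerContext(
    core=[{"role": "system", "content": "You are a data analyst."}],
    memory=[{"role": "system", "content": "Fact: user prefers CSV output."}],
)
ctx.conversation.append({"role": "user", "content": "Analyze this dataset..."})
messages = ctx.compile()
```

Because only the conversation layer changes turn to turn, everything before it is a byte-identical prefix on every call.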

Everything your agent's context needs

Config-driven context management that works with any LLM provider.

Three-Zone Conversations

Active, buffer, and archive zones with configurable boundaries. Messages flow through zones as conversation grows, with automatic compaction at each boundary.
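As a rough sketch, zone routing by recency might look like the following (illustrative only; the function name and boundary semantics are assumptions, with zone sizes mirroring the `active_turns` and `buffer_turns` config keys):

```python
def route_zones(messages, active_turns=10, buffer_turns=5):
    """Split messages into active / buffer / archive zones by recency.

    The newest messages stay active (sent verbatim), older ones wait in
    a buffer, and the oldest land in the archive, where they become
    candidates for compaction and summarization.
    """
    active = messages[-active_turns:]
    older = messages[:-active_turns]
    buffer = older[-buffer_turns:]
    archive = older[:-buffer_turns]
    return active, buffer, archive

msgs = [f"turn-{i}" for i in range(20)]
active, buffer, archive = route_zones(msgs)
```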

KV-Cache Optimization

100% prefix hit rate through stable context ordering. Core and memory layers maintain consistent prefixes, so your provider's KV-cache stays warm across turns.

Compaction & Summarization

Automatic message compaction preserves key information while reducing token count. LLM-powered summarization captures what matters from archived messages.

Graceful Degradation

When context pressure turns critical, SR2 progressively drops lower-priority content while preserving essential context. Your agent stays functional, never crashes.
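A minimal sketch of that policy, assuming each piece of context carries a priority and a token estimate (the names and the greedy keep-by-priority strategy are illustrative, not SR2's actual algorithm):

```python
def degrade(items, token_budget):
    """Keep the highest-priority items that fit inside the budget.

    items: (priority, token_count, content) tuples; higher priority
    survives longer as the budget shrinks.
    """
    kept, total = [], 0
    for priority, tokens, content in sorted(items, key=lambda it: -it[0]):
        if total + tokens <= token_budget:
            kept.append(content)
            total += tokens
    return kept

items = [
    (3, 50, "system prompt"),        # essential, dropped last
    (2, 80, "conversation summary"),
    (1, 100, "stale tool output"),   # first to go under pressure
]
survivors = degrade(items, token_budget=140)
```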


Memory System

Extract and persist facts across conversation turns and sessions. Semantic retrieval surfaces relevant memories when they matter most.

Tool State Machine

Track tool call lifecycles and manage their context impact. Completed tool results can be compacted while preserving their outcomes.
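One way to picture such a lifecycle is a small state machine (the states and transitions below are assumptions for illustration, not SR2's actual types):

```python
from enum import Enum, auto

class ToolCall(Enum):
    PENDING = auto()    # call emitted, awaiting execution
    RUNNING = auto()    # tool executing
    COMPLETED = auto()  # full result present in context
    COMPACTED = auto()  # result replaced by its outcome summary

# Allowed transitions: a result is only compacted after it completes.
TRANSITIONS = {
    ToolCall.PENDING: {ToolCall.RUNNING},
    ToolCall.RUNNING: {ToolCall.COMPLETED},
    ToolCall.COMPLETED: {ToolCall.COMPACTED},
    ToolCall.COMPACTED: set(),
}

def advance(state: ToolCall, nxt: ToolCall) -> ToolCall:
    if nxt not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition: {state.name} -> {nxt.name}")
    return nxt
```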


Prometheus Metrics

Built-in observability with token usage, cache hit rates, compaction events, and context pressure gauges. Know exactly what's happening inside your context window.


Config-Driven

Define your context strategy in YAML or dict. No subclassing, no framework lock-in. Change behavior by changing config, not code.
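A YAML version of a context strategy might look like this (the keys mirror the dict config in the quickstart below; the exact YAML shape is an assumption, not a documented schema):

```yaml
model: claude-sonnet-4-20250514
context_window: 200000
conversation:
  active_turns: 10       # newest turns kept verbatim
  buffer_turns: 5        # awaiting compaction
  compaction: summarize  # LLM-powered summaries for archived turns
```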


Apache 2.0

Fully open source. Use it in production, fork it, extend it. No vendor lock-in, no usage fees, no strings attached.

Measured, not promised

Real numbers from 688 tests across 71 files.

74%
Token reduction
Over 30 conversation turns
139μs
Avg compile overhead
0.028% of a 500ms LLM call
100%
KV-cache prefix hits
Stable ordering across turns

Token Growth: Naive vs SR2-Managed

Claude Sonnet: $0.29 saved per session
GPT-4o: $0.24 saved per session

Up and running in minutes

Install SR2 and start managing your agent's context window.

main.py
from sr2 import ContextEngine

# Define your context strategy in config
config = {
    "model": "claude-sonnet-4-20250514",
    "context_window": 200_000,
    "conversation": {
        "active_turns": 10,
        "buffer_turns": 5,
        "compaction": "summarize",
    },
}

# Create the engine
engine = ContextEngine(config)

# Add messages as your agent runs
engine.add_message("user", "Analyze this dataset...")

# Compile context for the LLM call
context = engine.compile()
# → Optimized, compacted, cache-friendly messages