RouteKitAI Architecture¶
Overview¶
RouteKitAI is designed around a simple but powerful philosophy: graph-native orchestration, tracing-first observability, and replay-built-in testing. These aren't add-ons—they're core to RouteKitAI's design from day one.
The Wedge: Three First-Class Features¶
RouteKitAI's MVP focuses on deterministic, testable agent runs through three integrated features:
- Graph-native orchestration: Agents compose into explicit workflows with clear control flow
- Tracing: Every execution produces an immutable event log
- Replay: Deterministic re-execution using trace data and stubs
Why This Matters¶
Most agent frameworks treat orchestration, observability, and testing as separate concerns added later. RouteKitAI inverts this:
- Graph-native: Workflows are graphs, not linear scripts. Control flow is explicit and inspectable.
- Tracing-first: Every run produces a complete trace. No opt-in, no sampling—always on.
- Replay-built-in: Testing isn't an afterthought. Replay uses the same runtime with stubs, ensuring production and test behavior match.
This enables:
- Deterministic testing: Replay any run with exact inputs/outputs
- Debugging: Inspect full execution history, not just logs
- Reproducibility: Re-run failed executions to diagnose issues
- Confidence: Test agent behavior before deploying
Execution Model¶
Step-Based Runtime¶
RouteKitAI executes agents in discrete steps. Each step:
- Takes a message and context
- Produces a message and metadata
- Records all inputs/outputs to the trace
- Can trigger tool calls, sub-agent calls, or control flow decisions

Steps are the atomic unit of execution and tracing. This granularity enables:
- Precise replay matching
- Detailed observability
- Fine-grained error handling
- Parallel execution control
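The step contract described above can be sketched in a few lines. This is an illustration only, not RouteKitAI's runtime: `StepResult`, `run_step`, and the `shout` handler are hypothetical names, and the trace is a plain list of dicts.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class StepResult:
    # Hypothetical result type: the output message plus free-form metadata.
    message: str
    metadata: dict[str, Any] = field(default_factory=dict)


def run_step(
    step_id: int,
    handler: Callable[[str, dict[str, Any]], StepResult],
    message: str,
    context: dict[str, Any],
    trace: list[dict[str, Any]],
) -> StepResult:
    """Run one step and record its input and output in the trace."""
    trace.append({"type": "step_started", "step": step_id, "input": message})
    result = handler(message, context)
    trace.append(
        {
            "type": "step_completed",
            "step": step_id,
            "output": result.message,
            "metadata": result.metadata,
        }
    )
    return result


def shout(message: str, context: dict[str, Any]) -> StepResult:
    # Hypothetical handler: upper-cases the incoming message.
    return StepResult(message.upper(), {"handler": "shout"})


trace: list[dict[str, Any]] = []
result = run_step(1, shout, "hello", {}, trace)
```

Because every step goes through the same wrapper, the trace always contains a matching start/complete pair per step, which is what replay matching relies on.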
Traces as Immutable Event Logs¶
Every runtime execution produces a trace: an immutable, append-only log of events:
- step_started: Step execution begins
- step_completed: Step execution finishes
- model_called: LLM API call made
- tool_called: Tool execution request
- tool_result: Tool execution result
- error: Error occurred during execution
- run_started: Agent run begins
- run_completed: Agent run finishes

Traces are:
- Immutable: Once written, never modified
- Complete: Every event is recorded
- Structured: Events are typed and queryable
- Portable: Can be saved, loaded, and replayed
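One way to picture these properties is a small append-only log with frozen events and a line-per-event serialization. The sketch below is an assumption about shape, not RouteKitAI's actual trace format: `TraceEvent`, `Trace`, and the JSON Lines encoding are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class TraceEvent:
    # frozen=True blocks attribute reassignment after creation.
    seq: int
    type: str
    data: dict[str, Any] = field(default_factory=dict)


class Trace:
    """Append-only event log: events can be added but not replaced or removed."""

    def __init__(self) -> None:
        self._events: list[TraceEvent] = []

    def append(self, event_type: str, data: Optional[dict[str, Any]] = None) -> TraceEvent:
        # Copy the payload so later changes by the caller do not leak in.
        event = TraceEvent(seq=len(self._events), type=event_type, data=dict(data or {}))
        self._events.append(event)
        return event

    @property
    def events(self) -> tuple[TraceEvent, ...]:
        # Expose a tuple so callers cannot mutate the underlying list.
        return tuple(self._events)

    def to_jsonl(self) -> str:
        return "\n".join(json.dumps(asdict(e)) for e in self._events)

    @classmethod
    def from_jsonl(cls, text: str) -> "Trace":
        trace = cls()
        for line in text.splitlines():
            record = json.loads(line)
            trace.append(record["type"], record["data"])
        return trace
```

A round trip through `to_jsonl` and `from_jsonl` yields an equal event sequence, which is the portability property a replay engine needs.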
Replay as Deterministic Re-Execution¶
Replay uses the same runtime with stubs:
- Load a trace from a previous run
- Replace external calls (LLM, tools, APIs) with stubs that return recorded values
- Re-execute using the trace's event sequence
- Verify outputs match the original trace
This ensures:
- Determinism: Same inputs → same outputs
- Speed: No real API calls during replay
- Reliability: Test against real production traces
- Debugging: Step through execution with full context
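The replay steps above can be sketched with a recording wrapper and a stub that serves recorded results in order. All names here (`make_recording_tool`, `make_stub_tool`, `workflow`, the `get_temperature` tool) are hypothetical and stand in for whatever the real replay engine does.

```python
from typing import Any, Callable

ToolFn = Callable[[str, dict[str, Any]], Any]


def make_recording_tool(real: ToolFn, trace: list[dict[str, Any]]) -> ToolFn:
    """Wrap a real tool so every call and result is appended to the trace."""

    def call(name: str, args: dict[str, Any]) -> Any:
        trace.append({"type": "tool_called", "name": name, "args": args})
        result = real(name, args)
        trace.append({"type": "tool_result", "name": name, "result": result})
        return result

    return call


def make_stub_tool(trace: list[dict[str, Any]]) -> ToolFn:
    """Return a stub that serves recorded results in their original order."""
    recorded = iter([e for e in trace if e["type"] == "tool_result"])

    def call(name: str, args: dict[str, Any]) -> Any:
        event = next(recorded)
        if event["name"] != name:
            raise RuntimeError(f"replay diverged: expected {event['name']}, got {name}")
        return event["result"]

    return call


def workflow(call_tool: ToolFn, city: str) -> str:
    # The same workflow code runs in production and in replay.
    temperature = call_tool("get_temperature", {"city": city})
    return f"{city}: {temperature} degrees"


trace: list[dict[str, Any]] = []
original = workflow(make_recording_tool(lambda name, args: 21, trace), "Oslo")
replayed = workflow(make_stub_tool(trace), "Oslo")
```

The final verification step is then a comparison of `replayed` against `original`; the stub also fails loudly if the workflow asks for a different tool than the trace recorded.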
Core Components¶
1. Model Interface¶
The Model abstract base class provides a unified interface for LLM providers:
```python
class Model(ABC):
    async def chat(
        self,
        messages: list[Message],
        tools: list[Tool] | None = None,
        stream: bool = False,
    ) -> ModelResponse | AsyncIterator[StreamEvent]:
        ...
```
This abstraction allows RouteKitAI to work with any LLM provider (OpenAI, Anthropic, local models, etc.) without coupling to specific APIs.
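As an illustration of how a provider adapter or test double might plug into such an interface, here is a self-contained sketch. It uses simplified stand-ins for `Message`, `ModelResponse`, and `Model` (the real classes have more fields, and the shape of `ModelResponse` is an assumption), and `ScriptedModel` is a hypothetical name.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Message:
    # Simplified stand-in for the Message class.
    role: str
    content: str


@dataclass
class ModelResponse:
    # Assumed shape: the source does not show ModelResponse's fields.
    message: Message


class Model(ABC):
    @abstractmethod
    async def chat(self, messages: list[Message]) -> ModelResponse:
        ...


class ScriptedModel(Model):
    """Test double that returns canned replies in order."""

    def __init__(self, replies: list[str]) -> None:
        self._replies = list(replies)

    async def chat(self, messages: list[Message]) -> ModelResponse:
        return ModelResponse(Message("assistant", self._replies.pop(0)))


async def ask(model: Model, prompt: str) -> str:
    # Calling code depends only on the abstract interface.
    response = await model.chat([Message("user", prompt)])
    return response.message.content


answer = asyncio.run(ask(ScriptedModel(["pong"]), "ping"))
```

Swapping `ScriptedModel` for an adapter around a real provider would not change `ask`, which is the decoupling the abstraction is meant to provide.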
2. Message System¶
Messages are the currency of agent communication:
```python
class Message(BaseModel):
    role: MessageRole  # SYSTEM, USER, ASSISTANT, TOOL
    content: str
    tool_calls: list[dict[str, Any]] | None
    tool_result: dict[str, Any] | None
    metadata: dict[str, Any]
```
Messages flow through the system, carrying context and enabling multi-turn conversations.
3. Tool System¶
Tools are callable functions that agents can invoke:
```python
class Tool(BaseModel, ABC):
    name: str
    description: str
    input_model: type[BaseModel] | None
    output_model: type[BaseModel] | None

    @abstractmethod
    async def run(self, input: BaseModel) -> BaseModel:
        ...
```
Tools use Pydantic models for input/output validation, ensuring type safety and clear contracts.
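To show the validate-then-run contract without depending on Pydantic, the sketch below uses standard-library dataclasses as stand-ins for the input and output models. `AddTool`, `AddInput`, `AddOutput`, and the `invoke` helper are hypothetical; a real tool would subclass the `Tool` class shown above and use Pydantic models.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass
class AddInput:
    a: int
    b: int

    def __post_init__(self) -> None:
        # Stand-in for Pydantic validation: reject non-integer fields.
        for name in ("a", "b"):
            if not isinstance(getattr(self, name), int):
                raise TypeError(f"{name} must be an int")


@dataclass
class AddOutput:
    total: int


class Tool(ABC):
    name: str
    description: str
    input_model: type

    @abstractmethod
    async def run(self, input: Any) -> Any:
        ...

    async def invoke(self, raw: dict[str, Any]) -> Any:
        # Hypothetical helper: validate raw arguments, then run the tool.
        return await self.run(self.input_model(**raw))


class AddTool(Tool):
    name = "add"
    description = "Add two integers"
    input_model = AddInput

    async def run(self, input: AddInput) -> AddOutput:
        return AddOutput(total=input.a + input.b)


result = asyncio.run(AddTool().invoke({"a": 2, "b": 3}))
```

Invalid arguments fail at the model boundary, before `run` is called, so the tool body can assume well-typed input.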
4. Agent¶
An agent combines a model, tools, and an optional memory and policy:
```python
class Agent(BaseModel):
    name: str
    model: Model
    tools: list[Tool]
    memory: Memory | None
    policy: Policy | None
```
Agents are stateless—all state is managed by the Runtime.
5. Runtime¶
The Runtime orchestrates agent execution:
```python
class Runtime(BaseModel):
    agents: dict[str, Agent]
    trace_dir: Path | None
    policy_hooks: PolicyHooks | None
    max_concurrency: int
    timeout: float | None
    ...
```
The Runtime:
- Manages agent registration
- Executes steps according to policies
- Records traces
- Handles errors and timeouts
- Supports replay
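The `max_concurrency` and `timeout` fields suggest bounded, time-limited execution. A minimal sketch of that idea with standard asyncio primitives follows; `run_bounded` and the `"timeout"` marker string are hypothetical, and the real Runtime may handle timeouts differently (for example by recording an error event).

```python
import asyncio
from typing import Awaitable, Callable


async def run_bounded(
    tasks: list[Callable[[], Awaitable[str]]],
    max_concurrency: int,
    timeout: float,
) -> list[str]:
    """Run tasks with at most max_concurrency in flight and a per-task timeout."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(task: Callable[[], Awaitable[str]]) -> str:
        async with semaphore:
            try:
                return await asyncio.wait_for(task(), timeout)
            except asyncio.TimeoutError:
                # Assumption: a timed-out task is reported, not re-raised.
                return "timeout"

    # gather preserves the order of the input tasks in its result list.
    return await asyncio.gather(*(guarded(t) for t in tasks))


async def fast() -> str:
    return "ok"


async def slow() -> str:
    await asyncio.sleep(1)
    return "late"


results = asyncio.run(run_bounded([fast, slow, fast], max_concurrency=2, timeout=0.05))
```

The semaphore caps how many steps run at once, while `asyncio.wait_for` cancels any single step that exceeds the limit without affecting the others.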
Graph Orchestration¶
Graph Structure¶
Graphs define workflows as directed graphs:
```python
class Graph(BaseModel):
    name: str
    nodes: list[GraphNode]
    edges: list[GraphEdge]
    entry_node: str
    exit_node: str | None
    state_schema: dict[str, Any] | None
```
Node Types¶
- MODEL: Execute an agent/model call
- TOOL: Execute a tool
- SUBGRAPH: Execute a nested graph
- CONDITION: Conditional branching based on state
State Management¶
Graphs maintain state that flows between nodes:
- Input mapping: Maps graph state to node inputs
- Output mapping: Maps node outputs to graph state
- State schema: Optional JSON schema for validation
Execution Flow¶
- Start at entry_node
- Execute current node with mapped inputs
- Update graph state with node outputs
- Determine next node(s) based on edges and conditions
- Repeat until exit_node or no more edges
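The execution flow above can be sketched as a small loop over nodes and conditional edges. This is a simplified illustration under several assumptions: nodes are plain functions from state to state updates, an edge is a `(source, target, condition)` tuple, only the first matching edge is followed, the exit node is executed before stopping, and `max_steps` is a hypothetical guard against cycles.

```python
from typing import Any, Callable, Optional

State = dict[str, Any]
NodeFn = Callable[[State], State]
# An edge is (source, target, condition); a condition of None always matches.
Edge = tuple[str, str, Optional[Callable[[State], bool]]]


def run_graph(
    nodes: dict[str, NodeFn],
    edges: list[Edge],
    entry_node: str,
    exit_node: Optional[str],
    state: State,
    max_steps: int = 100,
) -> tuple[State, list[str]]:
    state = dict(state)  # work on a copy of the initial state
    visited: list[str] = []
    current = entry_node
    for _ in range(max_steps):
        visited.append(current)
        state.update(nodes[current](state))  # merge node outputs into graph state
        if current == exit_node:
            return state, visited
        targets = [
            target
            for source, target, cond in edges
            if source == current and (cond is None or cond(state))
        ]
        if not targets:  # no more edges to follow
            return state, visited
        current = targets[0]
    raise RuntimeError("max_steps exceeded; the graph may contain an unbounded cycle")


nodes: dict[str, NodeFn] = {
    "draft": lambda s: {"attempts": s.get("attempts", 0) + 1},
    "review": lambda s: {"approved": s["attempts"] >= 2},
    "publish": lambda s: {"published": True},
}
edges: list[Edge] = [
    ("draft", "review", None),
    ("review", "publish", lambda s: s["approved"]),
    ("review", "draft", lambda s: not s["approved"]),
]
final_state, path = run_graph(nodes, edges, "draft", "publish", {})
```

In this example the review node rejects the first draft, so the path loops back once before reaching the exit node.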
Policy System¶
Policies determine what actions an agent takes:
```python
class Policy(ABC):
    async def plan(self, state: dict[str, Any]) -> list[Action]:
        ...

    async def reflect(self, state: dict[str, Any], observation: dict[str, Any]) -> dict[str, Any]:
        ...
```
Built-in Policies¶
- ReActPolicy: Reasoning + Acting loop (default)
- FunctionCallingPolicy: Strict function calling mode
- GraphPolicy: Execute a graph workflow
- SupervisorPolicy: Multi-agent coordination
- PlanExecutePolicy: Plan then execute pattern
Action Types¶
- ModelAction: Call the model
- ToolAction: Execute a tool
- Parallel: Execute multiple actions in parallel
- Final: Terminate with output
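A dispatcher over these action types might look like the sketch below. The field names on `ToolAction`, `Parallel`, and `Final` are assumptions (the source lists only the type names), `ModelAction` is omitted to keep the example self-contained, and `execute` is a hypothetical function.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Union


@dataclass
class ToolAction:
    tool: str
    args: dict[str, Any]


@dataclass
class Parallel:
    actions: list[ToolAction]


@dataclass
class Final:
    output: Any


Action = Union[ToolAction, Parallel, Final]
ToolFn = Callable[..., Awaitable[Any]]


async def execute(action: Action, tools: dict[str, ToolFn]) -> Any:
    """Dispatch one action; Parallel fans out and collects results in order."""
    if isinstance(action, ToolAction):
        return await tools[action.tool](**action.args)
    if isinstance(action, Parallel):
        return await asyncio.gather(*(execute(a, tools) for a in action.actions))
    if isinstance(action, Final):
        return action.output
    raise TypeError(f"unsupported action: {action!r}")


async def double(x: int) -> int:
    return x * 2


tools: dict[str, ToolFn] = {"double": double}
batch = Parallel([ToolAction("double", {"x": 1}), ToolAction("double", {"x": 2})])
parallel_result = asyncio.run(execute(batch, tools))
final_result = asyncio.run(execute(Final("done"), tools))
```

A policy's `plan` method would return a list of such actions, and the runtime would execute them one at a time, recording each in the trace.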
Memory System¶
RouteKitAI supports multiple memory backends:
- EpisodicMemory: SQLite-backed episode storage
- RetrievalMemory: TF-IDF or substring search
- VectorMemory: Vector similarity search
- KVMemory: Key-value storage
- WorkingMemory: In-memory context for a single run
Memory is accessed through the Memory abstract interface, allowing agents to retain context across runs.
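To illustrate what a backend behind a shared interface could look like, here is a sketch of substring search, the simpler of the two RetrievalMemory strategies mentioned above. The `add` and `search` method names are assumptions, since the source does not show the Memory interface, and `SubstringMemory` is a hypothetical class.

```python
from abc import ABC, abstractmethod


class Memory(ABC):
    # Assumed interface: the real Memory methods are not shown in the source.
    @abstractmethod
    def add(self, text: str) -> None:
        ...

    @abstractmethod
    def search(self, query: str, limit: int = 3) -> list[str]:
        ...


class SubstringMemory(Memory):
    """In-process store with case-insensitive substring search."""

    def __init__(self) -> None:
        self._items: list[str] = []

    def add(self, text: str) -> None:
        self._items.append(text)

    def search(self, query: str, limit: int = 3) -> list[str]:
        needle = query.lower()
        # Return matches in insertion order, capped at limit.
        return [item for item in self._items if needle in item.lower()][:limit]


memory = SubstringMemory()
memory.add("User prefers metric units")
memory.add("Order #12 shipped on Monday")
hits = memory.search("METRIC")
```

Because callers depend only on the abstract interface, a vector or SQLite-backed implementation could replace this one without changes to agent code.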
Security & Governance¶
Policy Hooks¶
Policy hooks intercept execution at key points:
- PII Redaction: Automatically redact sensitive data
- Tool Filtering: Allow/deny lists for tools
- Approval Gates: Require approval for high-risk operations
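Two of these hooks can be sketched with the standard library. The example below is only an illustration: it redacts email addresses (one kind of PII among many), and `redact_pii`, `is_tool_allowed`, and the `[REDACTED_EMAIL]` placeholder are hypothetical, not RouteKitAI's hook API.

```python
import re
from typing import Collection, Optional

# Simple email pattern for illustration; real PII detection covers far more.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact_pii(text: str) -> str:
    """Replace email addresses with a placeholder before text is stored or sent."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)


def is_tool_allowed(
    name: str,
    allow: Optional[Collection[str]] = None,
    deny: Collection[str] = (),
) -> bool:
    """Deny list wins; an allow list of None means every other tool is allowed."""
    if name in deny:
        return False
    return allow is None or name in allow


clean = redact_pii("Contact bob@example.com for access.")
```

Giving the deny list precedence is a design choice made here for safety: a tool that appears in both lists stays blocked.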
Sandboxing¶
Sandboxing provides isolation for tool execution:
- FilesystemSandbox: Control file system access
- NetworkSandbox: Control network access
- Permissions: Fine-grained permission system
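A core check behind filesystem sandboxing is confining paths to a root directory. The sketch below shows that check with `pathlib`; `resolve_in_sandbox` is a hypothetical helper and does not represent the FilesystemSandbox API, which is not shown in the source.

```python
import tempfile
from pathlib import Path


def resolve_in_sandbox(root: Path, requested: str) -> Path:
    """Resolve a requested path and reject it if it escapes the sandbox root."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root / requested).resolve()
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"{requested!r} escapes sandbox root {root}")
    return candidate


sandbox_root = Path(tempfile.mkdtemp())
inside = resolve_in_sandbox(sandbox_root, "notes/todo.txt")
```

Resolving before comparing matters: a naive string prefix check would accept `notes/../../etc/passwd`, while the resolved path is clearly outside the root. Absolute paths are rejected the same way, because joining an absolute path onto the root discards the root.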
MVP Scope¶
Included¶
- Core primitives: Model, Message, Tool, Agent, Runtime
- Graph orchestration: Define agent workflows as directed graphs
- Step-based execution: Discrete execution steps with full trace capture
- Trace format: Immutable event log structure
- Replay engine: Deterministic re-execution with stubs
- Basic tooling: Save/load traces, stub generation
- Memory backends: Episodic and retrieval memory
- Security hooks: PII redaction, tool filtering, approval gates
- CLI tools: Run, trace, replay, test commands
- Trace analysis: Metrics (trace-analyze), search (trace-search), timeline/step views, web UI (serve)
Excluded (Post-MVP)¶
- Distributed execution: Multi-machine orchestration
- Streaming traces: Real-time trace streaming/aggregation
- Production observability: Integration with external monitoring (OpenTelemetry exporters exist; full integration is TBD)
- Advanced graph features: Dynamic graphs, conditional branching beyond basics
- Model providers: Built-in integrations (use adapters)
- UI/dashboards: Additional trace visualization (basic serve UI included)
Design Principles¶
- Minimal core: 5 primitives max, no bloat
- Type-safe: Full type hints, mypy-clean
- Async-first: Built for async I/O from the ground up
- Testable: Every feature must support deterministic testing
- Traceable: Every execution produces a complete trace
- Composable: Build complex workflows from simple primitives
- Extensible: Easy to add new models, tools, policies, memory backends
Performance Considerations¶
RouteKitAI prioritizes correctness and testability over raw performance:
- Correctness first: Deterministic behavior is more important than speed
- Testability: Replay enables fast, reliable tests
- Observability: Complete traces enable debugging and optimization
- Async I/O: Non-blocking operations for concurrent execution
Performance optimizations can be added later without breaking the core design.
Future Directions¶
- Distributed execution: Scale across multiple machines
- Advanced graph features: Dynamic graphs, parallel node execution
- Trace analysis: Further tooling and integrations (basic analysis and serve UI included)
- Production observability: Integrate with monitoring systems
- Model provider integrations: Built-in support for major LLM providers
- UI/dashboards: Visual trace inspection and graph editing