Eval framework. Define correct, test against it, get results.
Route inference across LLM providers. Track cost per request.
Structured data compiler. Pass pipeline, pluggable backends.
Where did your tokens go? Spans, latency percentiles, alerts.
Shared core for the MIST stack. Zero external deps.
Ship evals before you ship features.