Methodology

2 items

Projects

evaldriven.org

Ship evals before you ship features.

18 Markdown
Tags: AI, Evaluation, Methodology

mcpbr (supermodeltools)

Benchmark runner for Model Context Protocol servers. Paired comparison experiments on SWE-bench.

6 Python
Tags: Python, AI, Evaluation, MCP, Methodology
