Methodology

2 items

Projects

Ship evals before you ship features.

18 • Markdown

AI Evaluation • Methodology

Benchmark runner for Model Context Protocol servers. Paired comparison experiments on SWE-bench.

6 • Python

Python • AI Evaluation • MCP • Methodology