Research

3 items

Blog posts

Everyone Is Benchmarking MCP Servers Wrong

February 13, 2026

Existing MCP benchmarks rank models, not servers. Here's how to A/B test whether your MCP server actually improves agent performance.

AI MCP Research Evaluation

Why I Built mcpbr

February 6, 2026

MCP developers are shipping tools without evidence that they work. I built mcpbr to find out. Here are the results from a 500-task controlled SWE-bench experiment, and they surprised me.

AI MCP Open Source Developer Tools Research Evaluation

Projects

mcp-serialization-repro

Do MCP tools serialize in Claude Code? An empirical study: readOnlyHint controls parallelism, and IPC overhead is ~5 ms/call.

3 • Python

MCP Python Research