Research

3 items

Blog posts

Everyone Is Benchmarking MCP Servers Wrong

Existing MCP benchmarks rank models, not servers. Here's how to A/B test whether your MCP server actually improves agent performance.

AI MCP Research Evaluation

Why I Built mcpbr

MCP developers are shipping tools without evidence that they work. I built mcpbr to find out. Here are the results from a 500-task controlled SWE-bench experiment, and they surprised me.

AI MCP Open Source Developer Tools Research Evaluation

Projects

mcp-serialization-repro

Do MCP tools serialize in Claude Code? An empirical study: readOnlyHint controls parallelism, and IPC overhead is ~5 ms per call.

3 Python
MCP Python Research
