Everyone Is Benchmarking MCP Servers Wrong
Existing MCP benchmarks rank models, not servers. Here's how to A/B test whether your MCP server actually improves agent performance.
MCP developers are shipping tools without evidence that they work. I built mcpbr to find out. Here are the results of a 500-task controlled SWE-bench experiment, and they surprised me.
Benchmark runner for Model Context Protocol servers.
Claude Code plugin designed to make Claude a culinary expert.
Do MCP tools serialize in Claude Code? An empirical study: readOnlyHint controls parallelism, and IPC overhead is ~5 ms per call.