mcpbr (supermodeltools)
Benchmark runner for Model Context Protocol servers. Paired comparison experiments on SWE-bench.
Language: Python
Topics: Python AI Evaluation MCP Methodology