AI

5 items

Blog posts

Everyone Is Benchmarking MCP Servers Wrong

Existing MCP benchmarks rank models, not servers. Here's how to A/B test whether your MCP server actually improves agent performance.

AI, MCP, Research, Evaluation

Why I Built mcpbr

MCP developers are shipping tools without evidence they work. I built mcpbr to find out. Here are results from a 500-task controlled SWE-bench experiment that surprised me.

AI, MCP, Open Source, Developer Tools, Research, Evaluation

Projects

musegpt (Archived)

Local LLMs in your DAW. A VST3 plugin for AI-powered music production.

89 C++
AI, C++

mcpbr

Benchmark runner for Model Context Protocol servers.

20 Python
Python, AI, MCP, Evaluation, Developer Tools

claude-chef

Claude Code plugin designed to make Claude a culinary expert.

5 TypeScript
AI, MCP, TypeScript

All tags

AI (5), Cloud Computing (2), C++ (1), Developer Tools (2), Evaluation (3), MCP (5), Open Source (1), Python (2), Research (3), TypeScript (2)