Grey Newell - Computer Science Researcher

Researching how LLM-based agents use tools, with a focus on evaluation methodology and benchmark design. Building open-source infrastructure for measuring AI agent performance.

Research

mcpbr: Benchmarking Model Context Protocol Servers on Software Engineering Tasks

Grey Newell. Preprint, 2026. Georgia Institute of Technology.

DOI: 10.5281/zenodo.18627369

Latest from the blog

Everyone Is Benchmarking MCP Servers Wrong

Existing MCP benchmarks rank models, not servers. Here's how to A/B test whether your MCP server actually improves agent performance.

Why I Built mcpbr

MCP developers are shipping tools without evidence that they work. I built mcpbr to find out whether they do. Here are the surprising results from a controlled 500-task SWE-bench experiment.

Implement event-driven invoice processing for resilient financial monitoring at scale

A deep dive into designing serverless event-driven systems that process 86 million invoice events per day with near real-time visibility. Covers cellular architecture patterns, EventBridge routing strategies, and resilient monitoring at scale.


Projects

mcpbr

Benchmark runner for Model Context Protocol servers.

20 Python

musegpt (archived)

Local LLMs in your DAW. A VST3 plugin for AI-powered music production.

89 C++

Serverless event-driven architecture enabling engineering teams to process millions of daily events with near real-time visibility and strong resilience.

6 TypeScript

Frequently asked questions

Model Context Protocol

What is mcpbr and why did Grey Newell create it?
What is the Model Context Protocol (MCP) and how does mcpbr help evaluate MCP servers?
How do I get started with mcpbr to test my MCP server?
Why should MCP server developers benchmark their tools before shipping?
What metrics matter most when evaluating an MCP server's performance?
How is mcpbr different from existing coding benchmarks like SWE-bench?

Technical Publications & Projects

What technical articles has Grey Newell published on the AWS blog?