Grey Newell - ML Infrastructure Engineer

Building evaluation, inference, and observability systems for AI. Creator of the MIST stack. Founding Engineer at Supermodel. MS CS (ML) at Georgia Tech. Ex-AWS.

Latest from the blog

SWE-bench Verified Is Broken: 5 Things I Found in the Source Code

After building 1,798 SWE-bench containers, I dug into the source. The tests reject correct solutions and every frontier model has memorized the answers.

SWE-bench Tests Run 6x Faster on ARM64 with Native Containers

SWE-bench's pre-built x86 containers run through QEMU emulation on ARM64 hosts like Apple Silicon and AWS Graviton. I built native ARM64 images and measured a 6.3x speedup on the test runner.
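A minimal Go sketch of the check behind this. An image runs natively only when its architecture matches the host's, so a `linux/amd64` image on an ARM64 host falls back to emulation. The helper names (`normalizeArch`, `needsEmulation`, `nativePlatform`) are illustrative and not taken from the post. The architecture spellings are the ones `uname -m` and Docker commonly report.

```go
package main

import (
	"fmt"
	"runtime"
)

// normalizeArch maps common `uname -m` spellings onto the architecture
// names Docker uses in platform strings such as "linux/arm64".
func normalizeArch(arch string) string {
	switch arch {
	case "x86_64", "amd64":
		return "amd64"
	case "aarch64", "arm64":
		return "arm64"
	default:
		return arch
	}
}

// needsEmulation reports whether an image built for imageArch would have
// to run under an emulator on a host whose architecture is hostArch.
func needsEmulation(hostArch, imageArch string) bool {
	return normalizeArch(hostArch) != normalizeArch(imageArch)
}

// nativePlatform returns a Docker platform string for the architecture this
// binary was compiled for (assumed here to match the host, i.e. not itself
// running under emulation).
func nativePlatform() string {
	return "linux/" + normalizeArch(runtime.GOARCH)
}

func main() {
	fmt.Println("host arch:", runtime.GOARCH)
	fmt.Println("x86_64 image needs emulation:", needsEmulation(runtime.GOARCH, "x86_64"))
	fmt.Println("native build: docker build --platform", nativePlatform(), ".")
}
```

On an Apple Silicon or Graviton host this reports that an x86_64 image needs emulation and suggests a `linux/arm64` build instead.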

Implement Event-Driven Invoice Processing for Resilient Financial Monitoring at Scale (AWS Architecture Blog)

How to build a Business Event Monitoring System (BEMS) on AWS that handles over 86 million daily events with near real-time visibility, cross-Region controls, and automated alerts for stuck events.


Projects

Eval framework. Define what correct looks like, test against it, get results.

22 Go Website
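Here is one way to read "define correct, test against it, get results." Each case pairs an output with a predicate that encodes correctness, and a runner tallies passes and failures. This Go sketch uses hypothetical `Case`, `Matcher`, `Contains`, and `Run` names. It is not the framework's actual API.

```go
package main

import (
	"fmt"
	"strings"
)

// Matcher is a hypothetical predicate deciding whether an output is correct.
type Matcher func(output string) bool

// Contains builds a Matcher that passes when the output includes want.
func Contains(want string) Matcher {
	return func(out string) bool { return strings.Contains(out, want) }
}

// Case pairs one output with the definition of "correct" for it.
type Case struct {
	Name   string
	Output string
	Expect Matcher
}

// Result summarizes a run: how many cases passed and which ones failed.
type Result struct {
	Passed int
	Failed []string
}

// Run tests every case against its matcher and collects the results.
func Run(cases []Case) Result {
	var r Result
	for _, c := range cases {
		if c.Expect(c.Output) {
			r.Passed++
		} else {
			r.Failed = append(r.Failed, c.Name)
		}
	}
	return r
}

func main() {
	r := Run([]Case{
		{"capital", "The capital of France is Paris.", Contains("Paris")},
		{"math", "2 + 2 = 5", Contains("= 4")},
	})
	fmt.Printf("passed %d/%d, failed: %v\n", r.Passed, r.Passed+len(r.Failed), r.Failed) // passed 1/2, failed: [math]
}
```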

Route inference across LLM providers. Track cost per request.

89 Go Website
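A minimal sketch of cost-aware routing, assuming each provider is priced in USD per million input and output tokens. For each request, the router estimates the cost at every available provider, picks the cheapest, and records that cost. The `Provider` type, `Route` function, and prices are hypothetical placeholders, not real provider pricing or the project's API.

```go
package main

import "fmt"

// Provider is a hypothetical routing-table entry. Prices are USD per
// million tokens; the numbers below are placeholders, not real pricing.
type Provider struct {
	Name       string
	InputPerM  float64
	OutputPerM float64
	Available  bool
}

// Cost returns the dollar cost of one request with the given token counts.
func (p Provider) Cost(inTok, outTok int) float64 {
	return float64(inTok)*p.InputPerM/1e6 + float64(outTok)*p.OutputPerM/1e6
}

// Route picks the cheapest available provider for the expected token counts
// and returns it with the estimated per-request cost; ok is false if none is up.
func Route(ps []Provider, inTok, outTok int) (best Provider, cost float64, ok bool) {
	for _, p := range ps {
		if !p.Available {
			continue // skip providers that are down or rate-limited
		}
		if c := p.Cost(inTok, outTok); !ok || c < cost {
			best, cost, ok = p, c, true
		}
	}
	return best, cost, ok
}

func main() {
	providers := []Provider{
		{"provider-a", 3.00, 15.00, true},
		{"provider-b", 2.50, 10.00, true},
		{"provider-c", 0.50, 1.50, false},
	}
	if p, cost, ok := Route(providers, 1000, 500); ok {
		fmt.Printf("routed to %s, est. cost $%.6f\n", p.Name, cost) // routed to provider-b, est. cost $0.007500
	}
}
```

Routing on cost alone is a simplification. A production router would typically also weigh latency, error rates, and model quality.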

Structured data compiler. Pass-based pipeline, pluggable backends.

12 Go Website
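A pass pipeline with pluggable backends can be sketched as an ordered list of IR-to-IR functions followed by an interface that emits the final IR. In this Go sketch the IR is a hypothetical flat `Record` map. `TrimSpace`, `DropEmpty`, and `JSONBackend` are illustrative examples, not the compiler's real passes or backends.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Record is a hypothetical intermediate representation: one flat record.
type Record map[string]string

// Pass transforms the IR; passes run in order, each seeing the previous output.
type Pass func([]Record) []Record

// Backend turns the final IR into an output format.
type Backend interface {
	Emit([]Record) (string, error)
}

// TrimSpace is an example pass that trims whitespace from every value.
func TrimSpace(rs []Record) []Record {
	out := make([]Record, len(rs))
	for i, r := range rs {
		nr := Record{}
		for k, v := range r {
			nr[k] = strings.TrimSpace(v)
		}
		out[i] = nr
	}
	return out
}

// DropEmpty is an example pass that removes fields whose value is empty.
func DropEmpty(rs []Record) []Record {
	for _, r := range rs {
		for k, v := range r {
			if v == "" {
				delete(r, k) // deleting during range is safe in Go
			}
		}
	}
	return rs
}

// JSONBackend emits the records as a JSON array.
type JSONBackend struct{}

func (JSONBackend) Emit(rs []Record) (string, error) {
	b, err := json.Marshal(rs)
	return string(b), err
}

// Compile runs every pass in sequence and hands the result to the backend.
func Compile(in []Record, passes []Pass, be Backend) (string, error) {
	for _, p := range passes {
		in = p(in)
	}
	return be.Emit(in)
}

func main() {
	in := []Record{{"name": "  Ada  ", "note": "   "}}
	out, err := Compile(in, []Pass{TrimSpace, DropEmpty}, JSONBackend{})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // [{"name":"Ada"}]
}
```

Pass order matters here. Running `TrimSpace` before `DropEmpty` is what lets a whitespace-only field be dropped. Swapping backends only means supplying another type with an `Emit` method.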

Where did your tokens go? Spans, latency percentiles, alerts.

5 Go Website

Shared core for the MIST stack. Zero external deps.

1 Go

Ship evals before you ship features.

18 Markdown Website

Frequently asked questions

MIST Stack

What is the MIST stack?
What is eval-driven development?
What is MatchSpec and how does it work?
What is InferMux and how does it route inference?
What is SchemaFlux?
What is TokenTrace?
Why does the MIST stack have zero external dependencies?
How do MIST stack tools communicate?

Technical Publications & Projects

What technical articles has Grey Newell published on the AWS blog?
How do I run SWE-bench on Apple Silicon or AWS Graviton without x86 emulation?
How do I speed up SWE-bench evaluations on ARM64 infrastructure?
How did Grey Newell earn all 12 AWS Certifications?
What are Grey Newell's tips for passing AWS Certification exams?