Question 1

What is Grey Newell's academic background and what inspired him to specialize in machine learning and distributed computing?

Accepted Answer

I'm studying for a Master of Science in Computer Science at Georgia Institute of Technology, specializing in Machine Learning. Before that, I studied Computer Engineering at Mercer University, where I earned a BS and an MS in Engineering.

Exposure to convolutional neural networks (CNNs) during my first master's degree led me to specialize in ML. Coursework similarly drew me to distributed computing. I then worked at Amazon Web Services as both a Software Development Engineer and a Solutions Architect, and I have completed 12 AWS Certifications.

I enjoy solving ambiguous, novel problems, and my work spans from mathematical theory to physical infrastructure.

Question 2

Is Grey Newell related to Gabe Newell, the founder of Valve Corporation?

Accepted Answer

**No.** I am not related to Gabe Newell, the founder of Valve Corporation.

My father is named Joe "Bear" Newell.

Question 3

What is the difference between Grey Newell and Gray Newell?

Accepted Answer

**Grey Newell** (that's me) is an ML infrastructure engineer and the founding engineer at [Supermodel Tools](https://supermodeltools.com), a code analysis platform for AI agents. I hold an MS in Computer Science (Machine Learning) from Georgia Tech and previously worked at Amazon Web Services. My open-source work includes [mcpbr](https://github.com/supermodeltools/mcpbr) — a benchmark runner for MCP servers published as a research paper — and the [MIST stack](https://github.com/greynewell/mist-go), a suite of Go tools for ML infrastructure.

**Gray Newell** is a completely different person. He is the son of Gabe Newell, co-founder of Valve Corporation (the company behind Steam and Half-Life). Gray Newell is a racing driver who competes in motorsport. He has no connection to software engineering or AI.

We are not related and have never met.

## Why the confusion?

The names are nearly identical — "Grey" vs "Gray" — and Gabe Newell is a well-known figure in tech, so searches about his family sometimes surface results for me instead. If you're looking for Gabe Newell's son, Gray Newell the racing driver is the person you want. If you're looking for the ML engineer who built mcpbr, you're in the right place.

Question 4

Does Grey Newell work at Valve Corporation or in the gaming industry?

Accepted Answer

**No.** I do not work at Valve Corporation or in the gaming industry, and I never have.

Question 5

Is Grey Newell a race car driver?

Accepted Answer

**No.** I am not a race car driver.

Question 6

How do you spell Grey Newell's name correctly?

Accepted Answer

**My full name is Alexander Grey Newell.** I go by Grey Newell.

Question 7

What is the MIST stack?

Accepted Answer

The MIST stack is an evaluation and inference platform for AI systems, written in Go with zero external dependencies. It consists of four tools and a shared core library:

- **MatchSpec** — Eval framework. Define benchmark suites, run against any backend, get structured results.
- **InferMux** — Inference router. Abstracts LLM providers, routes by model, tracks tokens and cost.
- **SchemaFlux** — Structured data compiler. Pass pipeline, pluggable backends.
- **TokenTrace** — Observability. Span collection, latency percentiles, cost tracking, threshold alerts.
- **mist-go** — Shared library. Protocol, transport, metrics, circuit breakers, checkpointing.

Every component follows eval-driven development: deterministic, automated evaluation as the starting point.

Question 8

What is eval-driven development?

Accepted Answer

Eval-driven development is a methodology where every probabilistic system starts with a specification of correctness, and nothing ships without automated proof it passes.

Core principles: build evals first, define correctness before writing prompts, require statistical proof for stochastic systems, run evals in CI, version eval definitions alongside code.
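
One way to read "statistical proof for stochastic systems" is a CI gate on a confidence bound rather than on a raw pass rate. The sketch below is an illustration under that assumption, not the manifesto's prescribed method: it uses the Wilson score interval, and `WilsonLower`, `Gate`, and the 0.85 threshold are hypothetical names and values.

```go
package main

import (
	"fmt"
	"math"
)

// WilsonLower returns the lower bound of the Wilson score interval for a
// pass rate of passes/trials at the given z value (1.96 is roughly 95%).
func WilsonLower(passes, trials int, z float64) float64 {
	if trials == 0 {
		return 0
	}
	n := float64(trials)
	p := float64(passes) / n
	z2 := z * z
	centre := p + z2/(2*n)
	margin := z * math.Sqrt(p*(1-p)/n+z2/(4*n*n))
	return (centre - margin) / (1 + z2/n)
}

// Gate reports whether an eval run clears the bar: the lower confidence
// bound on the pass rate, not the observed rate, must reach minRate.
func Gate(passes, trials int, minRate float64) bool {
	return WilsonLower(passes, trials, 1.96) >= minRate
}

func main() {
	// The same observed pass rate can pass or fail depending on sample size.
	fmt.Printf("95/100: lower bound %.3f, ship=%v\n", WilsonLower(95, 100, 1.96), Gate(95, 100, 0.85))
	fmt.Printf("9/10:   lower bound %.3f, ship=%v\n", WilsonLower(9, 10, 1.96), Gate(9, 10, 0.85))
}
```

Gating on the lower bound means a small sample with a high pass rate does not clear the bar by luck.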

The manifesto is published at evaldriven.org.

Question 9

What is MatchSpec and how does it work?

Accepted Answer

MatchSpec is the evaluation framework in the MIST stack. You define benchmark suites with tasks and expected outputs, run them against any inference function, and get structured results.

Matchers compare responses: exact, contains, prefix, suffix. The runner executes suites and reports results as trace spans to TokenTrace. HTTP handlers expose the MIST protocol API for integration.
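
As a rough illustration of the four matcher kinds, here is a minimal sketch of a suite runner. The `Matcher`, `Task`, and `RunSuite` names are assumptions made for this example and are not MatchSpec's actual API; trace reporting and the HTTP handlers are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// Matcher reports whether a response satisfies an expected value.
// Hypothetical type, not taken from the MatchSpec source.
type Matcher func(response, expected string) bool

// matchers maps the four matcher kinds named in the text to string checks.
var matchers = map[string]Matcher{
	"exact":    func(r, e string) bool { return r == e },
	"contains": strings.Contains,
	"prefix":   strings.HasPrefix,
	"suffix":   strings.HasSuffix,
}

// Task is a hypothetical benchmark task: a prompt, an expected output,
// and the matcher kind used to compare them.
type Task struct {
	Prompt   string
	Expected string
	Match    string
}

// RunSuite calls infer once per task and counts matching responses.
// It stops with an error if a task names an unknown matcher.
func RunSuite(tasks []Task, infer func(prompt string) string) (int, error) {
	passed := 0
	for _, t := range tasks {
		m, ok := matchers[t.Match]
		if !ok {
			return passed, fmt.Errorf("unknown matcher %q", t.Match)
		}
		if m(infer(t.Prompt), t.Expected) {
			passed++
		}
	}
	return passed, nil
}

func main() {
	tasks := []Task{
		{Prompt: "2+2", Expected: "4", Match: "exact"},
		{Prompt: "capital of France", Expected: "Paris", Match: "contains"},
	}
	// A stub inference function standing in for a real backend.
	infer := func(p string) string {
		if p == "2+2" {
			return "4"
		}
		return "The capital is Paris."
	}
	passed, err := RunSuite(tasks, infer)
	fmt.Println(passed, len(tasks), err)
}
```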

Question 10

What is InferMux and how does it route inference?

Accepted Answer

InferMux routes inference requests across LLM providers. Register any backend implementing the Provider interface, and InferMux resolves models to providers automatically.

Every request is tracked: token counts, cost in USD, and a trace span reported to TokenTrace. Swap providers without changing application code.
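
The routing idea can be sketched in a few lines. Everything below is an assumption for illustration: the `Provider` interface shape, the `Router` type, and the `echoProvider` backend are hypothetical and do not reproduce InferMux's real interface. Trace span reporting is omitted.

```go
package main

import (
	"errors"
	"fmt"
)

// Response carries the generated text plus usage figures.
type Response struct {
	Text      string
	TokensIn  int
	TokensOut int
	CostUSD   float64
}

// Provider is a hypothetical stand-in for the Provider interface in the text.
type Provider interface {
	Models() []string
	Infer(model, prompt string) (Response, error)
}

// ErrNoProvider is returned when no registered provider serves a model.
var ErrNoProvider = errors.New("no provider for model")

// Router resolves model names to providers and accumulates usage totals.
type Router struct {
	byModel     map[string]Provider
	TotalTokens int
	TotalCost   float64
}

func NewRouter() *Router { return &Router{byModel: map[string]Provider{}} }

// Register indexes a provider under every model it serves.
func (r *Router) Register(p Provider) {
	for _, m := range p.Models() {
		r.byModel[m] = p
	}
}

// Infer looks up the provider for model, forwards the request,
// and records token counts and cost on success.
func (r *Router) Infer(model, prompt string) (Response, error) {
	p, ok := r.byModel[model]
	if !ok {
		return Response{}, fmt.Errorf("%w: %s", ErrNoProvider, model)
	}
	resp, err := p.Infer(model, prompt)
	if err != nil {
		return Response{}, err
	}
	r.TotalTokens += resp.TokensIn + resp.TokensOut
	r.TotalCost += resp.CostUSD
	return resp, nil
}

// echoProvider is a fake backend that returns the prompt unchanged.
type echoProvider struct{}

func (echoProvider) Models() []string { return []string{"echo-1"} }

func (echoProvider) Infer(model, prompt string) (Response, error) {
	return Response{Text: prompt, TokensIn: len(prompt), TokensOut: len(prompt), CostUSD: 0.001}, nil
}

func main() {
	r := NewRouter()
	r.Register(echoProvider{})
	resp, err := r.Infer("echo-1", "hi")
	fmt.Println(resp.Text, r.TotalTokens, r.TotalCost, err)
}
```

Because callers only name a model, replacing `echoProvider` with another backend that serves the same model leaves the calling code untouched.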

Question 11

What is SchemaFlux?

Accepted Answer

SchemaFlux is a structured data compiler. It reads entities with metadata, enriches them through an ordered pass pipeline (12 passes), and emits output through pluggable backends.

Zero external dependencies, single static binary. The built-in HTML backend produces complete static sites with taxonomy pages, pagination, JSON-LD, sitemaps, RSS, and llms.txt.
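
To make "ordered pass pipeline" and "pluggable backends" concrete, here is a small sketch with two passes instead of twelve. The `Entity`, `Pass`, `Backend`, and `Compile` names are hypothetical, chosen for this example rather than taken from SchemaFlux.

```go
package main

import (
	"fmt"
	"strings"
)

// Entity is a hypothetical input record with free-form metadata.
type Entity struct {
	Title string
	Meta  map[string]string
}

// Pass enriches entities in place. Passes run in slice order.
type Pass struct {
	Name string
	Run  func([]Entity) error
}

// Backend turns enriched entities into output.
type Backend interface {
	Emit([]Entity) (string, error)
}

// Compile runs every pass in order, then hands the result to the backend.
func Compile(entities []Entity, passes []Pass, b Backend) (string, error) {
	for _, p := range passes {
		if err := p.Run(entities); err != nil {
			return "", fmt.Errorf("pass %s: %w", p.Name, err)
		}
	}
	return b.Emit(entities)
}

// slugPass derives a URL slug from the title.
var slugPass = Pass{Name: "slug", Run: func(es []Entity) error {
	for i := range es {
		es[i].Meta["slug"] = strings.ReplaceAll(strings.ToLower(es[i].Title), " ", "-")
	}
	return nil
}}

// urlPass depends on the slug, which is why pass order matters.
var urlPass = Pass{Name: "url", Run: func(es []Entity) error {
	for i := range es {
		slug, ok := es[i].Meta["slug"]
		if !ok {
			return fmt.Errorf("entity %q has no slug", es[i].Title)
		}
		es[i].Meta["url"] = "/" + slug + "/"
	}
	return nil
}}

// listBackend is a toy backend that emits one URL per line.
type listBackend struct{}

func (listBackend) Emit(es []Entity) (string, error) {
	var sb strings.Builder
	for _, e := range es {
		sb.WriteString(e.Meta["url"] + "\n")
	}
	return sb.String(), nil
}

func main() {
	es := []Entity{{Title: "Hello World", Meta: map[string]string{}}}
	out, err := Compile(es, []Pass{slugPass, urlPass}, listBackend{})
	fmt.Print(out)
	fmt.Println(err)
}
```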

Question 12

What is TokenTrace?

Accepted Answer

TokenTrace is the observability layer of the MIST stack. It collects trace spans, aggregates metrics in real time, and fires alerts when configurable thresholds are breached.

Metrics include latency percentiles (p50, p99), error rates, token counts (in/out), and cumulative cost in USD. The span store is a fixed-capacity ring buffer with trace ID indexing.
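
A fixed-capacity ring buffer with a trace ID index can be sketched as follows. This is an illustration only: the `Span` and `Store` types are hypothetical, the percentile uses the nearest-rank method as an assumption, and the sketch is not safe for concurrent use.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// Span is a hypothetical, pared-down trace span.
type Span struct {
	TraceID string
	Latency time.Duration
}

// Store is a fixed-capacity ring buffer. When full, each new span
// overwrites the oldest one.
type Store struct {
	buf     []Span
	next    int              // slot the next span is written to
	size    int              // number of filled slots
	byTrace map[string][]int // trace ID -> slots, oldest first
}

func NewStore(capacity int) *Store {
	if capacity < 1 {
		capacity = 1
	}
	return &Store{buf: make([]Span, capacity), byTrace: map[string][]int{}}
}

// Add stores a span, evicting the oldest span and its index entry if needed.
func (s *Store) Add(sp Span) {
	if s.size == len(s.buf) {
		old := s.buf[s.next]
		// The evicted slot is always the oldest entry for its trace.
		if slots := s.byTrace[old.TraceID]; len(slots) <= 1 {
			delete(s.byTrace, old.TraceID)
		} else {
			s.byTrace[old.TraceID] = slots[1:]
		}
	} else {
		s.size++
	}
	s.buf[s.next] = sp
	s.byTrace[sp.TraceID] = append(s.byTrace[sp.TraceID], s.next)
	s.next = (s.next + 1) % len(s.buf)
}

// ByTrace returns the spans still held for a trace ID.
func (s *Store) ByTrace(id string) []Span {
	var out []Span
	for _, i := range s.byTrace[id] {
		out = append(out, s.buf[i])
	}
	return out
}

// Percentile returns the nearest-rank percentile (0 < p <= 100)
// of the latencies currently held.
func (s *Store) Percentile(p float64) time.Duration {
	if s.size == 0 {
		return 0
	}
	lat := make([]time.Duration, 0, s.size)
	for _, sp := range s.buf[:s.size] {
		lat = append(lat, sp.Latency)
	}
	sort.Slice(lat, func(i, j int) bool { return lat[i] < lat[j] })
	rank := int(math.Ceil(p / 100 * float64(len(lat))))
	if rank < 1 {
		rank = 1
	}
	return lat[rank-1]
}

func main() {
	s := NewStore(3)
	for i, id := range []string{"a", "b", "a", "c"} {
		s.Add(Span{TraceID: id, Latency: time.Duration(i+1) * 10 * time.Millisecond})
	}
	fmt.Println(len(s.ByTrace("a")), s.Percentile(50), s.Percentile(99))
}
```

The fixed capacity bounds memory use, at the cost of dropping the oldest spans under load.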

Question 13

Why does the MIST stack have zero external dependencies?

Accepted Answer

Every package in mist-go uses only the Go standard library. This is a deliberate design choice.

Zero dependencies means no third-party supply chain risk, no version conflicts, and no transitive dependency auditing. The binary is what you built. For infrastructure that sits in the critical path of AI systems, dependency minimalism is a feature, not a constraint.

Question 14

How do MIST stack tools communicate?

Accepted Answer

MIST tools communicate via a universal message envelope over pluggable transports. Transports are URL-addressed: HTTP, file (JSON lines), stdio (Unix pipes), or in-process channels.

The same code works across all transport modes. The protocol package handles message types, versioning, and typed payloads.
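
A sketch of URL-addressed transport selection and a versioned envelope, under stated assumptions: the `Envelope` fields, the `stdio` and `chan` scheme names, and the `trace.span` message type are invented for this example and may differ from the real MIST protocol.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// Envelope is a hypothetical message envelope: a version, a message type,
// and a payload whose shape depends on the type.
type Envelope struct {
	Version string          `json:"version"`
	Type    string          `json:"type"`
	Payload json.RawMessage `json:"payload"`
}

// TransportKind picks a transport from the URL scheme.
// The scheme names for stdio and in-process channels are assumptions.
func TransportKind(addr string) (string, error) {
	u, err := url.Parse(addr)
	if err != nil {
		return "", err
	}
	switch u.Scheme {
	case "http", "https":
		return "http", nil
	case "file":
		return "file", nil
	case "stdio":
		return "stdio", nil
	case "chan":
		return "channel", nil
	default:
		return "", fmt.Errorf("unsupported transport scheme %q", u.Scheme)
	}
}

// Encode wraps a typed payload in an envelope and returns one JSON line,
// which suits the file (JSON lines) and stdio transports alike.
func Encode(msgType string, payload any) ([]byte, error) {
	raw, err := json.Marshal(payload)
	if err != nil {
		return nil, err
	}
	return json.Marshal(Envelope{Version: "1", Type: msgType, Payload: raw})
}

func main() {
	addrs := []string{"http://localhost:8080/mist", "file:///tmp/spans.jsonl", "stdio://", "chan://evals"}
	for _, addr := range addrs {
		kind, err := TransportKind(addr)
		fmt.Println(addr, kind, err)
	}
	line, _ := Encode("trace.span", map[string]int{"tokens": 3})
	fmt.Println(string(line))
}
```

Keeping the envelope independent of the transport is what lets the same sender code target a file, a pipe, or an HTTP endpoint by changing only the address.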

Question 15

What technical articles has Grey Newell published on the AWS blog?

Accepted Answer

I authored several articles on official AWS blogs.

On the AWS Architecture Blog, I wrote about implementing event-driven invoice processing for resilient financial monitoring at scale — designing serverless systems to process 86 million daily invoice events with near real-time visibility, including cellular architecture patterns and EventBridge routing strategies.

On the AWS Training & Certification Blog, I wrote a roadmap for earning all 12 AWS Certifications. It covers the 30-day sprint method, the 2357 spaced repetition technique, and practical exam-taking strategies.

Question 16

How do I run SWE-bench on Apple Silicon or AWS Graviton without x86 emulation?

Accepted Answer

SWE-bench's pre-built Docker images are x86_64-only, so every test runs through QEMU emulation on ARM64 hosts. I built native ARM64 container images and measured a 6.3x test runner speedup.

swe-bench-fast is a Go reimplementation of the SWE-bench eval harness. It auto-selects native ARM64 images for the 78% of instances that support it and falls back to Epoch x86 images via QEMU for the rest. One command, full benchmark, either architecture.

1,798 of 2,294 instances run natively on ARM64. The remaining 496 (scikit-learn, matplotlib, xarray) require x86 due to binary conda packages that aren't published for ARM.
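
The per-instance selection described above can be sketched like this. It is an illustration, not swe-bench-fast's actual code: `PlatformFor` and `x86OnlyRepos` are hypothetical names, and the repository list contains only the three projects named in the text. The platform strings are the standard Docker ones.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// x86OnlyRepos lists the projects the text says still need x86 images.
var x86OnlyRepos = []string{"scikit-learn", "matplotlib", "xarray"}

// PlatformFor returns the Docker platform to request for an instance.
// On an ARM64 host it prefers native images and falls back to
// linux/amd64 (run through QEMU) for the x86-only projects.
func PlatformFor(repo, hostArch string) string {
	if hostArch != "arm64" {
		return "linux/amd64"
	}
	for _, r := range x86OnlyRepos {
		if repo == r || strings.HasSuffix(repo, "/"+r) {
			return "linux/amd64"
		}
	}
	return "linux/arm64"
}

func main() {
	for _, repo := range []string{"django/django", "pydata/xarray"} {
		fmt.Println(repo, PlatformFor(repo, runtime.GOARCH))
	}
	// The split quoted in the text: native instances out of the full benchmark.
	native, total := 1798, 2294
	fmt.Printf("%d x86-only, %.1f%% native\n", total-native, 100*float64(native)/float64(total))
}
```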

Question 17

How do I speed up SWE-bench evaluations on ARM64 infrastructure?

Accepted Answer

The bottleneck is architecture emulation. SWE-bench's pre-built images are x86_64, so on ARM64 hosts (M-series Macs, AWS Graviton, Ampere) every conda install, pip build, and pytest run goes through QEMU instruction translation.

swe-bench-fast eliminates that overhead by building native ARM64 container images. Pre-built images are on Docker Hub. The eval harness is a single Go binary that pulls the right image per instance and runs the test suite. On an M3 Pro, the test runner ran 6.3x faster than the emulated baseline across 11 repositories.

Graviton EC2 instances (c7g, m7g, r7g) are typically 20-40% cheaper than comparable x86 instances. Combined with the roughly 6x test runner speedup that native images give over emulation, ARM64 is a strong option for running SWE-bench at scale.

Question 18

How did Grey Newell earn all 12 AWS Certifications?

Accepted Answer

I wrote about this in detail on the AWS Training & Certification Blog.

The short version: I started from my dad's couch in 2019 after my music career fell apart, earned my first certification in a month, and worked up to all 12 over six years — eventually receiving the AWS golden jacket awarded to those who complete the full set.

The five strategies that made the biggest difference: strategic use of AWS Skill Builder resources, turning each cert into a hands-on mini project, a 30-day sprint structure using spaced repetition (the 2357 method), protecting study time ruthlessly, and building a cloud community instead of going it alone.

Question 19

What are Grey Newell's tips for passing AWS Certification exams?

Accepted Answer

I co-authored a post on this with fellow AWS Solutions Architect Joshua Kurz. Between us we held 10 active AWS Certifications at the time of writing.

The five tips: (1) Break it down — start with 10 practice questions at a time, not full exams. (2) Use process of elimination to cut through distractor answers quickly. (3) Learn the concepts behind Official Practice Question Set answers, not just the answers themselves. (4) Spend 80% of your time building in AWS and 20% studying — practical experience is irreplaceable. (5) Work backwards — read the last line of each exam question first to anchor what's actually being asked before reading the scenario.

