AI - Grey Newell

SWE-bench Verified Is Broken: 5 Things I Found in the Source Code

March 6, 2026

After building 1,798 SWE-bench containers, I dug into the source. The tests reject correct solutions and every frontier model has memorized the answers.

AI, Open Source, Evaluation

SWE-bench Tests Run 6x Faster on ARM64 with Native Containers

March 5, 2026

SWE-bench's pre-built x86 containers run through QEMU emulation on ARM64 hosts like Apple Silicon and AWS Graviton. I built native ARM64 images and measured a 6.3x speedup on the test runner.

AI, Open Source, Evaluation, Go

Why Code Graphs Matter for AI Agents

March 2, 2026

AI coding agents lose critical structural understanding of codebases when context compaction occurs. Code graphs provide persistent external memory—representing functions, classes, and dependencies as queryable relationships—so agents can recover context without re-reading files from scratch.

AI, Developer Tools, Code-graphs
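To make "queryable relationships" concrete, here is a minimal sketch in Go of a code graph that an agent could query after compaction instead of re-reading files. The `Graph`, `Node`, and `Edge` types, the `"calls"` relation, and the `Callers` query are hypothetical illustrations, not Supermodel's schema or API.

```go
package main

import (
	"fmt"
	"sort"
)

// Kind is a hypothetical node category (not Supermodel's schema).
type Kind string

const (
	Function Kind = "function"
	Class    Kind = "class"
)

// Node is one code entity, e.g. a function or class, and the file it lives in.
type Node struct {
	ID   string
	Kind Kind
	File string
}

// Edge is a directed relationship such as "calls" or "defines".
type Edge struct {
	From, To, Rel string
}

// Graph stores nodes by ID plus a flat edge list.
type Graph struct {
	nodes map[string]Node
	edges []Edge
}

func NewGraph() *Graph { return &Graph{nodes: map[string]Node{}} }

func (g *Graph) AddNode(n Node) { g.nodes[n.ID] = n }

func (g *Graph) AddEdge(from, to, rel string) {
	g.edges = append(g.edges, Edge{From: from, To: to, Rel: rel})
}

// Callers returns the IDs of every node with a "calls" edge into target, sorted.
func (g *Graph) Callers(target string) []string {
	var out []string
	for _, e := range g.edges {
		if e.To == target && e.Rel == "calls" {
			out = append(out, e.From)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	g := NewGraph()
	g.AddNode(Node{ID: "parseConfig", Kind: Function, File: "config.go"})
	g.AddNode(Node{ID: "loadConfig", Kind: Function, File: "config.go"})
	g.AddNode(Node{ID: "main", Kind: Function, File: "main.go"})
	g.AddEdge("main", "loadConfig", "calls")
	g.AddEdge("loadConfig", "parseConfig", "calls")
	g.AddEdge("main", "parseConfig", "calls")
	fmt.Println(g.Callers("parseConfig"))
}
```

Running it prints `[loadConfig main]`: the agent learns who depends on `parseConfig` from the graph alone, without reopening either file.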

Building Uncompact: Lessons from Production

February 28, 2026

How Supermodel built Uncompact, a tool that maintains a persistent code graph across Claude Code's context compaction events, and the key lessons from shipping it to production: favor simplicity over detail, stay invisible to drive adoption, and layer verification instead of trusting blindly.

AI, Developer Tools, Code-graphs

The Architecture of Supermodel's Code Graph API

February 25, 2026

A look inside Supermodel's real-time code analysis API: the five-stage processing pipeline, multi-language abstraction via a unified node schema, incremental graph updates, and the sub-100ms response time requirement that shaped every design decision.

AI, Developer Tools, Code-graphs, Architecture

Projects

infermux

Route inference across LLM providers. Track cost per request.

89 • Go

Go, AI, Inference, Mist-stack, Observability

matchspec

Eval framework. Define what correct looks like, test against it, get results.

22 • Go

Go, AI, Evaluation, Mist-stack

evaldriven.org

Ship evals before you ship features.

18 • Markdown

AI, Evaluation, Methodology

supermodeltools/typescript-sdk

TypeScript SDK for Supermodel. Generate useful graphs of your codebase.

6 • TypeScript

TypeScript, AI

supermodeltools/mcpbr

Benchmark runner for Model Context Protocol servers. Paired comparison experiments on SWE-bench.

6 • Python

Python, AI, Evaluation, MCP, Methodology

supermodeltools/openapi-spec

OpenAPI spec for the Supermodel public API. Use as reference or generate your own clients.

5 • YAML

supermodeltools/mcp

Supermodel MCP server. Generate code graphs in Cursor, Codex, or Claude Code.

5 • TypeScript

TypeScript, AI, MCP

supermodeltools/arch-docs

GitHub Action to generate architecture documentation for any repository using Supermodel.

5 • JavaScript

supermodeltools/dead-code-hunter

GitHub Action to find unreachable functions using Supermodel call graphs.

4 • TypeScript