SWE-bench Verified Is Broken: 5 Things I Found in the Source Code
After building 1,798 SWE-bench containers, I dug into the source. The tests reject correct solutions and every frontier model has memorized the answers.
Thoughts on AI agent evaluation, benchmark methodology, and the tools I build along the way.
SWE-bench's pre-built x86 containers run through QEMU emulation on ARM64 hosts like Apple Silicon and AWS Graviton. I built native ARM64 images and measured a 6.3x speedup on the test runner.
AI coding agents lose critical structural understanding of a codebase when context compaction occurs. Code graphs act as persistent external memory: they represent functions, classes, and dependencies as queryable relationships, so agents can recover context without re-reading files from scratch.
How Supermodel built Uncompact, a tool that maintains a persistent code graph across Claude Code's context compaction events, and the key lessons learned shipping it to production: simplicity over detail, invisibility enables adoption, and layered verification over blind trust.
A look inside Supermodel's real-time code analysis API: the five-stage processing pipeline, multi-language abstraction via a unified node schema, incremental graph updates, and the sub-100ms response time requirement that shaped every design decision.
How to build a Business Event Monitoring System (BEMS) on AWS that handles over 86 million daily events with near real-time visibility, cross-Region controls, and automated alerts for stuck events.
Learn practical strategies that helped me transform from a struggling new graduate to an AWS Solutions Architect, eventually earning the coveted golden jacket awarded to those who achieve all twelve AWS Certifications.
We're both solutions architects at AWS, and between us, we hold 10 active AWS Certifications. Here are five tips AWS Solutions Architects swear by to prepare for and pass AWS Certification exams.