The bottleneck is architecture emulation. SWE-bench's pre-built images are x86_64-only, so on ARM64 hosts (M-series Macs, AWS Graviton, Ampere) Docker falls back to QEMU user-mode emulation, and every conda install, pip build, and pytest run pays the instruction-translation overhead.
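The trigger is simple: emulation kicks in whenever the image platform differs from the host architecture. A minimal sketch of that check, assuming SWE-bench images target amd64 (the helper name is ours, not part of any harness):

```go
package main

import (
	"fmt"
	"runtime"
)

// needsEmulation reports whether running an image built for imageArch
// on a host of hostArch would go through QEMU instruction translation.
// Illustrative helper; not part of SWE-bench or swe-bench-fast.
func needsEmulation(hostArch, imageArch string) bool {
	return hostArch != imageArch
}

func main() {
	// SWE-bench's pre-built images are amd64 (x86_64).
	fmt.Printf("host=%s emulated=%v\n",
		runtime.GOARCH, needsEmulation(runtime.GOARCH, "amd64"))
}
```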
swe-bench-fast eliminates that overhead by building native ARM64 container images. Pre-built images are published on Docker Hub, and the eval harness is a single Go binary that pulls the matching image for each instance and runs its test suite. On an M3 Pro, test runs measured 6.3x faster than the emulated x86_64 baseline across 11 repositories.
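The "right image per instance" step amounts to mapping an instance ID plus the host architecture onto an image reference. A sketch of that mapping, with a hypothetical registry path and tag scheme (the real harness may name images differently):

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// imageRef builds a per-instance image reference for the given host
// architecture. The repository path and tag normalization here are
// invented for illustration, not swe-bench-fast's actual scheme.
func imageRef(instanceID, arch string) string {
	tag := strings.ToLower(strings.ReplaceAll(instanceID, "__", "_"))
	return fmt.Sprintf("docker.io/example/swe-bench-%s:%s", arch, tag)
}

func main() {
	// Choose the image matching the host so no emulation is needed.
	fmt.Println(imageRef("django__django-11099", runtime.GOARCH))
}
```

Keying the reference on `runtime.GOARCH` is what makes the same binary do the right thing on both x86_64 and ARM64 hosts.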
Graviton EC2 instances (c7g, m7g, r7g) are typically 20-40% cheaper than comparable x86 instances. Combined with the ~6x speedup from native images, that makes ARM64 a strong option for running SWE-bench at scale.
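The combined effect on cost per evaluation is back-of-the-envelope arithmetic: cost scales with price times runtime. A sketch using a 30% price discount (the midpoint of the 20-40% range from above) and the 6.3x measured speedup; the individual figures come from the text, the combination is ours:

```go
package main

import "fmt"

func main() {
	// Assumptions: ARM64 instance is ~30% cheaper (midpoint of the
	// 20-40% range) and native images run ~6.3x faster.
	priceRatio := 0.70 // ARM64 hourly price relative to x86
	speedup := 6.3     // wall-clock speedup from native images

	// Cost per evaluation scales with price x runtime.
	relativeCost := priceRatio / speedup
	fmt.Printf("relative cost per eval: %.2f (about %.0fx cheaper)\n",
		relativeCost, 1/relativeCost)
}
```

Under these assumptions each evaluation costs roughly a ninth of the emulated x86 baseline, though the exact ratio depends on the instance family and workload.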