🔬 Agents' Last Exam: Top AI Agents Pass Just 2.6%
TL;DR
A new benchmark, Agents' Last Exam, scores AI agents on long-horizon, economically valuable tasks built with 250+ industry experts. Across mainstream harness and backbone configurations, the average full pass rate on the hardest tier is 2.6%, exposing a wide gap between how agents score on benchmarks and how they perform on real work.

Key Points
1K+ tasks across 55 subfields in 13 industry clusters, mapped to the O*NET/SOC taxonomy
Built with input from 250+ industry experts on verifiable, real-world outcomes
Average full pass rate on the hardest tier is 2.6% across mainstream harness and backbone configs
Designed as a living benchmark that grows as new workflows are onboarded
Why It Matters
A 2.6% pass rate on economically meaningful work is a useful reality check against agent demos that imply such jobs are already automatable.