Agents' Last Exam: Top AI Agents Pass Just 2.6%

ContentBuffer

daily-hour-news · Jun 11, 2026

TL;DR

A new benchmark, Agents' Last Exam, scores AI agents on long-horizon, economically valuable tasks built with input from 250+ industry experts. Across mainstream harness and backbone setups, the average full pass rate on the hardest tier is 2.6%, which points to a wide gap between what agent benchmarks suggest and what agents can do on real work.

Key Points

1. 1K+ tasks across 55 subfields in 13 industry clusters, mapped to the O*NET/SOC taxonomy
2. Built with input from 250+ industry experts on verifiable, real-world outcomes
3. Average full pass rate on the hardest tier is 2.6% across mainstream harness and backbone configs
4. Designed as a living benchmark that grows as new workflows are onboarded
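
To make the headline metric concrete, here is a minimal Python sketch of how an average full pass rate across configurations could be computed. It rests on two assumptions that the summary above does not confirm: that a "full pass" means every verifiable check on a task succeeded, and that the average is an unweighted mean over harness and backbone configurations. The configuration names and results are hypothetical toy values, not benchmark data.

```python
from statistics import mean


def full_pass_rate(task_results):
    """Share of tasks in which every check passed (assumed meaning of a "full pass")."""
    if not task_results:
        raise ValueError("need at least one task")
    full_passes = sum(1 for checks in task_results if all(checks))
    return full_passes / len(task_results)


# Hypothetical toy data, not benchmark results: each configuration maps to a
# list of tasks, and each task is a list of per-check pass/fail outcomes.
results = {
    "harness_a+model_x": [[True, True], [True, False], [False, False], [True, True]],
    "harness_b+model_y": [[True, False], [False, False], [True, True], [False, True]],
}

# Full pass rate per configuration, then an unweighted mean across configurations.
per_config = {name: full_pass_rate(tasks) for name, tasks in results.items()}
average_full_pass_rate = mean(per_config.values())

print(per_config)
print(f"{average_full_pass_rate:.1%}")
```

Under this reading, a task with one failed check counts the same as a task with no passed checks, so partial progress does not raise the score.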

Why It Matters

A 2.6% full pass rate on economically meaningful work is a useful reality check against agent demos that imply such jobs can already be automated.

Quick Facts

AI agents · benchmark · evaluation · arXiv · long-horizon tasks · agentic reasoning
