daily-hour-news

🔬 Agents' Last Exam: Top AI Agents Pass Just 2.6%

TL;DR

A new benchmark, Agents' Last Exam, scores AI agents on long-horizon, economically valuable tasks built with input from more than 250 industry experts. Across mainstream harness and backbone configurations, the average full pass rate on the hardest tier is 2.6%, exposing a wide gap between benchmark performance and real work.

Key Points

1. 1K+ tasks across 55 subfields in 13 industry clusters, mapped to the O*NET/SOC taxonomy
2. Built with input from 250+ industry experts on verifiable, real-world outcomes
3. Average full pass rate on the hardest tier is 2.6% across mainstream harness and backbone configs
4. Designed as a living benchmark that grows as new workflows are onboarded

Why It Matters

A 2.6% full pass rate on the hardest tier of economically meaningful work is a useful reality check against agent demos that imply such jobs are already automatable.

Quick Facts

AI agents · benchmark · evaluation · arXiv · long-horizon tasks · agentic reasoning
