🔬 Paper: Building Support AI Agents at 100M-User Scale
TL;DR
Nubank researchers detail an evaluation-driven framework for customer-support AI agents serving 100M+ users. The paper bridges offline development and online impact, showing how to test agents before they reach production.

Key Points
Case study at Nubank, serving 100M+ users
Evaluation-driven framework links offline development to online metrics
Focus on production reliability of LLM support agents
Published June 2026 (arXiv:2606.08867)
Why It Matters
Most agent papers stop at benchmarks; this one shows the evaluation scaffolding needed to ship support agents to more than 100 million users without breaking trust.