⚡Cerebras Runs Kimi K2.6 at 981 Tokens/Sec, 6.7x GPUs
TL;DR
Days after its $95B IPO, Cerebras is serving Moonshot AI's trillion-parameter Kimi K2.6 at 981 output tokens/sec, 6.7x the next-fastest GPU cloud per Artificial Analysis. A 10K-token agentic coding request finishes in 5.6 seconds versus 163.7 seconds on Moonshot's official Kimi endpoint.

Key Points
981 output tokens/sec on Kimi K2.6, independently verified by Artificial Analysis: 6.7x the next-fastest GPU cloud and 23x the median
A 10K-token agentic coding request finished in 5.6s versus 163.7s on Moonshot's official Kimi endpoint
Kimi K2.6 is a trillion-parameter open-weight MoE from Beijing's Moonshot AI, released April 20
Cerebras IPO'd in May 2026 at a $95B valuation, the largest tech IPO of the year
Why It Matters
Wafer-scale inference near 1,000 tokens/sec makes reasoning and agent workloads that chain dozens of calls actually usable, and it is the clearest test yet of Nvidia's grip on inference.
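To make the scale of these numbers concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above; the 30-call chain length is a hypothetical stand-in for "dozens of calls", and it assumes every call runs sequentially and takes as long as the quoted 10K-token request, which real agent runs will not match exactly.

```python
# Figures quoted in the article.
CEREBRAS_TOKENS_PER_SEC = 981.0
SPEEDUP_VS_NEXT_GPU_CLOUD = 6.7
SPEEDUP_VS_MEDIAN = 23.0
CEREBRAS_REQUEST_SEC = 5.6    # 10K-token agentic request on Cerebras
MOONSHOT_REQUEST_SEC = 163.7  # same request on Moonshot's official endpoint

# Hypothetical: number of sequential calls in one agent run ("dozens").
CHAINED_CALLS = 30


def implied_baseline(tokens_per_sec: float, speedup: float) -> float:
    """Throughput of the slower system implied by a quoted speedup."""
    return tokens_per_sec / speedup


def chained_latency(per_call_sec: float, calls: int) -> float:
    """Total wall-clock time if every call runs strictly one after another."""
    return per_call_sec * calls


next_gpu_tps = implied_baseline(CEREBRAS_TOKENS_PER_SEC, SPEEDUP_VS_NEXT_GPU_CLOUD)
median_tps = implied_baseline(CEREBRAS_TOKENS_PER_SEC, SPEEDUP_VS_MEDIAN)
request_speedup = MOONSHOT_REQUEST_SEC / CEREBRAS_REQUEST_SEC
cerebras_chain_sec = chained_latency(CEREBRAS_REQUEST_SEC, CHAINED_CALLS)
moonshot_chain_sec = chained_latency(MOONSHOT_REQUEST_SEC, CHAINED_CALLS)

print(f"Implied next-fastest GPU cloud: {next_gpu_tps:.0f} tokens/sec")
print(f"Implied median provider: {median_tps:.0f} tokens/sec")
print(f"Per-request speedup: {request_speedup:.1f}x")
print(f"{CHAINED_CALLS} chained calls: {cerebras_chain_sec:.0f}s vs {moonshot_chain_sec:.0f}s")
```

Under those assumptions, the quoted ratios imply roughly 146 tokens/sec for the next-fastest GPU cloud and roughly 43 tokens/sec for the median, and a 30-call agent run drops from about 82 minutes to under 3 minutes.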