Cerebras Runs Kimi K2.6 at 981 Tokens/Sec, 6.7x GPUs

daily-hour-news · May 26, 2026

TL;DR

Days after its $95B IPO, Cerebras is serving Moonshot AI's trillion-parameter Kimi K2.6 at 981 output tokens/sec, 6.7x the next-fastest GPU cloud according to Artificial Analysis. A 10K-token agentic request finishes in 5.6 seconds, versus 163.7 seconds on Moonshot's official Kimi endpoint.

Key Points

1. 981 output tokens/sec on Kimi K2.6, independently verified by Artificial Analysis, 6.7x the next-fastest GPU cloud and 23x the median

2. A 10K-token agentic coding request finished in 5.6s versus 163.7s on Moonshot's official Kimi endpoint

3. Kimi K2.6 is a trillion-parameter open-weight MoE from Beijing's Moonshot AI, released April 20

4. Cerebras IPO'd in May 2026 at a $95B valuation, the largest tech IPO of the year
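
The ratios in the key points can be turned back into absolute numbers with simple division. The sketch below is a back-of-the-envelope check, not a reproduction of the Artificial Analysis methodology: it uses only the figures quoted above, and the variable and function names are illustrative. The implied GPU throughputs are derived values, not figures reported in the article.

```python
# Figures quoted in the article.
CEREBRAS_TOKENS_PER_SEC = 981.0
SPEEDUP_VS_NEXT_FASTEST_GPU = 6.7
SPEEDUP_VS_MEDIAN = 23.0
CEREBRAS_REQUEST_SECONDS = 5.6
KIMI_ENDPOINT_REQUEST_SECONDS = 163.7


def implied_throughput(reference: float, speedup: float) -> float:
    """Throughput of the slower system implied by a quoted speedup ratio."""
    return reference / speedup


# Derived (not reported) throughputs for the comparison providers.
next_fastest_gpu = implied_throughput(CEREBRAS_TOKENS_PER_SEC, SPEEDUP_VS_NEXT_FASTEST_GPU)
median_provider = implied_throughput(CEREBRAS_TOKENS_PER_SEC, SPEEDUP_VS_MEDIAN)

# End-to-end speedup on the 10K-token agentic coding request.
request_speedup = KIMI_ENDPOINT_REQUEST_SECONDS / CEREBRAS_REQUEST_SECONDS

print(f"Implied next-fastest GPU cloud: {next_fastest_gpu:.0f} tokens/sec")  # 146
print(f"Implied median provider: {median_provider:.0f} tokens/sec")          # 43
print(f"End-to-end request speedup: {request_speedup:.1f}x")                 # 29.2x
```

The end-to-end gap (about 29x) is larger than the 6.7x output-speed gap because the two comparisons use different baselines: the first is against Moonshot's own endpoint, the second against the next-fastest GPU cloud.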

Why It Matters

Wafer-scale inference near 1,000 tokens/sec makes reasoning and agent workloads that chain dozens of calls actually usable, and it is the clearest test yet of Nvidia's grip on inference.
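
To see why per-request latency compounds in agent workloads, consider a rough sketch that multiplies the two measured request times by a chain length. The chain length of 30 is a hypothetical stand-in for "dozens of calls", and the model assumes every call is strictly sequential and costs the same as the benchmarked 10K-token request, which real agents will not match exactly.

```python
# Measured request times quoted in the article (seconds per 10K-token request).
CEREBRAS_SECONDS_PER_CALL = 5.6
KIMI_ENDPOINT_SECONDS_PER_CALL = 163.7

# Hypothetical chain length; the article only says "dozens of calls".
CALLS_IN_CHAIN = 30


def chain_seconds(calls: int, seconds_per_call: float) -> float:
    """Wall-clock time for `calls` strictly sequential, equal-cost requests."""
    return calls * seconds_per_call


cerebras_total = chain_seconds(CALLS_IN_CHAIN, CEREBRAS_SECONDS_PER_CALL)
kimi_total = chain_seconds(CALLS_IN_CHAIN, KIMI_ENDPOINT_SECONDS_PER_CALL)

print(f"Cerebras: {cerebras_total:.0f} s for {CALLS_IN_CHAIN} calls")
print(f"Moonshot endpoint: {kimi_total:.0f} s for {CALLS_IN_CHAIN} calls")
```

Under these assumptions the chain takes about 168 seconds on Cerebras and about 4,911 seconds (roughly 82 minutes) on the official endpoint, which is the difference between waiting a few minutes and waiting over an hour for one agent run.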

Quick Facts

Cerebras, Kimi K2.6, Moonshot AI, AI inference, wafer-scale, Nvidia, open weights
