daily-hour-news

🤖Microsoft MAI-Thinking-1: 1T Params, Only 35B Active

TL;DR

Simon Willison digs into Microsoft's new MAI models: MAI-Thinking-1 (1T parameters, 35B active) and MAI-Code-1-Flash (137B, 5B active) for GitHub Copilot. The 'clean, commercially licensed data' claim doesn't fully survive a read of the technical paper.

Key Points

1. MAI-Thinking-1 is a 1T-parameter MoE reasoning model with 35B active parameters, available to select early partners

2. MAI-Code-1-Flash (137B total, 5B active) is purpose-built for GitHub Copilot and rolling out to individual users in VS Code

3. Microsoft claims MAI-Thinking-1 is preferred over Claude Sonnet 4.6 in blind human side-by-side evaluations

4. The technical paper shows training on a proprietary crawl of ~1.2 trillion web pages filtered to 794 billion, plus 24.2 billion Common Crawl pages

5. Willison's takeaway: the 'trained without third-party distillation' framing holds, but the licensing story matches every other major LLM
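To put the crawl figures in point 4 in proportion, here is a minimal Python sketch of the arithmetic. The page counts come from the summary above. The constant and function names are hypothetical, and the combined total assumes the Common Crawl pages are counted on top of the filtered proprietary crawl with no overlap, which the summary does not confirm.

```python
# Page counts in billions, as quoted in the key points above.
PROPRIETARY_CRAWLED_B = 1200.0  # "~1.2 trillion" pages before filtering
PROPRIETARY_KEPT_B = 794.0      # pages remaining after filtering
COMMON_CRAWL_B = 24.2           # Common Crawl pages added on top


def retention_rate(kept_b: float, crawled_b: float) -> float:
    """Fraction of crawled pages that survive filtering."""
    return kept_b / crawled_b


def common_crawl_share(
    proprietary_b: float = PROPRIETARY_KEPT_B,
    common_crawl_b: float = COMMON_CRAWL_B,
) -> float:
    """Share of Common Crawl pages in the combined set.

    Assumption: the two sources do not overlap, so the sizes simply add.
    """
    return common_crawl_b / (proprietary_b + common_crawl_b)


if __name__ == "__main__":
    kept = retention_rate(PROPRIETARY_KEPT_B, PROPRIETARY_CRAWLED_B)
    share = common_crawl_share()
    print(f"Proprietary crawl kept after filtering: {kept:.1%}")
    print(f"Common Crawl share of combined pages: {share:.1%}")
```

Under those assumptions, roughly two thirds of the proprietary crawl survives filtering, and Common Crawl accounts for about 3% of the combined pages.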

Why It Matters

Shipping competitive in-house models for Copilot reduces Microsoft's dependence on OpenAI at the exact layer where it pays the most. Sparse mixture-of-experts (MoE) designs that activate only a small fraction of their parameters per token are how that gets cheap.
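The cost argument can be made concrete with a rough Python sketch using the parameter counts quoted above. The helper names are hypothetical. The figure of about 2 FLOPs per active parameter per token is a common rule of thumb for a forward pass, not something taken from Microsoft's paper, and it ignores attention cost and the memory needed to hold all experts.

```python
# Parameter counts in billions, as quoted in the article.
MODELS = {
    "MAI-Thinking-1": {"total_b": 1000.0, "active_b": 35.0},
    "MAI-Code-1-Flash": {"total_b": 137.0, "active_b": 5.0},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of the model's parameters used for each token."""
    return active_b / total_b


def approx_forward_flops_per_token(active_b: float) -> float:
    """Rule-of-thumb estimate (assumption): ~2 FLOPs per active parameter."""
    return 2.0 * active_b * 1e9


def summarize(models: dict = MODELS) -> dict:
    """Per-model active fraction, estimated GFLOPs/token, and total/active ratio."""
    rows = {}
    for name, p in models.items():
        rows[name] = {
            "active_fraction": active_fraction(p["total_b"], p["active_b"]),
            "gflops_per_token": approx_forward_flops_per_token(p["active_b"]) / 1e9,
            # How many times more compute a dense model of the same total size
            # would need per token under the same rule of thumb.
            "dense_to_sparse_ratio": p["total_b"] / p["active_b"],
        }
    return rows


if __name__ == "__main__":
    for name, row in summarize().items():
        print(
            f"{name}: {row['active_fraction']:.1%} active, "
            f"~{row['gflops_per_token']:.0f} GFLOPs/token, "
            f"~{row['dense_to_sparse_ratio']:.1f}x less compute than dense"
        )
```

Both models activate about 3.5% of their weights per token, so under this estimate per-token compute is roughly 27 to 29 times lower than for a dense model of the same total size. The full parameter set still has to be stored and served.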

Quick Facts

Microsoft, MAI-Thinking-1, MAI-Code-1-Flash, GitHub Copilot, mixture-of-experts, training data, OpenAI
