GLM-5.2 Open Weights: Route Reasoning Effort by Task
Build a cost-aware GLM-5.2 agent that routes thinking effort per task and calls tools.