Tutorials Mercury 2 dLLM: Reasoning at 1000 Tokens Per Second
Build real-time agents on the first reasoning diffusion LLM: OpenAI-compatible, 1000 tok/s.
How-to content for builders, indie hackers, and AI engineers. Less theory, more shipped code.
Machine Learning Replay real conversations through a candidate model to predict misbehavior before you ship.
Tutorials Drive Moonshot's open-weight coding model through a real tool-calling loop in Python.
Tutorials MiniMax M3 hands-on: MSA sparse attention plus real 1M-token long context, with runnable Python.
Tutorials Run Google's open Gemma 4 locally with Ollama and wire up real function calling for an agent.
Tutorials Wire MiniMax M3's OpenAI-compatible API into a real tool-calling agent loop.
Tutorials Recreate ChatGPT's new Dreaming V3 memory: a background job that learns and forgets.
Tutorials Tune token spend on Opus 4.8 with the effort parameter. Runnable Python, real I/O, real numbers.
Tutorials Wire Gemini 3.5 Flash to your own Python functions and run a real multi-step agent loop.
Tutorials Reason + Act loop, tool routing, retries — implement a real agent in 200 lines of Python.
Backend Build a robust SSE service in Go with backpressure, reconnects, fan-out, and graceful shutdown.
Tutorials Run up to 8 AI agents in parallel in Cursor 2.0 to finish features in a fraction of the time.