AI Guides for Builders

How-to content for builders, indie hackers, and AI engineers. Less theory, more shipped code.

GLM-5.2 Open Weights: Route Reasoning Effort by Task

Intermediate

Build a cost-aware GLM-5.2 agent that routes thinking effort per task and calls tools.

9 min read·Kodetra Technologies

Today

Mercury 2 dLLM: Reasoning at 1000 Tokens Per Second

Intermediate

Build real-time agents on the first reasoning diffusion LLM: OpenAI-compatible, 1000 tok/s.

8 min read·Kodetra Technologies

Today

Build a Deployment Simulation Eval to Catch Model Drift

Machine Learning

Intermediate

Replay real conversations through a candidate model to predict misbehavior before you ship.

11 min read·Kodetra Technologies

Yesterday

Fable 5 Prompt Caching: Slash 1M-Token Codebase Costs

Intermediate

Reuse a huge codebase prefix across every Fable 5 call and pay ~90% less.

8 min read·Kodetra Technologies

3d ago

DiffusionGemma in Python: Generate Text 4x Faster

Machine Learning

Intermediate

Run Google's open diffusion LLM with Transformers and learn why it decodes text in parallel.

9 min read·Kodetra Technologies

4d ago

Fable 5 Effort: Cut Thinking Token Costs in Python

Intermediate

Claude Fable 5 always thinks. Use effort, display and max_tokens to control reasoning cost.

9 min read·Kodetra Technologies

4d ago

DeepSeek V4 Pro: Cheap 1M-Token Context in Python

Intermediate

Use DeepSeek V4 Pro's auto KV cache to run huge-context jobs for cents.

8 min read·Kodetra Technologies

6d ago

MiniMax M3: Master 1M-Token Long Context With MSA

Intermediate

MiniMax M3 hands-on: MSA sparse attention plus real 1M-token long context, with runnable Python.

10 min read·Kodetra Technologies

7d ago

Handle Fable 5 Refusals With Fallbacks in Python

Intermediate

Catch Claude Fable 5's stop_reason refusal and auto-retry on Opus 4.8 without breaking production.

10 min read·Kodetra Technologies

8d ago

Gemma 4 Tool Calling: Build a Local AI Agent

Intermediate

Run Google's open Gemma 4 locally with Ollama and wire up real function calling for an agent.

10 min read·Kodetra Technologies

8d ago

Claude Opus 4.8 Fast Mode: 2.5x Faster Output in Python

Intermediate

Use speed:"fast" on Claude Opus 4.8 for up to 2.5x faster output, with a safe rate-limit fallback.

9 min read·Kodetra Technologies

9d ago

Build an Apple-Style Multi-Model AI Router in Python

Intermediate

WWDC let iPhone users pick ChatGPT, Gemini, or Claude. Build the same model router in Python.

9 min read·Kodetra Technologies

10d ago
