AI Guides for Builders

How-to content for builders, indie hackers, and AI engineers. Less theory, more shipped code.

Fable 5 Prompt Caching: Slash 1M-Token Codebase Costs

Intermediate

Reuse a huge codebase prefix across every Fable 5 call and pay ~90% less.

8 min read·Kodetra Technologies

Today

DiffusionGemma in Python: Generate Text 4x Faster

Machine Learning

Intermediate

Run Google's open diffusion LLM with Transformers and learn why it decodes text in parallel.

9 min read·Kodetra Technologies

Yesterday

Fable 5 Effort: Cut Thinking Token Costs in Python

Intermediate

Claude Fable 5 always thinks. Use effort, display and max_tokens to control reasoning cost.

9 min read·Kodetra Technologies

Yesterday

DeepSeek V4 Pro: Cheap 1M-Token Context in Python

Intermediate

Use DeepSeek V4 Pro's auto KV cache to run huge-context jobs for cents.

8 min read·Kodetra Technologies

3d ago

MiniMax M3: Master 1M-Token Long Context With MSA

Intermediate

MiniMax M3 hands-on: MSA sparse attention plus real 1M-token long context, with runnable Python.

10 min read·Kodetra Technologies

4d ago

Handle Fable 5 Refusals With Fallbacks in Python

Intermediate

Catch Claude Fable 5's stop_reason refusal and auto-retry on Opus 4.8 without breaking production.

10 min read·Kodetra Technologies

5d ago

Gemma 4 Tool Calling: Build a Local AI Agent

Intermediate

Run Google's open Gemma 4 locally with Ollama and wire up real function calling for an agent.

10 min read·Kodetra Technologies

5d ago

Claude Opus 4.8 Fast Mode: 2.5x Faster Output in Python

Intermediate

Use speed:"fast" on Claude Opus 4.8 for up to 2.5x faster output, with a safe rate-limit fallback.

9 min read·Kodetra Technologies

6d ago

Build an Apple-Style Multi-Model AI Router in Python

Intermediate

WWDC let iPhone users pick ChatGPT, Gemini, or Claude. Build the same model router in Python.

9 min read·Kodetra Technologies

7d ago

MiniMax M3 Tool Calling: Build an Agentic Loop in Python

Intermediate

Wire MiniMax M3's OpenAI-compatible API into a real tool-calling agent loop.

8 min read·Kodetra Technologies

7d ago

Run a Local LLM on Android with llama.cpp + Vulkan

Intermediate

Compile llama.cpp with Vulkan in Termux and run a quantized LLM on your Android GPU, no root.

9 min read·Kodetra Technologies

10d ago

MAI-Code-1-Flash in Python: 5B Coder Beats Haiku 4.5

Intermediate

Call Microsoft's June 2 coding model via OpenRouter for cheap, fast refactors.

10 min read·Kodetra Technologies

13d ago
