AI Guides for Builders

How-to content for builders, indie hackers, and AI engineers. Less theory, more shipped code.

Build an LLM Spend Governor: Budget Caps in Python

Intermediate

A runnable Python governor that caps LLM spend per user and auto-downgrades models.

10 min read·Kodetra Technologies

Today

Stream Gemini Thinking: Build a Show-Your-Work CLI

Intermediate

Stream Gemini's thought summaries live, control reasoning effort, and track thinking-token cost.

8 min read·Kodetra Technologies

3d ago

Code Mode for MCP: Let Claude Write Code to Call Tools

Intermediate

Cut MCP agent context by up to 99% by exposing tools as a code API the model calls.

9 min read·Kodetra Technologies

4d ago

Mercury 2 dLLM: Reasoning at 1000 Tokens Per Second

Intermediate

Build real-time agents on the first reasoning diffusion LLM: OpenAI-compatible, 1000 tok/s.

8 min read·Kodetra Technologies

11d ago

Loop Engineering: From Prompts to Verified Agent Loops

Intermediate

Build a plan-act-verify agent loop with an external check, retry budget, and clear stop rules.

9 min read·Kodetra Technologies

12d ago

Fable 5 Prompt Caching: Slash 1M-Token Codebase Costs

Intermediate

Reuse a huge codebase prefix across every Fable 5 call and pay ~90% less.

8 min read·Kodetra Technologies

13d ago

Claude Programmatic Tool Calling: Cut Agent Token Costs

Intermediate

Let Claude write code that calls your tools in a loop: 20–40% fewer tokens, same accuracy.

10 min read·Kodetra Technologies

13d ago

Fable 5 Effort: Cut Thinking Token Costs in Python

Intermediate

Claude Fable 5 always thinks. Use effort, display and max_tokens to control reasoning cost.

9 min read·Kodetra Technologies

14d ago

DeepSeek V4 Pro: Cheap 1M-Token Context in Python

Intermediate

Use DeepSeek V4 Pro's auto KV cache to run huge-context jobs for cents.

8 min read·Kodetra Technologies

17d ago

MiniMax M3: Master 1M-Token Long Context With MSA

Intermediate

MiniMax M3 hands-on: MSA sparse attention plus real 1M-token long context, with runnable Python.

10 min read·Kodetra Technologies

17d ago

Claude Opus 4.8 Effort Levels: A Hands-On Python Guide

Intermediate

Tune token spend on Opus 4.8 with the effort parameter. Runnable Python, real I/O, real numbers.

7 min read·Kodetra Technologies

28d ago

How to Secure an MCP Server Against Tool Poisoning

Advanced

Harden MCP servers: kill tool poisoning, validate tokens, and sandbox tools.

9 min read·Kodetra Technologies

May 20
