
Claude Fable 5 in Python: Build a Self-Checking Agent
Summary
Build a tool-using agent on Anthropic's Claude Fable 5 that plans, acts, and verifies its own work.
On June 9, 2026, Anthropic did something it had spent the prior week warning against: it shipped its most powerful model to the public. Claude Fable 5 is the generally available, safety-hardened sibling of the internal Mythos 5 system. Same raw capability, but wrapped in a guardrail layer that quietly hands risky requests off to Opus 4.8 instead of answering them. Within hours it was live on the Anthropic API, AWS Bedrock, GitHub Copilot, Databricks, and Microsoft Foundry, and the demo making the rounds was a one-click "type a sentence, get a playable video game" toy.
The headline demo is fun, but the reason Fable 5 matters to builders is duller and more important: it is tuned for long-horizon agentic work. Anthropic describes it as a model that plans a task, calls tools, checks its own outputs, and holds context across a long run, with a 1M-token context window and up to 128k output tokens. The third item on that list, checking its own outputs, is the capability we are going to lean on in this guide.
By the end you will have a small, runnable Python agent that takes a messy data question, calls a tool to compute the real answer, and then runs a second pass where the model audits its own response before returning it. You will also learn how to detect the silent safeguard fallback, which is the single most surprising thing about building on Fable 5.
Prerequisites
- Python 3.10 or newer.
- An Anthropic API key with access to claude-fable-5 (set it as the ANTHROPIC_API_KEY environment variable).
- The official SDK: pip install "anthropic>=0.40".
- Basic comfort reading JSON and a willingness to watch a token bill (Fable 5 is $10 / 1M input and $50 / 1M output tokens).
What actually changed with Fable 5
Before writing code it helps to know what you are paying for. Fable 5 is not a new API shape; it is the standard Messages API with a stronger model behind it. The practical differences from the Opus line are about agentic reliability and the safety wrapper, not new endpoints.
| Property | Value |
|---|---|
| Model ID | claude-fable-5 |
| Context window | 1,000,000 tokens |
| Max output | 128,000 tokens |
| Pricing | $10 / 1M input, $50 / 1M output |
| Released | June 9, 2026 (public, safety-gated build of Mythos 5) |
| Safeguard | High-risk prompts silently fall back to Claude Opus 4.8 |
| API surface | Standard /v1/messages (tool use, streaming, system prompts) |
The safeguard line is not a footnote. When you send a request that trips the high-risk classifiers (Anthropic calls out cybersecurity and biology specifically), the platform may answer with Opus 4.8 rather than Fable 5, and the only signal you get is the model field on the response. If you are billing customers or logging which model did the work, you need to read that field on every response.
Step 1: Confirm you are really talking to Fable 5
Start with the smallest possible call and inspect the returned model string. This is your fallback detector from minute one.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
MODEL = "claude-fable-5"

resp = client.messages.create(
    model=MODEL,
    max_tokens=256,
    messages=[{"role": "user", "content": "In one sentence, what is a long-horizon agent?"}],
)
print("served by:", resp.model)
print(resp.content[0].text)
Example output:
served by: claude-fable-5
A long-horizon agent is an AI system that pursues a multi-step goal over many
turns, repeatedly planning, calling tools, and checking its progress rather
than producing a single one-shot answer.
If resp.model ever comes back as something like claude-opus-4-8, the safeguard fired. Keep that check; we reuse it later.
Step 2: Give the agent a tool
An agent is just a model plus a loop plus tools. The Messages API tool-use protocol works in four beats: you declare tools, the model replies with stop_reason == "tool_use" and one or more tool_use blocks, you run the tool and send the result back as a tool_result block, and the model continues. Here we give Fable 5 a single tool that runs a snippet of Python so it can do exact arithmetic instead of guessing.
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-fable-5"

TOOLS = [{
    "name": "run_python",
    "description": "Execute a short Python snippet and return whatever it prints. "
                   "Use this for any exact calculation instead of doing math in your head.",
    "input_schema": {
        "type": "object",
        "properties": {
            "code": {"type": "string", "description": "Python source to execute."}
        },
        "required": ["code"],
    },
}]

def run_python(code: str) -> str:
    import io, contextlib
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, {"__builtins__": __builtins__})
        return buf.getvalue().strip() or "(no output)"
    except Exception as e:  # never crash the loop
        return f"ERROR: {type(e).__name__}: {e}"

DISPATCH = {"run_python": lambda i: run_python(i["code"])}

def agent_turn(messages, max_steps=6):
    """Run the tool loop until the model stops asking for tools."""
    for _ in range(max_steps):
        resp = client.messages.create(
            model=MODEL, max_tokens=1500, tools=TOOLS, messages=messages,
        )
        if resp.model != MODEL:
            print(f"[warn] safeguard fallback -> {resp.model}")
        if resp.stop_reason != "tool_use":
            return resp  # final answer

        # 1. record what the model said (including its tool_use blocks)
        messages.append({"role": "assistant", "content": resp.content})

        # 2. run every requested tool and collect results
        results = []
        for block in resp.content:
            if block.type == "tool_use":
                out = DISPATCH[block.name](block.input)
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": out,
                })

        # 3. hand the results back to the model
        messages.append({"role": "user", "content": results})
    raise RuntimeError("Agent exceeded max_steps without finishing")
Note three things that save you pain later: run_python never raises on an ordinary exception (tool errors come back as strings the model can read and recover from), we cap the loop with max_steps so a confused model can't spin forever, and we re-check resp.model on every hop because the fallback can trip mid-conversation, not just on the first call.
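One gap remains in the loop as written: DISPATCH[block.name] raises a KeyError if the model asks for a tool name that is not in the table. A minimal sketch of a dispatcher that turns that case into a readable error string as well is shown here. The names safe_dispatch and add are hypothetical and exist only for illustration; the toy add tool stands in for run_python so the snippet runs on its own.

```python
from typing import Any, Callable, Dict

def add(tool_input: Dict[str, Any]) -> str:
    """Toy tool for this demo only: adds two numbers."""
    return str(tool_input["a"] + tool_input["b"])

# Hypothetical dispatch table; in the agent this would hold run_python.
DISPATCH: Dict[str, Callable[[Dict[str, Any]], str]] = {"add": add}

def safe_dispatch(name: str, tool_input: Dict[str, Any]) -> str:
    """Run a tool by name; always return a string instead of raising."""
    handler = DISPATCH.get(name)
    if handler is None:
        return f"ERROR: unknown tool {name!r}"
    try:
        return handler(tool_input)
    except Exception as e:  # missing arguments, bugs inside the tool, etc.
        return f"ERROR: {type(e).__name__}: {e}"

print(safe_dispatch("add", {"a": 2, "b": 3}))   # 5
print(safe_dispatch("add", {"a": 2}))           # ERROR: KeyError: 'b'
print(safe_dispatch("delete_db", {}))           # ERROR: unknown tool 'delete_db'
```

In the loop, the call DISPATCH[block.name](block.input) could become safe_dispatch(block.name, block.input), so every tool_use block still gets a tool_result even when the request itself was malformed.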
Step 3: Make it self-checking
This is the Fable-5-specific part. A normal agent stops at its first plausible answer. We add a second pass: feed the draft answer back and ask the model to act as an adversarial reviewer that returns structured JSON. If it finds problems, we let the agent take another turn to fix them. Because Fable 5 holds the full transcript in context, the reviewer sees the actual tool outputs, not just the prose.
import json

VERIFY_PROMPT = (
    "You are a strict reviewer. Look at the conversation above and the draft "
    "answer you just gave. Re-check every number against the tool_result blocks. "
    "Reply with ONLY a JSON object: "
    '{"ok": true|false, "issues": ["..."], "fix": "corrected answer or empty"}.'
)

def self_checking_agent(question, max_revisions=2):
    messages = [{"role": "user", "content": question}]
    answer = agent_turn(messages)
    draft = answer.content[-1].text
    for _ in range(max_revisions):
        messages.append({"role": "assistant", "content": draft})
        messages.append({"role": "user", "content": VERIFY_PROMPT})
        # The transcript contains tool_use/tool_result blocks, so the tools stay
        # declared; tool_choice "none" keeps the review a plain text reply.
        review = client.messages.create(
            model=MODEL, max_tokens=800, tools=TOOLS,
            tool_choice={"type": "none"}, messages=messages,
        )
        verdict = json.loads(review.content[0].text)
        print("review:", verdict)
        if verdict["ok"]:
            return draft  # passed its own audit

        # failed: ask the agent to redo the work with the critique in hand
        messages.append({"role": "assistant", "content": review.content[0].text})
        messages.append({"role": "user",
                         "content": "Apply your own fixes and produce the corrected answer."})
        answer = agent_turn(messages)
        draft = answer.content[-1].text
    return draft
The pattern is deliberately cheap: the verifier is a single non-tool call that returns a tiny JSON blob, so you only pay for a real revision loop when the model actually catches itself being wrong.
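The verifier's reply goes straight into json.loads, which is strict: a Markdown fence or a sentence of preamble around the JSON is enough to raise. A more forgiving parser might look like the sketch below. The name parse_verdict is hypothetical, and treating an unreadable reply as a failed review is an assumption about what you want, not something the API dictates.

```python
import json
import re

def parse_verdict(text: str) -> dict:
    """Parse the reviewer's reply; return a failed verdict if it is unreadable."""
    # Take the span from the first "{" to the last "}", which skips a
    # Markdown fence or a line of prose wrapped around the JSON object.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        try:
            data = json.loads(match.group(0))
        except json.JSONDecodeError:
            data = None
        if isinstance(data, dict) and isinstance(data.get("ok"), bool):
            return {
                "ok": data["ok"],
                "issues": data.get("issues", []),
                "fix": data.get("fix", ""),
            }
    # Assumption: an unparseable review counts as "not ok" rather than a crash.
    return {"ok": False, "issues": ["reviewer reply was not valid JSON"], "fix": ""}

fence = "`" * 3  # built at runtime so this example contains no literal fence
fenced = f'{fence}json\n{{"ok": true, "issues": [], "fix": ""}}\n{fence}'
print(parse_verdict(fenced))               # {'ok': True, 'issues': [], 'fix': ''}
print(parse_verdict("Looks fine to me."))  # ok is False, with one issue noted
```

Failing closed costs one extra revision turn when the reviewer misformats its reply, which is usually cheaper than an unhandled exception halfway through a long run.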
Worked example: auditing a metrics table
Let's give the agent a realistic, error-prone task: compute year-over-year growth from a small financial table and summarize it. Models are notorious for fumbling percentage math, which is exactly where the tool plus the self-check earn their keep.
QUESTION = """Here is our annual data:
Year | Revenue | Operating income
2024 | 98.7 | 12.4
2025 | 124.3 | 18.6
(all figures in $M)
Compute year-over-year growth for both metrics as percentages,
then tell me in one line whether margins improved. Use the tool for math."""
final = self_checking_agent(QUESTION)
print("\n=== FINAL ===\n" + final)
Example run (trimmed):
review: {'ok': True, 'issues': [], 'fix': ''}
=== FINAL ===
Revenue grew 25.9% YoY (98.7 -> 124.3) and operating income grew 50.0%
(12.4 -> 18.6). Operating margin rose from 12.6% to 15.0%, so yes, margins
improved by about 2.4 points.
Behind that output, the agent wrote and ran a run_python snippet to get 25.937... and 50.0, fed the result into its prose, and then the reviewer pass recomputed the margins (12.4/98.7 and 18.6/124.3) before signing off. If the first draft had said "margins were flat," the reviewer would have returned ok: false and forced a correction.
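The figures in that answer are easy to confirm by hand. The snippet below is roughly the kind of code the agent might send to the tool; the exact code the model writes will differ from run to run, and the helper name yoy_pct is made up for this sketch.

```python
# Figures from the table in the question, in $M.
rev_2024, rev_2025 = 98.7, 124.3
inc_2024, inc_2025 = 12.4, 18.6

def yoy_pct(old: float, new: float) -> float:
    """Year-over-year growth as a percentage of the earlier value."""
    return (new - old) / old * 100

rev_growth = yoy_pct(rev_2024, rev_2025)
inc_growth = yoy_pct(inc_2024, inc_2025)
margin_2024 = inc_2024 / rev_2024 * 100  # operating margin = income / revenue
margin_2025 = inc_2025 / rev_2025 * 100

print(f"revenue growth:   {rev_growth:.1f}%")                        # 25.9%
print(f"op income growth: {inc_growth:.1f}%")                        # 50.0%
print(f"margin: {margin_2024:.1f}% -> {margin_2025:.1f}%")           # 12.6% -> 15.0%
print(f"margin change: {margin_2025 - margin_2024:.1f} points")      # 2.4 points
```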
Common pitfalls and gotchas
- Missing tool_result blocks. Every tool_use block the model emits in one turn must get a matching tool_result with the same tool_use_id in the very next user message. Skip one and the API returns a 400. If the model calls two tools at once, return two results.
- The silent Opus 4.8 fallback. Restricted prompts (cyber, bio, and other high-risk areas) are answered by Opus 4.8 with no error. Always read resp.model. Do not assert which model ran from the model ID you sent.
- Forgetting to append the assistant turn. Before sending tool results you must append the assistant message containing the tool_use blocks. A common bug is appending only the results, which corrupts the turn order.
- No loop cap. A confused agent can request tools indefinitely. The max_steps guard turns a runaway bill into a clean exception.
- Trusting the verifier's format. json.loads will throw if the reviewer wraps its JSON in prose or a Markdown fence. In production, strip code fences or use a tool/JSON-schema response so the format is guaranteed.
- 1M context is a budget, not a free lunch. At $10 per million input tokens, stuffing a 900k-token transcript into every step gets expensive fast. Summarize or prune old tool outputs for long runs.
- exec() is for demos only. The run_python tool here runs arbitrary code in your process. For anything real, run tool code in a sandbox (a container, a microVM, or a hosted sandbox provider).
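As a halfway step on the exec() point, the snippet can at least be moved out of your own process and given a time limit. The sketch below uses the standard library's subprocess.run. It is not a sandbox: the child process can still read files and open network connections, so it only protects the agent loop from crashes and infinite loops. The name run_python_subprocess and the five-second default are arbitrary choices for this example.

```python
import subprocess
import sys

def run_python_subprocess(code: str, timeout_s: float = 5.0) -> str:
    """Run a snippet in a separate Python process and return what it prints."""
    try:
        proc = subprocess.run(
            # -I is isolated mode: ignores PYTHON* env vars and the user site dir.
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return f"ERROR: timed out after {timeout_s}s"
    if proc.returncode != 0:
        # The last traceback line is normally the exception summary.
        lines = proc.stderr.strip().splitlines()
        return "ERROR: " + (lines[-1] if lines else f"exit code {proc.returncode}")
    return proc.stdout.strip() or "(no output)"

print(run_python_subprocess("print(2 + 2)"))                       # 4
print(run_python_subprocess("1 / 0"))                              # ERROR: ZeroDivisionError: ...
print(run_python_subprocess("while True: pass", timeout_s=0.5))    # ERROR: timed out after 0.5s
```

Because it has the same shape as run_python (string in, string out, never raises), it can drop into the DISPATCH table unchanged.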
Quick reference
| Task | How |
|---|---|
| Call the model | client.messages.create(model="claude-fable-5", max_tokens=..., messages=...) |
| Detect tool request | resp.stop_reason == "tool_use" |
| Find tool calls | [b for b in resp.content if b.type == "tool_use"] |
| Return a tool result | {"type":"tool_result","tool_use_id":id,"content":str} |
| Detect safeguard fallback | resp.model != "claude-fable-5" |
| Get final text | resp.content[-1].text |
| Cap a runaway loop | for _ in range(max_steps): ... |
Next steps
- Swap the toy run_python tool for real ones: a file reader, a SQL query runner, or an MCP server, and keep the same loop.
- Move tool execution into a sandbox before you let the agent touch anything that matters.
- Replace the JSON reviewer with a structured tool call so you never hit a parse error.
- Add streaming (client.messages.stream) so long-horizon runs show progress instead of hanging.
- Log resp.model, token counts, and step counts per run so you can see the cost and catch fallbacks in aggregate.
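For the logging item, a small accumulator is enough to start with. The sketch below is one possible shape; RunLog is a hypothetical name. It uses the prices quoted in this guide and assumes, without confirmation, that the same rates apply to steps served by a fallback model. In the agent loop you would feed it resp.model, resp.usage.input_tokens, and resp.usage.output_tokens after each call.

```python
from dataclasses import dataclass

# Prices quoted in this guide, in dollars per million tokens.
# Assumption: the same rates are applied to every step, fallback or not.
INPUT_USD_PER_MTOK = 10.0
OUTPUT_USD_PER_MTOK = 50.0

@dataclass
class RunLog:
    """Running totals for one agent run."""
    requested_model: str
    steps: int = 0
    fallbacks: int = 0
    input_tokens: int = 0
    output_tokens: int = 0

    def record(self, served_model: str, input_tokens: int, output_tokens: int) -> None:
        """Add one API response to the totals."""
        self.steps += 1
        if served_model != self.requested_model:
            self.fallbacks += 1
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    def cost_usd(self) -> float:
        """Estimated spend so far at the rates above."""
        return (self.input_tokens * INPUT_USD_PER_MTOK
                + self.output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000

# Two steps that each resend a 900k-token transcript, one served by the fallback.
log = RunLog("claude-fable-5")
log.record("claude-fable-5", 900_000, 1_000)
log.record("claude-opus-4-8", 900_000, 1_000)
print(log.steps, log.fallbacks, f"${log.cost_usd():.2f}")  # 2 1 $18.10
```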
Fable 5's pitch is reliability over many steps, and the cheapest way to cash that in is a self-check pass that catches the model's own mistakes before your users do. The loop above is about forty lines; everything else is just better tools.