
OpenAI Agents SDK Guardrails: Block Bad Input Fast
Summary
Use input guardrails and tripwires to stop bad input before your costly OpenAI agent runs.
Your customer-support agent runs on a smart, slow, expensive model. A user pastes in their math homework, or attempts a prompt injection, and you pay full token price to answer something you never wanted to answer. Guardrails in the OpenAI Agents SDK fix this: a cheap, fast check screens input and output, and a tripwire halts the run the moment something looks wrong.
This guide shows you how to add input, output, and tool guardrails in Python — and the one setting (run_in_parallel=False) that decides whether you actually save money or just fail after the bill is already racked up.
Prerequisites
- Python 3.10+ (the str | list[...] type hint in Step 2 raises a TypeError on 3.9) and an OPENAI_API_KEY in your environment
- Basic async/await familiarity
- A few minutes: the whole thing fits in one file
Step 1 — Install
pip install openai-agents pydantic
export OPENAI_API_KEY=sk-...
Step 2 — Write an input guardrail
A guardrail is just a function that receives the same input as your agent and returns a GuardrailFunctionOutput. The trick most people use: run a tiny, cheap agent inside the guardrail to classify the input, then flip the tripwire on its verdict.
from pydantic import BaseModel
from agents import (
    Agent, GuardrailFunctionOutput, InputGuardrailTripwireTriggered,
    RunContextWrapper, Runner, TResponseInputItem, input_guardrail,
)

class HomeworkCheck(BaseModel):
    is_math_homework: bool
    reasoning: str

# Cheap, fast model just for screening
guard_agent = Agent(
    name="Guard",
    model="gpt-4o-mini",
    instructions="Is the user asking you to do their math homework?",
    output_type=HomeworkCheck,
)

@input_guardrail
async def math_guardrail(
    ctx: RunContextWrapper[None], agent: Agent,
    input: str | list[TResponseInputItem],
) -> GuardrailFunctionOutput:
    # Run the cheap classifier on the same input the main agent would see
    result = await Runner.run(guard_agent, input, context=ctx.context)
    return GuardrailFunctionOutput(
        output_info=result.final_output,
        tripwire_triggered=result.final_output.is_math_homework,
    )
Step 3 — Attach it and catch the tripwire
Guardrails live on the agent, not on Runner.run — different agents need different checks, so colocating them keeps things readable. When the tripwire fires, the SDK raises InputGuardrailTripwireTriggered instead of returning a result.
import asyncio

agent = Agent(
    name="Support",
    model="gpt-4o",  # the expensive one
    instructions="You are a helpful customer support agent.",
    input_guardrails=[math_guardrail],
)

async def main():
    try:
        res = await Runner.run(agent, "Solve for x: 2x + 3 = 11")
        print(res.final_output)
    except InputGuardrailTripwireTriggered:
        print("Blocked: that looks like math homework.")

asyncio.run(main())
Example output:
Blocked: that looks like math homework.
Step 4 — The setting that actually saves money
Input guardrails run in parallel by default (run_in_parallel=True): the guardrail and the real agent start at the same time for the best latency. The catch — if the guardrail trips, your expensive agent may have already consumed tokens and fired tool calls before being cancelled.
If your goal is cost control or avoiding side effects, switch to blocking mode. The guardrail finishes first; the agent never starts unless the input passes.
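from agents import InputGuardrail

agent = Agent(
    name="Support",
    model="gpt-4o",
    instructions="You are a helpful customer support agent.",
    input_guardrails=[
        # math_guardrail is already an InputGuardrail (the decorator wrapped it),
        # so pass its underlying function rather than the wrapper object.
        InputGuardrail(
            guardrail_function=math_guardrail.guardrail_function,
            run_in_parallel=False,  # block first
        ),
    ],
)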
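To see why the ordering matters, here is a toy simulation that uses only asyncio and makes no SDK calls. Everything in it is a hypothetical stand-in: fake_guardrail, fake_agent, the delays, and the 100-tokens-per-step figure are assumptions chosen for illustration, and the real SDK's cancellation behavior may differ in detail. It only shows the ordering of events.

```python
import asyncio

async def fake_guardrail(delay: float, trips: bool) -> bool:
    """Stand-in for a guardrail check that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return trips

async def fake_agent(usage: dict, steps: int = 20, step_delay: float = 0.005) -> str:
    """Stand-in for the expensive agent; each step 'spends' 100 tokens."""
    for _ in range(steps):
        await asyncio.sleep(step_delay)
        usage["tokens"] += 100
    return "answer"

async def run_parallel(usage: dict):
    """Start agent and guardrail together; cancel the agent if the guardrail trips."""
    agent_task = asyncio.create_task(fake_agent(usage))
    if await fake_guardrail(delay=0.05, trips=True):
        agent_task.cancel()
        try:
            await agent_task
        except asyncio.CancelledError:
            pass  # the agent was stopped mid-run
        return None
    return await agent_task

async def run_blocking(usage: dict):
    """Finish the guardrail first; start the agent only if the input passes."""
    if await fake_guardrail(delay=0.05, trips=True):
        return None
    return await fake_agent(usage)

def simulate() -> tuple:
    """Return (tokens spent in parallel mode, tokens spent in blocking mode)."""
    parallel_usage = {"tokens": 0}
    blocking_usage = {"tokens": 0}
    asyncio.run(run_parallel(parallel_usage))
    asyncio.run(run_blocking(blocking_usage))
    return parallel_usage["tokens"], blocking_usage["tokens"]

if __name__ == "__main__":
    parallel_tokens, blocking_tokens = simulate()
    print(f"parallel mode spent {parallel_tokens} tokens before the block")
    print(f"blocking mode spent {blocking_tokens} tokens")
```

In the parallel run the fake agent has already worked through several steps by the time the guardrail returns, while in the blocking run it never starts.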
Step 5 — Guard the output too
Output guardrails run on the final answer — useful for catching leaked PII, off-policy content, or a model that quietly did the math anyway. They always run after the agent completes, so there is no run_in_parallel option here.
from agents import output_guardrail, OutputGuardrailTripwireTriggered

class MessageOutput(BaseModel):
    response: str

@output_guardrail
async def no_math_output(
    ctx: RunContextWrapper, agent: Agent, output: MessageOutput,
) -> GuardrailFunctionOutput:
    # Re-use the cheap classifier, this time on the agent's final answer
    result = await Runner.run(guard_agent, output.response, context=ctx.context)
    return GuardrailFunctionOutput(
        output_info=result.final_output,
        tripwire_triggered=result.final_output.is_math_homework,
    )

agent = Agent(
    name="Support", model="gpt-4o",
    instructions="Help customers.",
    output_guardrails=[no_math_output],
    output_type=MessageOutput,
)
Step 6 — Validate individual tool calls
Agent-level guardrails cover only the two ends of a run: input guardrails fire for the first agent and output guardrails for the last. If a manager hands off to specialists, wrap the risky function_tool itself. Tool guardrails can allow(), replace the result, or reject the call before it runs.
import json
from agents import (
    function_tool, tool_input_guardrail,
    ToolGuardrailFunctionOutput,
)

@tool_input_guardrail
def block_secrets(data):
    # tool_arguments is the raw JSON string the model produced for this call
    args = json.loads(data.context.tool_arguments or "{}")
    if "sk-" in json.dumps(args):
        return ToolGuardrailFunctionOutput.reject_content(
            "Remove secrets before calling this tool."
        )
    return ToolGuardrailFunctionOutput.allow()

@function_tool(tool_input_guardrails=[block_secrets])
def classify_text(text: str) -> str:
    """Classify text for routing."""
    return f"length:{len(text)}"
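One caveat with the substring test above: "sk-" also appears inside ordinary words such as task-list or risk-report, so the guardrail would reject harmless calls. A stricter check, sketched here as a pure function that can be unit-tested without the SDK or any model call, matches only key-shaped tokens. The helper name contains_api_key and the pattern (sk- at a token boundary, followed by at least 20 key characters) are assumptions for illustration, not an official key format.

```python
import json
import re
from typing import Optional

# Hypothetical pattern: "sk-" not preceded by a letter or digit,
# followed by 20 or more characters that could belong to a key.
KEY_PATTERN = re.compile(r"(?<![A-Za-z0-9])sk-[A-Za-z0-9_-]{20,}")

def contains_api_key(tool_arguments: Optional[str]) -> bool:
    """Return True if the raw JSON tool arguments contain a key-shaped token."""
    args = json.loads(tool_arguments or "{}")
    return KEY_PATTERN.search(json.dumps(args)) is not None
```

Inside block_secrets, a call to contains_api_key(data.context.tool_arguments) could then stand in for the substring test.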
Common pitfalls
- Expecting parallel mode to save tokens. It does not: the agent already started. Use run_in_parallel=False when cost or side effects matter.
- Putting input guardrails on a mid-chain agent. Input guardrails only run on the first agent; output guardrails only on the last. Use tool guardrails for the steps in between.
- Using your expensive model for the check. That defeats the purpose; point the guard agent at a small, cheap model like gpt-4o-mini.
- Forgetting to catch the exception. A tripwire raises; if you do not wrap Runner.run in try/except, it crashes your handler.
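The first and third pitfalls come down to simple arithmetic. The sketch below compares the expected cost per request under three setups. The function name, the prices, and the block rate are made-up placeholders, and it pessimistically assumes that in parallel mode a blocked request still pays the full agent cost (in practice cancellation may save part of it).

```python
def expected_cost(agent_cost: float, guard_cost: float,
                  block_rate: float, mode: str) -> float:
    """Expected cost per request, in whatever unit the inputs use.

    mode is "none" (no guardrail), "parallel", or "blocking".
    """
    if mode == "none":
        return agent_cost
    if mode == "parallel":
        # Worst case: the agent has already run when the guardrail trips.
        return guard_cost + agent_cost
    if mode == "blocking":
        # The agent runs only for the share of requests that pass the check.
        return guard_cost + (1.0 - block_rate) * agent_cost
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical numbers: an agent call costs 10 units, a guard call 0.5,
# and 20% of incoming requests get blocked.
costs = {
    mode: expected_cost(10.0, 0.5, 0.2, mode)
    for mode in ("none", "parallel", "blocking")
}
```

Under these assumptions, blocking mode is cheaper than running with no guardrail whenever guard_cost < block_rate * agent_cost. If the guard uses the same expensive model as the agent, that inequality cannot hold, which is the third pitfall in numbers.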
Quick reference
| Guardrail type | Runs when | How a block surfaces |
|---|---|---|
| Input | Before first agent (or parallel) | InputGuardrailTripwireTriggered |
| Output | After last agent completes | OutputGuardrailTripwireTriggered |
| Tool input | Before each function_tool call | ToolGuardrailFunctionOutput.reject_content |
| Tool output | After each function_tool call | ToolGuardrailFunctionOutput.reject_content |
Next steps
Add a guardrail for prompt-injection patterns, log every output_info for audit, and set run_in_parallel=False on anything that touches money or external systems. Start with one cheap input check on your most expensive agent — it is the highest-leverage line of safety code you will write this week.
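As a starting point for the first two suggestions, here is a minimal sketch of a deterministic pre-check plus an audit record, using only the standard library. The phrase list, the function names check_injection and audit_line, and the record fields are all hypothetical. Fixed patterns are easy to evade, so a check like this would likely sit in front of a model-based guardrail rather than replace it.

```python
import json
import re
from datetime import datetime, timezone

# Hypothetical, deliberately small list of suspicious phrasings.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"reveal (your )?(system|hidden) prompt", re.I),
    re.compile(r"you are now (in )?developer mode", re.I),
]

def check_injection(text: str) -> dict:
    """Return a verdict dict that could be passed along as output_info."""
    hits = [p.pattern for p in INJECTION_PATTERNS if p.search(text)]
    return {"is_injection": bool(hits), "matched": hits}

def audit_line(guardrail_name: str, verdict: dict) -> str:
    """Serialize one verdict as a JSON line for an append-only audit log."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "guardrail": guardrail_name,
        "verdict": verdict,
    }
    return json.dumps(record, sort_keys=True)
```

In an input guardrail, the verdict's is_injection flag would drive tripwire_triggered, and each audit_line result could be appended to a log file.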