OpenAI Agents SDK Guardrails: Block Bad Input Fast — ContentBuffer guide

Kodetra Technologies · 4 min read · Intermediate

Summary

Use input guardrails and tripwires to stop bad input before your costly OpenAI agent runs.

Your customer-support agent runs on a smart, slow, expensive model. A user pastes in their math homework, or attempts a prompt injection, and you pay full token price for an answer you never wanted to give. Guardrails in the OpenAI Agents SDK address this: a cheap, fast check screens input and output, and a tripwire halts the run the moment something looks wrong.

This guide shows you how to add input, output, and tool guardrails in Python — and the one setting (run_in_parallel=False) that decides whether you actually save money or just fail after the bill is already racked up.

Prerequisites

  • Python 3.10+ (the str | list[...] annotation in Step 2 needs it) and an OPENAI_API_KEY in your environment
  • Basic async/await familiarity
  • A few minutes — the whole thing fits in one file

Step 1 — Install

pip install openai-agents pydantic
export OPENAI_API_KEY=sk-...

Step 2 — Write an input guardrail

A guardrail is just a function that receives the same input as your agent and returns a GuardrailFunctionOutput. The trick most people use: run a tiny, cheap agent inside the guardrail to classify the input, then flip the tripwire on its verdict.

from pydantic import BaseModel
from agents import (
    Agent, GuardrailFunctionOutput, InputGuardrailTripwireTriggered,
    RunContextWrapper, Runner, TResponseInputItem, input_guardrail,
)

class HomeworkCheck(BaseModel):
    is_math_homework: bool
    reasoning: str

# Cheap, fast model just for screening
guard_agent = Agent(
    name="Guard",
    model="gpt-4o-mini",
    instructions="Is the user asking you to do their math homework?",
    output_type=HomeworkCheck,
)

@input_guardrail
async def math_guardrail(
    ctx: RunContextWrapper[None], agent: Agent,
    input: str | list[TResponseInputItem],
) -> GuardrailFunctionOutput:
    result = await Runner.run(guard_agent, input, context=ctx.context)
    return GuardrailFunctionOutput(
        output_info=result.final_output,
        tripwire_triggered=result.final_output.is_math_homework,
    )
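Since a guardrail is an ordinary function, it can also run a free, deterministic check before paying for the guard agent. The sketch below is one way to do that for prompt-injection phrases. The pattern list and the helper names (input_to_text, looks_like_injection) are hypothetical and not part of the SDK, and the flattening step assumes list items are dicts with a plain-string "content" field.

```python
import re
from typing import Union

# Hypothetical patterns for illustration; tune them against real traffic.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"reveal (your|the) (system )?prompt", re.I),
    re.compile(r"you are now (in )?developer mode", re.I),
]


def input_to_text(user_input: Union[str, list]) -> str:
    """Flatten a plain string or a list of message dicts into one string."""
    if isinstance(user_input, str):
        return user_input
    parts = []
    for item in user_input:
        # Assumption: items are dicts whose "content" is a plain string.
        content = item.get("content") if isinstance(item, dict) else None
        if isinstance(content, str):
            parts.append(content)
    return "\n".join(parts)


def looks_like_injection(user_input: Union[str, list]) -> bool:
    """Return True if any known injection phrase appears in the input."""
    text = input_to_text(user_input)
    return any(p.search(text) for p in INJECTION_PATTERNS)


print(looks_like_injection("Please ignore previous instructions."))  # True
print(looks_like_injection("Where is my order?"))                    # False
```

Inside a guardrail function, a True result could set tripwire_triggered directly and skip the guard-agent call. Pattern matching is easy to evade, so treat it as a first filter rather than a replacement for the model-based check.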

Step 3 — Attach it and catch the tripwire

Guardrails live on the agent, not on Runner.run — different agents need different checks, so colocating them keeps things readable. When the tripwire fires, the SDK raises InputGuardrailTripwireTriggered instead of returning a result.

import asyncio

agent = Agent(
    name="Support",
    model="gpt-4o",  # the expensive one
    instructions="You are a helpful customer support agent.",
    input_guardrails=[math_guardrail],
)

async def main():
    try:
        res = await Runner.run(agent, "Solve for x: 2x + 3 = 11")
        print(res.final_output)
    except InputGuardrailTripwireTriggered:
        print("Blocked: that looks like math homework.")

asyncio.run(main())

Example output:

Blocked: that looks like math homework.
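In a web handler you usually want a polite reply rather than a print. Here is a minimal sketch of that pattern, written without the SDK so it runs on its own: reply_or_fallback and FakeTripwire are hypothetical names, and FakeTripwire stands in for InputGuardrailTripwireTriggered.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, Type


async def reply_or_fallback(
    run: Callable[[], Awaitable[str]],
    blocked: Tuple[Type[Exception], ...],
    fallback: str,
) -> str:
    """Await run(); return fallback if one of the blocked exceptions is raised."""
    try:
        return await run()
    except blocked:
        return fallback


# Stand-in for the SDK's tripwire exception so the sketch is self-contained.
class FakeTripwire(Exception):
    pass


async def blocked_run() -> str:
    raise FakeTripwire()


async def ok_run() -> str:
    return "Your order ships tomorrow."


async def demo() -> None:
    msg = "Sorry, I can only help with support questions."
    print(await reply_or_fallback(blocked_run, (FakeTripwire,), msg))
    print(await reply_or_fallback(ok_run, (FakeTripwire,), msg))


asyncio.run(demo())
```

In a real handler, the blocked tuple would hold the SDK's tripwire exception classes, and run would wrap the Runner.run call and return its final_output.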

Step 4 — The setting that actually saves money

Input guardrails run in parallel by default (run_in_parallel=True): the guardrail and the real agent start at the same time for the best latency. The catch — if the guardrail trips, your expensive agent may have already consumed tokens and fired tool calls before being cancelled.

If your goal is cost control or avoiding side effects, switch to blocking mode. The guardrail finishes first; the agent never starts unless the input passes.

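from agents import InputGuardrail

# math_guardrail is already an InputGuardrail (the decorator built it),
# so wrap the underlying function rather than the guardrail object.
blocking_guardrail = InputGuardrail(
    guardrail_function=math_guardrail.guardrail_function,
    run_in_parallel=False,  # the check must pass before the agent starts
)

agent = Agent(
    name="Support",
    model="gpt-4o",
    instructions="You are a helpful customer support agent.",
    input_guardrails=[blocking_guardrail],
)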
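To see why the two modes differ in cost, here is a toy simulation in plain asyncio. It does not use the SDK: fake_guardrail, fake_agent, the timings, and the token counts are all invented for illustration. It only models the scheduling described above.

```python
import asyncio

TOKENS_PER_STEP = 100  # made-up cost per unit of agent work


async def fake_guardrail() -> bool:
    """Pretend classifier: takes 0.2 s, then reports that the tripwire fired."""
    await asyncio.sleep(0.2)
    return True


async def fake_agent(usage: dict) -> None:
    """Pretend expensive agent: five quick steps, each one billed."""
    for _ in range(5):
        await asyncio.sleep(0.01)
        usage["tokens"] += TOKENS_PER_STEP


async def run_parallel() -> int:
    usage = {"tokens": 0}
    agent_task = asyncio.create_task(fake_agent(usage))  # starts right away
    if await fake_guardrail():
        agent_task.cancel()  # cannot undo work that already happened
        await asyncio.gather(agent_task, return_exceptions=True)
    return usage["tokens"]


async def run_blocking() -> int:
    usage = {"tokens": 0}
    if not await fake_guardrail():  # agent starts only if the check passes
        await fake_agent(usage)
    return usage["tokens"]


print("parallel tokens:", asyncio.run(run_parallel()))
print("blocking tokens:", asyncio.run(run_blocking()))
```

In this toy setup the parallel run reports a non-zero token count even though the input was rejected, while the blocking run reports zero. The price of blocking mode is that every accepted request waits for the guardrail before the agent begins.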

Step 5 — Guard the output too

Output guardrails run on the final answer — useful for catching leaked PII, off-policy content, or a model that quietly did the math anyway. They always run after the agent completes, so there is no run_in_parallel option here.

from agents import output_guardrail, OutputGuardrailTripwireTriggered

class MessageOutput(BaseModel):
    response: str

@output_guardrail
async def no_math_output(
    ctx: RunContextWrapper, agent: Agent, output: MessageOutput,
) -> GuardrailFunctionOutput:
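    # Reuses guard_agent for brevity; its prompt is worded for user input,
    # so a dedicated checker prompt for responses would likely classify better.
    result = await Runner.run(guard_agent, output.response, context=ctx.context)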
    return GuardrailFunctionOutput(
        output_info=result.final_output,
        tripwire_triggered=result.final_output.is_math_homework,
    )

agent = Agent(
    name="Support", model="gpt-4o",
    instructions="Help customers.",
    output_guardrails=[no_math_output],
    output_type=MessageOutput,
)
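For leaked PII, a deterministic scan can complement the model-based check. The sketch below is illustrative only: find_pii and the two patterns are hypothetical, cover just email addresses and one US phone format, and would need broader coverage in production.

```python
import re

# Illustrative patterns only; real PII detection needs broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def find_pii(text: str) -> dict:
    """Return {label: [matches]} for every pattern that matched."""
    hits = {}
    for label, pattern in PII_PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[label] = found
    return hits


reply = "Sure, I emailed jane.doe@example.com and called 555-867-5309."
print(find_pii(reply))
# {'email': ['jane.doe@example.com'], 'us_phone': ['555-867-5309']}
print(find_pii("Your refund was issued today."))
# {}
```

An output guardrail could pass output.response to a function like this, set tripwire_triggered to bool(hits), and store the hits in output_info.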

Step 6 — Validate individual tool calls

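Agent-level guardrails have a blind spot: input guardrails run only when the agent is the first in the chain, and output guardrails only when it is the last. If a manager hands off to specialists, wrap the risky function_tool itself. Tool guardrails can allow() a call, reject it before it runs, or replace the result with a message via reject_content().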

import json
from agents import (
    function_tool, tool_input_guardrail,
    ToolGuardrailFunctionOutput,
)

@tool_input_guardrail
def block_secrets(data):
    args = json.loads(data.context.tool_arguments or "{}")
    if "sk-" in json.dumps(args):
        return ToolGuardrailFunctionOutput.reject_content(
            "Remove secrets before calling this tool."
        )
    return ToolGuardrailFunctionOutput.allow()

@function_tool(tool_input_guardrails=[block_secrets])
def classify_text(text: str) -> str:
    """Classify text for routing."""
    return f"length:{len(text)}"
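One caveat with a plain substring test: "sk-" also appears inside ordinary words such as "risk-free" or "task-list", so harmless calls may be rejected. The following sketch tightens the match with a regular expression. The key shape it assumes ("sk-" followed by at least 20 URL-safe characters) is a guess for illustration, and contains_secret is a hypothetical helper, not an SDK function.

```python
import json
import re
from typing import Optional

# Assumption: keys look like "sk-" followed by 20+ URL-safe characters.
SECRET_RE = re.compile(r"\bsk-[A-Za-z0-9_-]{20,}")


def contains_secret(tool_arguments: Optional[str]) -> bool:
    """Check raw JSON tool arguments for anything shaped like an API key."""
    args = json.loads(tool_arguments or "{}")
    # Re-serialising flattens nested dicts and lists into one searchable string.
    return SECRET_RE.search(json.dumps(args)) is not None


print(contains_secret('{"text": "a risk-free task-list"}'))       # False
print(contains_secret('{"cfg": {"key": "sk-' + "a" * 24 + '"}}'))  # True
print(contains_secret(None))                                       # False
```

The guardrail body could call this helper on data.context.tool_arguments and return reject_content(...) when it reports True.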

Common pitfalls

  • Expecting parallel mode to save tokens. It does not — the agent already started. Use run_in_parallel=False when cost or side effects matter.
  • Putting input guardrails on a mid-chain agent. Input guardrails only run on the first agent; output guardrails only on the last. Use tool guardrails for the steps in between.
  • Using your expensive model for the check. Defeats the purpose — point the guard agent at a small, cheap model like gpt-4o-mini.
  • Forgetting to catch the exception. A tripwire raises; if you do not wrap Runner.run in try/except, it crashes your handler.

Quick reference

Guardrail type | Runs when                                         | How a block is signalled
Input          | Before the first agent, or alongside it (parallel) | raises InputGuardrailTripwireTriggered
Output         | After the last agent completes                    | raises OutputGuardrailTripwireTriggered
Tool input     | Before each function_tool call                    | returns ToolGuardrailFunctionOutput.reject_content(...)
Tool output    | After each function_tool call                     | returns ToolGuardrailFunctionOutput.reject_content(...)

Next steps

Add a guardrail for prompt-injection patterns, log every output_info for audit, and set run_in_parallel=False on anything that touches money or external systems. Start with one cheap input check on your most expensive agent — it is the highest-leverage line of safety code you will write this week.
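For the audit trail, one simple option is a JSON-lines log with one record per guardrail verdict. This is a minimal sketch using only the standard library. The function name log_guardrail_decision and the record fields are hypothetical, and the in-memory buffer stands in for a real log file.

```python
import io
import json
import time
from typing import Any, TextIO


def log_guardrail_decision(
    stream: TextIO, guardrail: str, tripped: bool, output_info: Any
) -> None:
    """Append one JSON line describing a guardrail verdict."""
    record = {
        "ts": time.time(),
        "guardrail": guardrail,
        "tripped": tripped,
        "output_info": output_info,
    }
    # default=str keeps values json cannot encode from raising.
    stream.write(json.dumps(record, default=str) + "\n")


buf = io.StringIO()  # stand-in for a log file
log_guardrail_decision(
    buf,
    "math_guardrail",
    True,
    {"is_math_homework": True, "reasoning": "asks to solve for x"},
)
print(buf.getvalue(), end="")
```

If output_info is a Pydantic model such as HomeworkCheck, passing output_info.model_dump() keeps the fields structured in the log instead of falling back to a string.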
