GPT-5.4 Tool Search: Load Only What You Need

Kodetra Technologies · April 14, 2026 · 4 min read · Beginner

Summary

Use OpenAI's tool search to dynamically load tools at runtime, cutting token usage by 47% in large tool ecosystems.

What Is GPT-5.4 Tool Search?

When you build AI agents with 50+ tools, every tool definition eats context tokens. GPT-5.4 introduces tool search — the model loads only the 3–8 tools relevant to the current request instead of all definitions upfront. Result: 47% fewer tokens, faster responses, and better tool selection accuracy.

Only gpt-5.4 and later models support this feature. It works with both functions and MCP servers.
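Since the examples below all use function tools, here is what a deferred MCP server entry might look like. This is a sketch: the `"mcp"` tool type with `server_label` and `server_url` mirrors OpenAI's MCP tool shape, but combining it with `defer_loading` follows the pattern described in this article, and `https://example.com/mcp` is a placeholder.

```python
# Hypothetical: an MCP server entry marked as deferred. Pairing the "mcp"
# tool type with defer_loading is an assumption based on this article's
# pattern; the server URL is a placeholder.
tools = [
    {"type": "tool_search", "execution": "server"},
    {
        "type": "mcp",
        "server_label": "travel_tools",
        "server_url": "https://example.com/mcp",
        "defer_loading": True,
    },
]
```

The whole server acts as one namespace: none of its tools occupy context until the search step pulls them in.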


Prerequisites

  • OpenAI API key with GPT-5.4 access
  • Python 3.9+ with openai SDK installed
  • Basic understanding of OpenAI function calling

pip install openai --upgrade

How Tool Search Works

Without tool search, every tool schema is loaded into the prompt. With tool search, you mark tools as defer_loading: true and add a tool_search entry. The model then searches for relevant tools at runtime.

Feature                  | Without Tool Search  | With Tool Search
Token usage (50 tools)   | ~12K tokens          | ~6.3K tokens
Tool selection accuracy  | Degrades with scale  | Consistent
Response latency         | Higher               | Lower (cache preserved)
Setup complexity         | None                 | Minimal
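You can sanity-check numbers like these against your own tool list with a rough chars/4 estimate over the serialized schemas. This is a heuristic, not the real tokenizer, and the ratio of 50 total tools to ~5 loaded ones is an illustrative assumption:

```python
import json

def rough_tokens(tools):
    """Very rough token estimate: ~4 characters of serialized JSON per token."""
    return len(json.dumps(tools)) // 4

# One representative schema, repeated to simulate a large tool list
schema = {"type": "function", "defer_loading": True,
          "function": {"name": "get_weather",
                       "description": "Get current weather for a city",
                       "parameters": {"type": "object",
                                      "properties": {"city": {"type": "string"}},
                                      "required": ["city"]}}}

full = rough_tokens([schema] * 50)      # every schema loaded upfront
deferred = rough_tokens([schema] * 5)   # only ~5 loaded after search
```

The exact savings depend on how verbose your schemas are; long descriptions and deeply nested parameters widen the gap.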

Option 1: Hosted Tool Search (Easiest)

OpenAI handles the search logic. You declare all tools upfront but mark them as deferred. The API decides which ones to load.

Step 1 — Define Your Tools with defer_loading

import openai

client = openai.OpenAI()

tools = [
    # Tool search entry — tells the model to search
    {"type": "tool_search", "execution": "server"},

    # Deferred tool — NOT loaded until searched
    {
        "type": "function",
        "defer_loading": True,
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"}
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "defer_loading": True,
        "function": {
            "name": "search_flights",
            "description": "Search available flights between cities",
            "parameters": {
                "type": "object",
                "properties": {
                    "origin": {"type": "string"},
                    "destination": {"type": "string"},
                    "date": {"type": "string"}
                },
                "required": ["origin", "destination", "date"]
            }
        }
    },
    {
        "type": "function",
        "defer_loading": True,
        "function": {
            "name": "book_hotel",
            "description": "Book a hotel room in a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "checkin": {"type": "string"},
                    "checkout": {"type": "string"}
                },
                "required": ["city", "checkin", "checkout"]
            }
        }
    }
]

Step 2 — Make the API Call

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"}
    ],
    tools=tools
)

print(response.choices[0].message)

Step 3 — Inspect the Response

The model first emits a tool_search_call (internal search step), then a tool_search_output (loaded tools), then the actual function call. You only need to handle the function call:

# Example output — model loaded only get_weather
# {
#   "role": "assistant",
#   "tool_calls": [
#     {
#       "id": "call_abc123",
#       "type": "function",
#       "function": {
#         "name": "get_weather",
#         "arguments": "{\"city\": \"Tokyo\"}"
#       }
#     }
#   ]
# }
# search_flights and book_hotel were NOT loaded — saving tokens
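From here, executing the call and returning its result follows the standard function-calling loop. A minimal sketch, using plain dicts in place of the SDK's response objects — the local `get_weather` implementation is a stand-in for your real lookup:

```python
import json

# Stand-in implementation — replace with your real weather lookup
def get_weather(city):
    return {"city": city, "temp_c": 18, "conditions": "clear"}

HANDLERS = {"get_weather": get_weather}

def dispatch(tool_call):
    """Run the handler named by a tool call and return a JSON string result."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    return json.dumps(HANDLERS[name](**args))

call = {"id": "call_abc123",
        "type": "function",
        "function": {"name": "get_weather",
                     "arguments": "{\"city\": \"Tokyo\"}"}}
result = dispatch(call)
```

Append `{"role": "tool", "tool_call_id": call["id"], "content": result}` to the messages and call the API again. Tools that were never loaded by the search step stay out of context for the follow-up turn as well.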

Option 2: Client-Executed Tool Search

You control the search logic. The model tells you what it needs; your code decides which tools to provide. Use this when tools depend on user context, tenant config, or external registries.

Step 1 — Configure Client-Side Search

tools_client = [
    {
        "type": "tool_search",
        "execution": "client",
        "description": "Search project-specific tools",
        "parameters": {
            "type": "object",
            "properties": {
                "goal": {"type": "string"}
            },
            "required": ["goal"]
        }
    }
]

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "user", "content": "Deploy the staging server"}
    ],
    tools=tools_client
)

Step 2 — Handle the Search Call

import json

# Model returns a tool_search_call with a call_id
search_call = response.choices[0].message.tool_calls[0]
goal = json.loads(search_call.function.arguments)["goal"]
print(f"Model wants tools for: {goal}")
# e.g. Model wants tools for: deploy the staging server

# Your logic: look up relevant tools from your registry
def find_tools(goal):
    registry = {
        "deploy": [
            {
                "type": "function",
                "function": {
                    "name": "run_deploy",
                    "description": "Deploy to staging or production",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "env": {
                                "type": "string",
                                "enum": ["staging", "production"]
                            }
                        },
                        "required": ["env"]
                    }
                }
            }
        ]
    }
    for key, tools in registry.items():
        if key in goal.lower():
            return tools
    return []

matched_tools = find_tools(goal)

Step 3 — Return Tools and Continue

# Send matched tools back with the same call_id
follow_up = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "user", "content": "Deploy the staging server"},
        response.choices[0].message,
        {
            "role": "tool",
            "tool_call_id": search_call.id,
            "content": json.dumps({
                "type": "tool_search_output",
                "tools": matched_tools
            })
        }
    ],
    tools=tools_client + matched_tools
)

# Model now calls run_deploy with env="staging"
print(follow_up.choices[0].message.tool_calls)

Best Practices

  1. Group tools into namespaces — Use MCP servers or logical groups instead of individual deferred functions for better token efficiency
  2. Keep namespaces under 10 functions — Smaller groups improve search accuracy
  3. Write clear descriptions — The model uses descriptions to decide which tools to load, so vague ones cause poor matches
  4. Don't mix deferred and non-deferred carelessly — Keep always-needed tools non-deferred and optional tools deferred
  5. Cache is preserved — Loaded tools append to context end, so KV cache stays intact across requests
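Practices 2 and 4 can be rolled into a small helper that keeps the split explicit. A sketch — `core_tools` and `optional_tools` are ordinary function-tool dicts like the ones defined earlier:

```python
def build_tools(core_tools, optional_tools):
    """Core tools stay non-deferred (loaded every request); optional tools
    are deferred behind a server-side tool search entry."""
    tools = [{"type": "tool_search", "execution": "server"}]
    tools += core_tools                                      # always needed
    tools += [{**t, "defer_loading": True} for t in optional_tools]
    return tools

core = [{"type": "function",
         "function": {"name": "get_time",
                      "description": "Get the current UTC time",
                      "parameters": {"type": "object", "properties": {}}}}]
optional = [{"type": "function",
             "function": {"name": "book_hotel",
                          "description": "Book a hotel room in a city",
                          "parameters": {"type": "object",
                                         "properties": {"city": {"type": "string"}},
                                         "required": ["city"]}}}]

tools = build_tools(core, optional)
```

Keeping the split in one place makes it easy to audit which tools cost tokens on every request and which only load on demand.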

When to Use Each Approach

Scenario                                    | Approach
All tools known at request time             | Hosted (server)
Tools depend on user/tenant context         | Client-executed
External tool registry (MCP, plugin store)  | Client-executed
Simple agent with 10–50 tools               | Hosted (server)
Platform with 100+ tools per user           | Client-executed

Quick Reference

# Minimal hosted tool search setup
tools = [
    {"type": "tool_search", "execution": "server"},
    {"type": "function", "defer_loading": True, "function": {...}},
    {"type": "function", "defer_loading": True, "function": {...}},
]

# Minimal client tool search setup
tools = [
    {
        "type": "tool_search",
        "execution": "client",
        "description": "Search available tools",
        "parameters": {
            "type": "object",
            "properties": {"goal": {"type": "string"}}
        }
    }
]

Key Takeaway

Tool search is essential for any agent with more than a dozen tools. Hosted search is the simplest starting point — just add defer_loading: true and a tool_search entry. Switch to client-executed search when you need dynamic, context-aware tool discovery. Either way, you get significant token savings and better tool selection out of the box.
