GPT-5.4 Tool Search: Load Only What You Need

Kodetra Technologies · April 14, 2026 · 4 min read · Beginner

Summary

Use OpenAI's tool search to dynamically load tools at runtime, cutting token usage by 47% in large tool ecosystems.

What Is GPT-5.4 Tool Search?

When you build AI agents with 50+ tools, every tool definition eats context tokens. GPT-5.4 introduces tool search — the model loads only the 3–8 tools relevant to the current request instead of all definitions upfront. Result: 47% fewer tokens, faster responses, and better tool selection accuracy.

Only gpt-5.4 and later models support this feature. It works with both functions and MCP servers.
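Since the examples below all use function tools, here is what a deferred MCP server entry might look like. This is a sketch: the `"mcp"` tool type with `server_label` and `server_url` mirrors OpenAI's MCP tool shape, but combining it with `defer_loading` follows the pattern described in this article, and `https://example.com/mcp` is a placeholder.

```python
# Hypothetical: an MCP server entry marked as deferred. Pairing the "mcp"
# tool type with defer_loading is an assumption based on this article's
# pattern; the server URL is a placeholder.
tools = [
    {"type": "tool_search", "execution": "server"},
    {
        "type": "mcp",
        "server_label": "travel_tools",
        "server_url": "https://example.com/mcp",
        "defer_loading": True,
    },
]
```

The whole server acts as one namespace: none of its tools occupy context until the search step pulls them in.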


Prerequisites

  • OpenAI API key with GPT-5.4 access
  • Python 3.9+ with openai SDK installed
  • Basic understanding of OpenAI function calling

pip install openai --upgrade

How Tool Search Works

Without tool search, every tool schema is loaded into the prompt. With tool search, you mark tools as defer_loading: true and add a tool_search entry. The model then searches for relevant tools at runtime.

Feature                  | Without Tool Search  | With Tool Search
Token usage (50 tools)   | ~12K tokens          | ~6.3K tokens
Tool selection accuracy  | Degrades with scale  | Consistent
Response latency         | Higher               | Lower (cache preserved)
Setup complexity         | None                 | Minimal
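You can sanity-check numbers like these against your own tool list with a rough chars/4 estimate over the serialized schemas. This is a heuristic, not the real tokenizer, and the ratio of 50 total tools to ~5 loaded ones is an illustrative assumption:

```python
import json

def rough_tokens(tools):
    """Very rough token estimate: ~4 characters of serialized JSON per token."""
    return len(json.dumps(tools)) // 4

# One representative schema, repeated to simulate a large tool list
schema = {"type": "function", "defer_loading": True,
          "function": {"name": "get_weather",
                       "description": "Get current weather for a city",
                       "parameters": {"type": "object",
                                      "properties": {"city": {"type": "string"}},
                                      "required": ["city"]}}}

full = rough_tokens([schema] * 50)      # every schema loaded upfront
deferred = rough_tokens([schema] * 5)   # only ~5 loaded after search
```

The exact savings depend on how verbose your schemas are; long descriptions and deeply nested parameters widen the gap.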

Option 1: Hosted Tool Search (Easiest)

OpenAI handles the search logic. You declare all tools upfront but mark them as deferred. The API decides which ones to load.

Step 1 — Define Your Tools with defer_loading

import openai

client = openai.OpenAI()

tools = [
    # Tool search entry — tells the model to search
    {"type": "tool_search", "execution": "server"},

    # Deferred tool — NOT loaded until searched
    {
        "type": "function",
        "defer_loading": True,
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"}
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "defer_loading": True,
        "function": {
            "name": "search_flights",
            "description": "Search available flights between cities",
            "parameters": {
                "type": "object",
                "properties": {
                    "origin": {"type": "string"},
                    "destination": {"type": "string"},
                    "date": {"type": "string"}
                },
                "required": ["origin", "destination", "date"]
            }
        }
    },
    {
        "type": "function",
        "defer_loading": True,
        "function": {
            "name": "book_hotel",
            "description": "Book a hotel room in a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "checkin": {"type": "string"},
                    "checkout": {"type": "string"}
                },
                "required": ["city", "checkin", "checkout"]
            }
        }
    }
]

Step 2 — Make the API Call

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"}
    ],
    tools=tools
)

print(response.choices[0].message)

Step 3 — Inspect the Response

The model first emits a tool_search_call (internal search step), then a tool_search_output (loaded tools), then the actual function call. You only need to handle the function call:

# Example output — model loaded only get_weather
# {
#   "role": "assistant",
#   "tool_calls": [
#     {
#       "id": "call_abc123",
#       "type": "function",
#       "function": {
#         "name": "get_weather",
#         "arguments": "{\"city\": \"Tokyo\"}"
#       }
#     }
#   ]
# }
# search_flights and book_hotel were NOT loaded — saving tokens
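From here, executing the call and returning its result follows the standard function-calling loop. A minimal sketch, using plain dicts in place of the SDK's response objects — the local `get_weather` implementation is a stand-in for your real lookup:

```python
import json

# Stand-in implementation — replace with your real weather lookup
def get_weather(city):
    return {"city": city, "temp_c": 18, "conditions": "clear"}

HANDLERS = {"get_weather": get_weather}

def dispatch(tool_call):
    """Run the handler named by a tool call and return a JSON string result."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    return json.dumps(HANDLERS[name](**args))

call = {"id": "call_abc123",
        "type": "function",
        "function": {"name": "get_weather",
                     "arguments": "{\"city\": \"Tokyo\"}"}}
result = dispatch(call)
```

Append `{"role": "tool", "tool_call_id": call["id"], "content": result}` to the messages and call the API again. Tools that were never loaded by the search step stay out of context for the follow-up turn as well.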

Option 2: Client-Executed Tool Search

You control the search logic. The model tells you what it needs; your code decides which tools to provide. Use this when tools depend on user context, tenant config, or external registries.

Step 1 — Configure Client-Side Search

tools_client = [
    {
        "type": "tool_search",
        "execution": "client",
        "description": "Search project-specific tools",
        "parameters": {
            "type": "object",
            "properties": {
                "goal": {"type": "string"}
            },
            "required": ["goal"]
        }
    }
]

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "user", "content": "Deploy the staging server"}
    ],
    tools=tools_client
)

Step 2 — Handle the Search Call

import json

# Model returns a tool_search_call with a call_id
search_call = response.choices[0].message.tool_calls[0]
goal = json.loads(search_call.function.arguments)["goal"]
print(f"Model wants tools for: {goal}")
# e.g. Model wants tools for: deploy the staging server

# Your logic: look up relevant tools from your registry
def find_tools(goal):
    registry = {
        "deploy": [
            {
                "type": "function",
                "function": {
                    "name": "run_deploy",
                    "description": "Deploy to staging or production",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "env": {
                                "type": "string",
                                "enum": ["staging", "production"]
                            }
                        },
                        "required": ["env"]
                    }
                }
            }
        ]
    }
    for key, tools in registry.items():
        if key in goal.lower():
            return tools
    return []

matched_tools = find_tools(goal)

Step 3 — Return Tools and Continue

# Send matched tools back with the same call_id
follow_up = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "user", "content": "Deploy the staging server"},
        response.choices[0].message,
        {
            "role": "tool",
            "tool_call_id": search_call.id,
            "content": json.dumps({
                "type": "tool_search_output",
                "tools": matched_tools
            })
        }
    ],
    tools=tools_client + matched_tools
)

# Model now calls run_deploy with env="staging"
print(follow_up.choices[0].message.tool_calls)

Best Practices

  1. Group tools into namespaces — Use MCP servers or logical groups instead of individual deferred functions for better token efficiency
  2. Keep namespaces under 10 functions — Smaller groups improve search accuracy
  3. Write clear descriptions — The model uses descriptions to decide which tools to load, so vague ones cause poor matches
  4. Don't mix deferred and non-deferred carelessly — Keep always-needed tools non-deferred and optional tools deferred
  5. Cache is preserved — Loaded tools append to context end, so KV cache stays intact across requests
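Practices 2 and 4 can be rolled into a small helper that keeps the split explicit. A sketch — `core_tools` and `optional_tools` are ordinary function-tool dicts like the ones defined earlier:

```python
def build_tools(core_tools, optional_tools):
    """Core tools stay non-deferred (loaded every request); optional tools
    are deferred behind a server-side tool search entry."""
    tools = [{"type": "tool_search", "execution": "server"}]
    tools += core_tools                                      # always needed
    tools += [{**t, "defer_loading": True} for t in optional_tools]
    return tools

core = [{"type": "function",
         "function": {"name": "get_time",
                      "description": "Get the current UTC time",
                      "parameters": {"type": "object", "properties": {}}}}]
optional = [{"type": "function",
             "function": {"name": "book_hotel",
                          "description": "Book a hotel room in a city",
                          "parameters": {"type": "object",
                                         "properties": {"city": {"type": "string"}},
                                         "required": ["city"]}}}]

tools = build_tools(core, optional)
```

Keeping the split in one place makes it easy to audit which tools cost tokens on every request and which only load on demand.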

When to Use Each Approach

Scenario                                    | Approach
All tools known at request time             | Hosted (server)
Tools depend on user/tenant context         | Client-executed
External tool registry (MCP, plugin store)  | Client-executed
Simple agent with 10–50 tools               | Hosted (server)
Platform with 100+ tools per user           | Client-executed

Quick Reference

# Minimal hosted tool search setup
tools = [
    {"type": "tool_search", "execution": "server"},
    {"type": "function", "defer_loading": True, "function": {...}},
    {"type": "function", "defer_loading": True, "function": {...}},
]

# Minimal client tool search setup
tools = [
    {
        "type": "tool_search",
        "execution": "client",
        "description": "Search available tools",
        "parameters": {
            "type": "object",
            "properties": {"goal": {"type": "string"}}
        }
    }
]

Key Takeaway

Tool search is essential for any agent with more than a dozen tools. Hosted search is the simplest starting point — just add defer_loading: true and a tool_search entry. Switch to client-executed search when you need dynamic, context-aware tool discovery. Either way, you get significant token savings and better tool selection out of the box.
