
# GPT-5.4 Tool Search: Load Only What You Need

## Summary
Use OpenAI's tool search to dynamically load tools at runtime, cutting token usage by 47% in large tool ecosystems.
## What Is GPT-5.4 Tool Search?
When you build AI agents with 50+ tools, every tool definition eats context tokens. GPT-5.4 introduces tool search — the model loads only the 3–8 tools relevant to the current request instead of all definitions upfront. Result: 47% fewer tokens, faster responses, and better tool selection accuracy.
Only gpt-5.4 and later models support this feature. It works with both functions and MCP servers.
## Prerequisites

- OpenAI API key with GPT-5.4 access
- Python 3.9+ with the `openai` SDK installed
- Basic understanding of OpenAI function calling

```bash
pip install openai --upgrade
```
## How Tool Search Works

Without tool search, every tool schema is loaded into the prompt. With tool search, you mark tools as `"defer_loading": true` and add a `tool_search` entry. The model then searches for relevant tools at runtime.
| Feature | Without Tool Search | With Tool Search |
|---|---|---|
| Token usage (50 tools) | ~12K tokens | ~6.3K tokens |
| Tool selection accuracy | Degrades with scale | Consistent |
| Response latency | Higher | Lower (cache preserved) |
| Setup complexity | None | Minimal |
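To get a feel for where the savings in the table come from, here is a back-of-the-envelope estimate using the common ~4-characters-per-token heuristic. The 50-tool catalog and the heuristic are illustrative assumptions, not API output:

```python
import json

def estimate_schema_tokens(tool_schemas):
    """Approximate token cost of serializing tool schemas into the prompt
    (rough heuristic: ~4 characters per token)."""
    return sum(len(json.dumps(t)) // 4 for t in tool_schemas)

# 50 toy tools stand in for a real tool ecosystem
full_catalog = [
    {
        "type": "function",
        "function": {
            "name": f"tool_{i}",
            "description": "An example tool with a short description",
            "parameters": {
                "type": "object",
                "properties": {"arg": {"type": "string"}},
                "required": ["arg"],
            },
        },
    }
    for i in range(50)
]

# With tool search, only a handful of matched tools are loaded
loaded_subset = full_catalog[:5]

print(estimate_schema_tokens(full_catalog), "tokens without tool search (approx.)")
print(estimate_schema_tokens(loaded_subset), "tokens with tool search (approx.)")
```

The exact numbers depend on your schemas, but the shape of the result matches the table: the deferred setup pays only for the tools actually loaded.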
## Option 1: Hosted Tool Search (Easiest)
OpenAI handles the search logic. You declare all tools upfront but mark them as deferred. The API decides which ones to load.
### Step 1 — Define Your Tools with `defer_loading`

```python
import openai

client = openai.OpenAI()

tools = [
    # Tool search entry — tells the model to search
    {"type": "tool_search", "execution": "server"},
    # Deferred tool — NOT loaded until searched
    {
        "type": "function",
        "defer_loading": True,
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"}
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "defer_loading": True,
        "function": {
            "name": "search_flights",
            "description": "Search available flights between cities",
            "parameters": {
                "type": "object",
                "properties": {
                    "origin": {"type": "string"},
                    "destination": {"type": "string"},
                    "date": {"type": "string"}
                },
                "required": ["origin", "destination", "date"]
            }
        }
    },
    {
        "type": "function",
        "defer_loading": True,
        "function": {
            "name": "book_hotel",
            "description": "Book a hotel room in a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "checkin": {"type": "string"},
                    "checkout": {"type": "string"}
                },
                "required": ["city", "checkin", "checkout"]
            }
        }
    }
]
```
### Step 2 — Make the API Call

```python
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo?"}
    ],
    tools=tools
)

print(response.choices[0].message)
```
### Step 3 — Inspect the Response

The model first emits a `tool_search_call` (internal search step), then a `tool_search_output` (loaded tools), then the actual function call. You only need to handle the function call:

```python
# Example output — model loaded only get_weather
# {
#   "role": "assistant",
#   "tool_calls": [
#     {
#       "id": "call_abc123",
#       "type": "function",
#       "function": {
#         "name": "get_weather",
#         "arguments": "{\"city\": \"Tokyo\"}"
#       }
#     }
#   ]
# }
# search_flights and book_hotel were NOT loaded — saving tokens
```
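Once the model emits the function call, handling it is ordinary function-calling plumbing. The sketch below routes the example call above to a local handler and builds the tool message to send back; `get_weather` here is a local stub standing in for a real weather API, and the tool call is shown as a plain dict for clarity:

```python
import json

def get_weather(city: str) -> dict:
    # Stand-in for a real weather service call
    return {"city": city, "temp_c": 21, "conditions": "clear"}

HANDLERS = {"get_weather": get_weather}

def dispatch(tool_call: dict) -> dict:
    """Route a function tool call to its handler and build the tool message."""
    fn = tool_call["function"]
    result = HANDLERS[fn["name"]](**json.loads(fn["arguments"]))
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }

tool_call = {
    "id": "call_abc123",
    "type": "function",
    "function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"},
}

print(dispatch(tool_call))
```

Append the returned tool message to `messages` and call the API again to get the model's final answer, exactly as with standard function calling.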
## Option 2: Client-Executed Tool Search
You control the search logic. The model tells you what it needs; your code decides which tools to provide. Use this when tools depend on user context, tenant config, or external registries.
### Step 1 — Configure Client-Side Search

```python
tools_client = [
    {
        "type": "tool_search",
        "execution": "client",
        "description": "Search project-specific tools",
        "parameters": {
            "type": "object",
            "properties": {
                "goal": {"type": "string"}
            },
            "required": ["goal"]
        }
    }
]

response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "user", "content": "Deploy the staging server"}
    ],
    tools=tools_client
)
```
### Step 2 — Handle the Search Call

```python
import json

# Model returns a tool_search_call with a call_id
search_call = response.choices[0].message.tool_calls[0]
goal = json.loads(search_call.function.arguments)["goal"]

print(f"Model wants tools for: {goal}")
# Output: "Model wants tools for: deploy staging server"

# Your logic: look up relevant tools from your registry
def find_tools(goal):
    registry = {
        "deploy": [
            {
                "type": "function",
                "function": {
                    "name": "run_deploy",
                    "description": "Deploy to staging or production",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "env": {
                                "type": "string",
                                "enum": ["staging", "production"]
                            }
                        },
                        "required": ["env"]
                    }
                }
            }
        ]
    }
    for key, tools in registry.items():
        if key in goal.lower():
            return tools
    return []

matched_tools = find_tools(goal)
```
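The substring match above is fine for a demo registry. For a larger registry, a simple keyword-overlap score over tool descriptions is a sturdier baseline; the sketch below is illustrative (`score_tool` and `find_tools_ranked` are hypothetical helpers), and production systems often use embedding search instead:

```python
def score_tool(goal: str, tool: dict) -> int:
    """Count how many words the goal shares with a tool's description."""
    goal_words = set(goal.lower().split())
    desc_words = set(tool["function"]["description"].lower().split())
    return len(goal_words & desc_words)

def find_tools_ranked(goal: str, catalog: list, top_k: int = 3) -> list:
    """Return up to top_k tools with a nonzero overlap score, best first."""
    ranked = sorted(catalog, key=lambda t: score_tool(goal, t), reverse=True)
    return [t for t in ranked[:top_k] if score_tool(goal, t) > 0]

catalog = [
    {"type": "function", "function": {
        "name": "run_deploy",
        "description": "Deploy a build to the staging or production server"}},
    {"type": "function", "function": {
        "name": "get_weather",
        "description": "Get current weather for a city"}},
]

print([t["function"]["name"] for t in
       find_tools_ranked("deploy the staging server", catalog)])
# ['run_deploy']
```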
### Step 3 — Return Tools and Continue

```python
# Send matched tools back with the same call_id
follow_up = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "user", "content": "Deploy the staging server"},
        response.choices[0].message,
        {
            "role": "tool",
            "tool_call_id": search_call.id,
            "content": json.dumps({
                "type": "tool_search_output",
                "tools": matched_tools
            })
        }
    ],
    tools=tools_client + matched_tools
)

# Model now calls run_deploy with env="staging"
print(follow_up.choices[0].message.tool_calls)
```
## Best Practices

- **Group tools into namespaces** — use MCP servers or logical groups instead of individual deferred functions for better token efficiency
- **Keep namespaces under 10 functions** — smaller groups improve search accuracy
- **Write clear descriptions** — the model uses descriptions to decide which tools to load, so vague ones cause poor matches
- **Don't mix deferred and non-deferred carelessly** — keep always-needed tools non-deferred and optional tools deferred
- **Cache is preserved** — loaded tools are appended to the end of the context, so the KV cache stays intact across requests
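The namespace advice above can be sketched as a small helper that expands a namespace registry into deferred tool entries. `build_deferred_tools` and the `namespace__function` naming scheme are illustrative assumptions, not a prescribed API:

```python
def build_deferred_tools(namespaces: dict) -> list:
    """Expand {namespace: [function schemas]} into a tools list with a
    tool_search entry followed by deferred, namespaced function entries."""
    tools = [{"type": "tool_search", "execution": "server"}]
    for namespace, functions in namespaces.items():
        for fn in functions:
            tools.append({
                "type": "function",
                "defer_loading": True,
                "function": {**fn, "name": f"{namespace}__{fn['name']}"},
            })
    return tools

namespaces = {
    "travel": [
        {"name": "search_flights", "description": "Search flights", "parameters": {}},
        {"name": "book_hotel", "description": "Book a hotel", "parameters": {}},
    ],
    "weather": [
        {"name": "get_weather", "description": "Current weather", "parameters": {}},
    ],
}

tools = build_deferred_tools(namespaces)
print([t["function"]["name"] for t in tools[1:]])
```

Keeping each namespace small, as recommended above, keeps the search step accurate while the generated list stays uniform.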
## When to Use Each Approach

| Scenario | Approach |
|---|---|
| All tools known at request time | Hosted (server) |
| Tools depend on user/tenant context | Client-executed |
| External tool registry (MCP, plugin store) | Client-executed |
| Simple agent with 10–50 tools | Hosted (server) |
| Platform with 100+ tools per user | Client-executed |
## Quick Reference

```python
# Minimal hosted tool search setup
tools = [
    {"type": "tool_search", "execution": "server"},
    {"type": "function", "defer_loading": True, "function": {...}},
    {"type": "function", "defer_loading": True, "function": {...}},
]
```

```python
# Minimal client tool search setup
tools = [
    {
        "type": "tool_search",
        "execution": "client",
        "description": "Search available tools",
        "parameters": {
            "type": "object",
            "properties": {"goal": {"type": "string"}}
        }
    }
]
```
## Key Takeaway

Tool search is essential for any agent with more than a dozen tools. Hosted search is the simplest starting point — just add `"defer_loading": true` and a `tool_search` entry. Switch to client-executed search when you need dynamic, context-aware tool discovery. Either way, you get significant token savings and better tool selection out of the box.