
Gemini 3.5 Flash Function Calling: Build a Tool-Using Agent
Summary
Wire Gemini 3.5 Flash to your own Python functions and run a real multi-step agent loop.
On May 19, 2026 at Google I/O, Google shipped Gemini 3.5 Flash straight to general availability — no preview tag. The headline number was speed (~4x the output tokens per second of comparable frontier models), but the more interesting line in the docs was the positioning: this is a model built for the agentic era, tuned for sub-agent deployment, rapid tool-use loops, and long-horizon multi-step work.
A fast model is only useful as an agent if it can do things — call your code, read the result, and decide what to do next. That capability is function calling. In this guide you'll wire Gemini 3.5 Flash to two real Python functions and build a working agent loop that can chain multiple tool calls in a single turn, feed results back, and produce a grounded final answer.
Everything here is verified against the official Gemini API docs for the 3.5 release. By the end you'll have a ~70-line agent you can drop your own tools into.
Prerequisites
- Python 3.9+ and
pip. - A Gemini API key from Google AI Studio (the free tier is enough to follow along).
- The official SDK:
pip install -U google-genai(note the dash; the package isgoogle-genai, you import it asgoogle.genai). - Basic comfort with Python functions and dictionaries. No ML background needed.
pip install -U google-genai
export GEMINI_API_KEY="your_key_here"
The SDK reads GEMINI_API_KEY from the environment automatically, so genai.Client() takes no arguments.
Step 1 — A baseline call (no tools yet)
Start with the simplest possible request so you know the install and key work. The model ID is gemini-3.5-flash.
from google import genai
client = genai.Client()
response = client.models.generate_content(
model="gemini-3.5-flash",
contents="In one sentence, what makes a model good at tool use?",
)
print(response.text)
Example output:
A model is good at tool use when it reliably decides *which* function to\ncall, fills the arguments correctly from context, and knows when to stop\ncalling tools and answer.
One thing to internalize early: for all Gemini 3.x models, Google now recommends removing temperature, top_p, and top_k. The reasoning stack is tuned for the defaults, and overriding them tends to hurt. If you need determinism, write an explicit system instruction instead of dropping the temperature.
Step 2 — How function calling actually works
Function calling does not mean the model runs your code. The API is stateless and the model never executes anything. Instead the cycle looks like this:
- You send the user's message plus a list of function declarations (name, description, JSON-schema parameters).
- If the model decides a tool is needed, it replies with one or more
function_callparts containing the function name and arguments — not a text answer. - Your code executes the matching Python function and captures the result.
- You send the result back as a
function_response, and the model produces its next step: another tool call, or a final text answer.
That last loop — call, execute, return, repeat — is the whole game. A single user request often triggers several rounds before the model is ready to answer.
Step 3 — Declare your tools
A function declaration is a plain dict using a subset of OpenAPI schema. Give each function a descriptive name (no spaces), a description the model can reason about, and typed parameters. The description is doing real work — it's how the model decides when to reach for the tool, so be specific and add example values.
from google.genai import types
# The real implementations (stubbed here with canned data).
WEATHER_DB = {
"Tokyo": {"forecast": "rain, 14-18C", "rain_chance": 80},
"Lisbon": {"forecast": "sunny, 22-27C", "rain_chance": 5},
}
RATES = {"USD": 1.0, "JPY": 156.3, "EUR": 0.92}
def get_weather(city: str) -> dict:
"""Return a short forecast for a city."""
return WEATHER_DB.get(city, {"forecast": "unknown", "rain_chance": 0})
def convert_currency(amount: float, from_code: str, to_code: str) -> dict:
"""Convert an amount between two ISO currency codes."""
usd = amount / RATES[from_code]
return {"converted": round(usd * RATES[to_code], 2), "to": to_code}
TOOL_IMPLS = {"get_weather": get_weather, "convert_currency": convert_currency}
get_weather_decl = {
"name": "get_weather",
"description": "Get the current forecast and rain chance for a city.",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string", "description": "City name, e.g. 'Tokyo'."}
},
"required": ["city"],
},
}
convert_currency_decl = {
"name": "convert_currency",
"description": "Convert money from one ISO currency code to another.",
"parameters": {
"type": "object",
"properties": {
"amount": {"type": "number", "description": "Amount to convert."},
"from_code": {"type": "string", "description": "Source ISO code, e.g. 'USD'."},
"to_code": {"type": "string", "description": "Target ISO code, e.g. 'JPY'."},
},
"required": ["amount", "from_code", "to_code"],
},
}
The TOOL_IMPLS dict is a small but important pattern: it maps the declared name to the real callable so your loop can dispatch by name without a pile of if statements.
Step 4 — Register the tools and detect a call
Wrap the declarations in a types.Tool and pass it through GenerateContentConfig. This is also where you set thinking_level — more on that below. After a response comes back, the convenient response.function_calls accessor returns a list of any calls the model made.
client = genai.Client()
tools = types.Tool(
function_declarations=[get_weather_decl, convert_currency_decl]
)
config = types.GenerateContentConfig(
tools=[tools],
thinking_config=types.ThinkingConfig(thinking_level="low"),
)
response = client.models.generate_content(
model="gemini-3.5-flash",
contents="What's the weather in Tokyo right now?",
config=config,
)
for call in response.function_calls:
print(call.name, dict(call.args), "id=", call.id)
# -> get_weather {'city': 'Tokyo'} id= a1b2c3...
Notice the id on each call. Hold onto it — Gemini 3.x requires you to echo that exact id back when you return the result, or the model silently returns an empty response. This is the single most common thing people get wrong after migrating from 2.5.
Step 5 — The full agent loop
Now assemble the cycle into a reusable loop. The structure: send the conversation, check for tool calls, execute every call the model requested, append both the model's turn and your results to the running contents list, and repeat until the model answers with text instead of a tool call. A max_steps cap stops runaway loops.
def run_agent(user_message: str, max_steps: int = 6) -> str:
contents = [types.Content(role="user",
parts=[types.Part(text=user_message)])]
for step in range(max_steps):
response = client.models.generate_content(
model="gemini-3.5-flash", contents=contents, config=config
)
calls = response.function_calls
if not calls: # model is done -> final answer
return response.text
# Keep the model's turn (incl. its thought signatures) in history.
contents.append(response.candidates[0].content)
tool_parts = []
for call in calls:
impl = TOOL_IMPLS[call.name] # dispatch by name
result = impl(**call.args)
print(f" [tool] {call.name}({dict(call.args)}) -> {result}")
tool_parts.append(
types.Part.from_function_response(
name=call.name, # must match the call
response={"result": result},
id=call.id, # must echo the call id
)
)
# Return ALL results in one user turn (one response per call).
contents.append(types.Content(role="user", parts=tool_parts))
return "Stopped: hit max_steps without a final answer."
Three details are load-bearing here. First, you append response.candidates[0].content unchanged — that carries the model's internal reasoning context (thought signatures) forward, which Gemini 3.5 preserves automatically across turns to stay coherent on multi-step tasks. Second, every function_response echoes both the name and the id of the call it answers. Third, if the model asks for two tools at once, you must return exactly two responses — one per call, no more, no fewer.
Worked example — a one-shot travel assistant
Here's the payoff. Ask a question that needs both tools, and watch the model fire them in parallel, then synthesize.
print(run_agent(
"I'm flying to Tokyo on Friday. What should I pack, "
"and how much is 500 USD in Japanese yen?"
))
Actual run (tool lines are the print inside the loop):
[tool] get_weather({'city': 'Tokyo'}) -> {'forecast': 'rain, 14-18C', 'rain_chance': 80}
[tool] convert_currency({'amount': 500, 'from_code': 'USD', 'to_code': 'JPY'}) -> {'converted': 78150.0, 'to': 'JPY'}
Tokyo looks rainy on Friday (14-18C, 80% chance of rain), so pack a
waterproof jacket, an umbrella, and layers for the cool evenings.
500 USD is about 78,150 JPY at the current rate.
The model decided on its own that the request needed two different functions, extracted 500, USD and JPY from plain English, called both in a single turn, and folded the structured results into a natural answer. You wrote the tools; the model wrote the orchestration.
Tuning thinking_level for agents
Gemini 3.5 Flash thinks before it acts, and you control how hard via thinking_level — a string enum that replaces the old numeric thinking_budget (which is no longer recommended). Higher levels make the model explore and verify more, which also means more tool calls. For tight agent loops that's often the wrong trade.
| thinking_level | Best for |
|---|---|
| minimal | Chat replies, quick facts, trivial single tool calls. Fastest. |
| low | Agentic loops and code that need few steps. Great default for tool-using agents. |
| medium (default) | Best overall quality; complex code and multi-step agent work. |
| high | Hard reasoning, tricky math, the most difficult agent tasks. Most tool calls. |
The default is now medium (it was high in the 3 Flash preview). For the travel agent above, low is plenty and noticeably snappier. If your agent is calling tools more than it should, drop the thinking level first — it's the cheapest fix. If that isn't enough, add a system instruction such as: You have a budget of 4 tool calls. Use them efficiently.
Common pitfalls
- Forgetting the call
id. Everyfunction_responsemust include the exactidfrom itsfunction_call. Omit it and the model usually returns an empty response withfinish_reason: STOP— no error, just silence. This is the #1 migration trap from 2.5. - Mismatched response counts. If the model issues three calls, return three responses. One per call, names matched. A missing or extra response breaks the turn.
- Editing the model's turn out of history. Append
response.candidates[0].contentas-is. Stripping it (or rebuilding it by hand) drops the thought signatures and the model loses its reasoning thread on long tasks. - Setting temperature/top_p/top_k. Remove them for all Gemini 3.x models. They're tuned for defaults; overriding degrades reasoning and tool selection.
- Putting images outside the function response. If a tool returns an image, include it inside the function-response parts, not as a separate message — otherwise you can get thought leakage and lower-quality output.
- No loop cap. Always bound the loop with
max_steps. A confused model (or a tool that always errors) can otherwise call tools forever and run up your bill. - Expecting Computer Use. Gemini 3.5 Flash does not support the Computer Use tool yet. For that workload, stay on Gemini 3 Flash Preview.
Quick reference
| Item | Value |
|---|---|
| Install | pip install -U google-genai |
| Import | from google import genai / from google.genai import types |
| Model ID | gemini-3.5-flash (GA, no preview suffix) |
| Context / output | 1M input tokens / 65k max output tokens |
| Register tools | types.Tool(function_declarations=[...]) -> config.tools |
| Read calls | response.function_calls (list of name/args/id) |
| Return a result | types.Part.from_function_response(name=, response=, id=) |
| Reasoning control | thinking_config=ThinkingConfig(thinking_level='low') |
| Don't set | temperature, top_p, top_k, thinking_budget |
| Not supported | Computer Use (use Gemini 3 Flash Preview) |
Next steps
- Swap the stub tools for real ones — a database query, an HTTP call, a file write. The loop doesn't change.
- Add a system instruction to give the agent a persona and a tool-call budget.
- Combine custom functions with built-in tools (Google Search, code execution, URL context) in the same request — Gemini 3.x supports mixing them.
- Graduate to the new Interactions API, which Google recommends for agentic and background workloads and preserves thoughts automatically.
- For multi-agent systems, point sub-agents at
gemini-3.5-flashwiththinking_level='low'for cheap, fast specialists.
You now have a real tool-using agent on the newest Flash model. The pattern — declare, detect, execute, return, repeat — is the same one every agent framework wraps. Knowing it from the metal means you can debug any of them.
Comments
Be the first to comment