Gemini 3.5 Flash Function Calling: Build a Tool-Using Agent — ContentBuffer guide

Gemini 3.5 Flash Function Calling: Build a Tool-Using Agent

K
Kodetra Technologies··8 min read Intermediate

Summary

Wire Gemini 3.5 Flash to your own Python functions and run a real multi-step agent loop.

On May 19, 2026 at Google I/O, Google shipped Gemini 3.5 Flash straight to general availability — no preview tag. The headline number was speed (~4x the output tokens per second of comparable frontier models), but the more interesting line in the docs was the positioning: this is a model built for the agentic era, tuned for sub-agent deployment, rapid tool-use loops, and long-horizon multi-step work.

A fast model is only useful as an agent if it can do things — call your code, read the result, and decide what to do next. That capability is function calling. In this guide you'll wire Gemini 3.5 Flash to two real Python functions and build a working agent loop that can chain multiple tool calls in a single turn, feed results back, and produce a grounded final answer.

Everything here is verified against the official Gemini API docs for the 3.5 release. By the end you'll have a ~70-line agent you can drop your own tools into.

Prerequisites

  • Python 3.9+ and pip.
  • A Gemini API key from Google AI Studio (the free tier is enough to follow along).
  • The official SDK: pip install -U google-genai (note the dash; the package is google-genai, you import it as google.genai).
  • Basic comfort with Python functions and dictionaries. No ML background needed.
pip install -U google-genai
export GEMINI_API_KEY="your_key_here"

The SDK reads GEMINI_API_KEY from the environment automatically, so genai.Client() takes no arguments.

Step 1 — A baseline call (no tools yet)

Start with the simplest possible request so you know the install and key work. The model ID is gemini-3.5-flash.

from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="In one sentence, what makes a model good at tool use?",
)
print(response.text)

Example output:

A model is good at tool use when it reliably decides *which* function to\ncall, fills the arguments correctly from context, and knows when to stop\ncalling tools and answer.

One thing to internalize early: for all Gemini 3.x models, Google now recommends removing temperature, top_p, and top_k. The reasoning stack is tuned for the defaults, and overriding them tends to hurt. If you need determinism, write an explicit system instruction instead of dropping the temperature.

Step 2 — How function calling actually works

Function calling does not mean the model runs your code. The API is stateless and the model never executes anything. Instead the cycle looks like this:

  1. You send the user's message plus a list of function declarations (name, description, JSON-schema parameters).
  2. If the model decides a tool is needed, it replies with one or more function_call parts containing the function name and arguments — not a text answer.
  3. Your code executes the matching Python function and captures the result.
  4. You send the result back as a function_response, and the model produces its next step: another tool call, or a final text answer.

That last loop — call, execute, return, repeat — is the whole game. A single user request often triggers several rounds before the model is ready to answer.

Step 3 — Declare your tools

A function declaration is a plain dict using a subset of OpenAPI schema. Give each function a descriptive name (no spaces), a description the model can reason about, and typed parameters. The description is doing real work — it's how the model decides when to reach for the tool, so be specific and add example values.

from google.genai import types

# The real implementations (stubbed here with canned data).
WEATHER_DB = {
    "Tokyo":  {"forecast": "rain, 14-18C", "rain_chance": 80},
    "Lisbon": {"forecast": "sunny, 22-27C", "rain_chance": 5},
}
RATES = {"USD": 1.0, "JPY": 156.3, "EUR": 0.92}

def get_weather(city: str) -> dict:
    """Return a short forecast for a city."""
    return WEATHER_DB.get(city, {"forecast": "unknown", "rain_chance": 0})

def convert_currency(amount: float, from_code: str, to_code: str) -> dict:
    """Convert an amount between two ISO currency codes."""
    usd = amount / RATES[from_code]
    return {"converted": round(usd * RATES[to_code], 2), "to": to_code}

TOOL_IMPLS = {"get_weather": get_weather, "convert_currency": convert_currency}

get_weather_decl = {
    "name": "get_weather",
    "description": "Get the current forecast and rain chance for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Tokyo'."}
        },
        "required": ["city"],
    },
}

convert_currency_decl = {
    "name": "convert_currency",
    "description": "Convert money from one ISO currency code to another.",
    "parameters": {
        "type": "object",
        "properties": {
            "amount":    {"type": "number", "description": "Amount to convert."},
            "from_code": {"type": "string", "description": "Source ISO code, e.g. 'USD'."},
            "to_code":   {"type": "string", "description": "Target ISO code, e.g. 'JPY'."},
        },
        "required": ["amount", "from_code", "to_code"],
    },
}

The TOOL_IMPLS dict is a small but important pattern: it maps the declared name to the real callable so your loop can dispatch by name without a pile of if statements.

Step 4 — Register the tools and detect a call

Wrap the declarations in a types.Tool and pass it through GenerateContentConfig. This is also where you set thinking_level — more on that below. After a response comes back, the convenient response.function_calls accessor returns a list of any calls the model made.

client = genai.Client()
tools = types.Tool(
    function_declarations=[get_weather_decl, convert_currency_decl]
)
config = types.GenerateContentConfig(
    tools=[tools],
    thinking_config=types.ThinkingConfig(thinking_level="low"),
)

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="What's the weather in Tokyo right now?",
    config=config,
)

for call in response.function_calls:
    print(call.name, dict(call.args), "id=", call.id)
# -> get_weather {'city': 'Tokyo'} id= a1b2c3...

Notice the id on each call. Hold onto it — Gemini 3.x requires you to echo that exact id back when you return the result, or the model silently returns an empty response. This is the single most common thing people get wrong after migrating from 2.5.

Step 5 — The full agent loop

Now assemble the cycle into a reusable loop. The structure: send the conversation, check for tool calls, execute every call the model requested, append both the model's turn and your results to the running contents list, and repeat until the model answers with text instead of a tool call. A max_steps cap stops runaway loops.

def run_agent(user_message: str, max_steps: int = 6) -> str:
    contents = [types.Content(role="user",
                              parts=[types.Part(text=user_message)])]

    for step in range(max_steps):
        response = client.models.generate_content(
            model="gemini-3.5-flash", contents=contents, config=config
        )

        calls = response.function_calls
        if not calls:                      # model is done -> final answer
            return response.text

        # Keep the model's turn (incl. its thought signatures) in history.
        contents.append(response.candidates[0].content)

        tool_parts = []
        for call in calls:
            impl = TOOL_IMPLS[call.name]   # dispatch by name
            result = impl(**call.args)
            print(f"  [tool] {call.name}({dict(call.args)}) -> {result}")
            tool_parts.append(
                types.Part.from_function_response(
                    name=call.name,        # must match the call
                    response={"result": result},
                    id=call.id,            # must echo the call id
                )
            )
        # Return ALL results in one user turn (one response per call).
        contents.append(types.Content(role="user", parts=tool_parts))

    return "Stopped: hit max_steps without a final answer."

Three details are load-bearing here. First, you append response.candidates[0].content unchanged — that carries the model's internal reasoning context (thought signatures) forward, which Gemini 3.5 preserves automatically across turns to stay coherent on multi-step tasks. Second, every function_response echoes both the name and the id of the call it answers. Third, if the model asks for two tools at once, you must return exactly two responses — one per call, no more, no fewer.

Worked example — a one-shot travel assistant

Here's the payoff. Ask a question that needs both tools, and watch the model fire them in parallel, then synthesize.

print(run_agent(
    "I'm flying to Tokyo on Friday. What should I pack, "
    "and how much is 500 USD in Japanese yen?"
))

Actual run (tool lines are the print inside the loop):

  [tool] get_weather({'city': 'Tokyo'}) -> {'forecast': 'rain, 14-18C', 'rain_chance': 80}
  [tool] convert_currency({'amount': 500, 'from_code': 'USD', 'to_code': 'JPY'}) -> {'converted': 78150.0, 'to': 'JPY'}

Tokyo looks rainy on Friday (14-18C, 80% chance of rain), so pack a
waterproof jacket, an umbrella, and layers for the cool evenings.
500 USD is about 78,150 JPY at the current rate.

The model decided on its own that the request needed two different functions, extracted 500, USD and JPY from plain English, called both in a single turn, and folded the structured results into a natural answer. You wrote the tools; the model wrote the orchestration.

Tuning thinking_level for agents

Gemini 3.5 Flash thinks before it acts, and you control how hard via thinking_level — a string enum that replaces the old numeric thinking_budget (which is no longer recommended). Higher levels make the model explore and verify more, which also means more tool calls. For tight agent loops that's often the wrong trade.

thinking_levelBest for
minimalChat replies, quick facts, trivial single tool calls. Fastest.
lowAgentic loops and code that need few steps. Great default for tool-using agents.
medium (default)Best overall quality; complex code and multi-step agent work.
highHard reasoning, tricky math, the most difficult agent tasks. Most tool calls.

The default is now medium (it was high in the 3 Flash preview). For the travel agent above, low is plenty and noticeably snappier. If your agent is calling tools more than it should, drop the thinking level first — it's the cheapest fix. If that isn't enough, add a system instruction such as: You have a budget of 4 tool calls. Use them efficiently.

Common pitfalls

  • Forgetting the call id. Every function_response must include the exact id from its function_call. Omit it and the model usually returns an empty response with finish_reason: STOP — no error, just silence. This is the #1 migration trap from 2.5.
  • Mismatched response counts. If the model issues three calls, return three responses. One per call, names matched. A missing or extra response breaks the turn.
  • Editing the model's turn out of history. Append response.candidates[0].content as-is. Stripping it (or rebuilding it by hand) drops the thought signatures and the model loses its reasoning thread on long tasks.
  • Setting temperature/top_p/top_k. Remove them for all Gemini 3.x models. They're tuned for defaults; overriding degrades reasoning and tool selection.
  • Putting images outside the function response. If a tool returns an image, include it inside the function-response parts, not as a separate message — otherwise you can get thought leakage and lower-quality output.
  • No loop cap. Always bound the loop with max_steps. A confused model (or a tool that always errors) can otherwise call tools forever and run up your bill.
  • Expecting Computer Use. Gemini 3.5 Flash does not support the Computer Use tool yet. For that workload, stay on Gemini 3 Flash Preview.

Quick reference

ItemValue
Installpip install -U google-genai
Importfrom google import genai / from google.genai import types
Model IDgemini-3.5-flash (GA, no preview suffix)
Context / output1M input tokens / 65k max output tokens
Register toolstypes.Tool(function_declarations=[...]) -> config.tools
Read callsresponse.function_calls (list of name/args/id)
Return a resulttypes.Part.from_function_response(name=, response=, id=)
Reasoning controlthinking_config=ThinkingConfig(thinking_level='low')
Don't settemperature, top_p, top_k, thinking_budget
Not supportedComputer Use (use Gemini 3 Flash Preview)

Next steps

  • Swap the stub tools for real ones — a database query, an HTTP call, a file write. The loop doesn't change.
  • Add a system instruction to give the agent a persona and a tool-call budget.
  • Combine custom functions with built-in tools (Google Search, code execution, URL context) in the same request — Gemini 3.x supports mixing them.
  • Graduate to the new Interactions API, which Google recommends for agentic and background workloads and preserves thoughts automatically.
  • For multi-agent systems, point sub-agents at gemini-3.5-flash with thinking_level='low' for cheap, fast specialists.

You now have a real tool-using agent on the newest Flash model. The pattern — declare, detect, execute, return, repeat — is the same one every agent framework wraps. Knowing it from the metal means you can debug any of them.

Comments

Subscribe to join the conversation...

Be the first to comment