Build Your First Local AI Agent with Google Gemma 4 + Ollama

Kodetra Technologies · April 14, 2026 · 4 min read · Beginner

Summary

A step-by-step guide to running Google's latest Gemma 4 model locally and building an AI agent with tool calling and agentic workflows.

What You'll Learn

  • Install and run Gemma 4 locally using Ollama
  • Use Gemma 4's built-in function calling for tool use
  • Build a simple AI agent that can search the web and answer questions
  • No cloud API costs — everything runs on your machine

Why Gemma 4?

Google released Gemma 4 in April 2026. It's their most capable open model yet, built specifically for agentic workflows.

Key specs:

  • 128K context window
  • Native function calling and structured JSON output (see the sketch after this list)
  • Built-in reasoning mode (think step-by-step before answering)
  • Available in 4 sizes: E2B, E4B, 26B MoE, 31B Dense

The E4B variant runs on most consumer laptops with 8GB+ RAM.
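
Since structured JSON output is one of the headline specs, here's a minimal sketch of requesting it through the Ollama Python client (which you'll set up in Steps 1–3 below). The format="json" parameter is part of the official client; the prompt is illustrative:

python

import ollama

# format="json" constrains the model to emit valid JSON
response = ollama.chat(
    model="gemma4:e4b",
    messages=[{
        "role": "user",
        "content": "List two uses of AI agents as a JSON object."
    }],
    format="json",
)
print(response["message"]["content"])  # a JSON string you can json.loads()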


Prerequisites

  • Python 3.10+
  • 8GB+ RAM (for E4B model)
  • macOS, Linux, or Windows with WSL
  • Basic terminal/command line knowledge

Step 1: Install Ollama

Ollama lets you run LLMs locally with one command.

macOS/Linux:

bash

curl -fsSL https://ollama.com/install.sh | sh

Windows: Download from ollama.com/download

Verify installation:

bash

ollama --version

Expected output:

ollama version 0.6.x

Step 2: Pull Gemma 4 Model

bash

ollama pull gemma4:e4b

This downloads the E4B variant (~4GB). For smaller machines, use:

bash

ollama pull gemma4:e2b

Test it works:

bash

ollama run gemma4:e4b "What is an AI agent?"

Expected output:

An AI agent is an autonomous system that can perceive its environment,
make decisions, and take actions to achieve specific goals — often
using tools like web search, code execution, or API calls.

Step 3: Install Python Dependencies

bash

pip install ollama gradio duckduckgo-search

Step 4: Create the AI Agent

Create a file called agent.py:

python

import ollama
from duckduckgo_search import DDGS

# Define the tool (web search)
def web_search(query: str) -> str:
    """Search the web and return top results."""
    results = DDGS().text(query, max_results=3)
    return "\n".join(
        f"- {r['title']}: {r['body']}" for r in results
    )

# Tool definition for Gemma 4
tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web for current information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query"
                    }
                },
                "required": ["query"]
            }
        }
    }
]

def run_agent(user_question: str) -> str:
    messages = [
        {
            "role": "system",
            "content": "You are a helpful AI agent. Use the web_search tool when you need current information."
        },
        {"role": "user", "content": user_question}
    ]

    # Step 1: Ask Gemma 4 (with tools available)
    response = ollama.chat(
        model="gemma4:e4b",
        messages=messages,
        tools=tools
    )

    msg = response["message"]

    # Step 2: If the model wants to call a tool, execute it
    if msg.get("tool_calls"):
        # Append the assistant's tool-call message once, before the loop
        messages.append(msg)

        # Dispatch table mapping tool names to Python functions
        available_functions = {"web_search": web_search}

        for tool_call in msg["tool_calls"]:
            func_name = tool_call["function"]["name"]
            func_args = tool_call["function"]["arguments"]

            func = available_functions.get(func_name)
            if func:
                result = func(**func_args)
            else:
                result = f"Error: unknown tool '{func_name}'"

            # Add each tool result back to the conversation
            messages.append({
                "role": "tool",
                "content": result
            })

        # Step 3: Get the final answer, now informed by the tool results
        final = ollama.chat(
            model="gemma4:e4b",
            messages=messages
        )
        return final["message"]["content"]

    return msg["content"]

# Test it
if __name__ == "__main__":
    question = "What are the latest AI developments this week?"
    print(f"Q: {question}\n")
    print(f"A: {run_agent(question)}")

Step 5: Run the Agent

bash

python agent.py

Example input:

Q: What are the latest AI developments this week?

Example output (yours will differ, since the search results are live):

A: Based on my search, here are this week's top AI developments:
- Google released Gemma 4, their most capable open model for agentic AI
- OpenAI reached 900 million weekly ChatGPT users
- Microsoft expanded multi-model collaboration in Copilot

Step 6: Add a Web UI (Optional)

Add this to the bottom of agent.py (and comment out the if __name__ == "__main__": test block, so that python agent.py launches only the UI):

python

import gradio as gr

demo = gr.Interface(
    fn=run_agent,
    inputs=gr.Textbox(
        label="Ask your AI Agent",
        placeholder="e.g., What's trending in AI today?"
    ),
    outputs=gr.Textbox(label="Agent Response"),
    title="Gemma 4 AI Agent",
    description="Local AI agent powered by Gemma 4 with web search"
)

demo.launch()

Run it:

bash

python agent.py

Open http://localhost:7860 in your browser.


How It Works (Simple Diagram)

User Question
     ↓
Gemma 4 (thinks: do I need a tool?)
     ↓              ↓
  No tool        Calls web_search()
     ↓              ↓
Direct answer   Gets search results
                    ↓
              Gemma 4 summarizes results
                    ↓
              Final answer to user

Add More Tools

Extend your agent by adding more tool functions:

python

# Example: Calculator tool
def calculator(expression: str) -> str:
    """Evaluate a math expression."""
    # Note: eval() executes arbitrary code. That's acceptable for a
    # local demo, but never expose it to untrusted input.
    try:
        return str(eval(expression))
    except Exception:
        return "Error: invalid expression"

# Add to tools list
tools.append({
    "type": "function",
    "function": {
        "name": "calculator",
        "description": "Calculate a math expression",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Math expression like '2+2' or '100*0.15'"
                }
            },
            "required": ["expression"]
        }
    }
})
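
One thing to remember: defining the schema only tells the model the tool exists. For the agent to actually execute it, also register the function in the available_functions dispatch table inside run_agent from Step 4:

python

# Inside run_agent, extend the dispatch table:
available_functions = {
    "web_search": web_search,
    "calculator": calculator,
}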

Troubleshooting

Problem                         Fix
ollama: command not found       Restart your terminal after installing
Model too slow                  Use gemma4:e2b (smaller)
Out of memory                   Close other apps, or use the E2B variant
Connection refused              Run ollama serve first
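
For the last two issues, you can sanity-check the local server from Python before running the agent. A minimal sketch (ollama.list() is part of the official client and raises if the server is unreachable):

python

import ollama

try:
    ollama.list()  # any response means the server is up
    print("Ollama server is running.")
except Exception:
    print("Can't reach Ollama; run 'ollama serve' in another terminal.")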

What's Next?

  • Add memory to your agent (store conversation history; see the sketch after this list)
  • Connect to local files (RAG with your documents)
  • Deploy on Android using Google AI Edge SDK
  • Chain multiple agents together (multi-agent systems)
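
As a starting point for the first item, here's a minimal sketch of conversational memory: keep one messages list alive across turns instead of rebuilding it on every call (the chat_with_memory helper is hypothetical, not part of agent.py):

python

import ollama

# Hypothetical helper: a persistent message list is the simplest memory
history = [{"role": "system", "content": "You are a helpful AI agent."}]

def chat_with_memory(user_question: str) -> str:
    history.append({"role": "user", "content": user_question})
    response = ollama.chat(model="gemma4:e4b", messages=history)
    answer = response["message"]["content"]
    # Store the reply so later turns can reference it
    history.append({"role": "assistant", "content": answer})
    return answer

print(chat_with_memory("My name is Sam. What is an AI agent?"))
print(chat_with_memory("What did I say my name was?"))  # should recall "Sam"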

Key Takeaways

  1. Gemma 4 is Google's best open model for building AI agents
  2. Ollama makes local deployment dead simple
  3. Function calling lets your agent use real tools
  4. No API keys or cloud costs required
  5. E4B runs on most laptops with 8GB+ RAM
