Antigravity Agent + Vision: Build a Chart-to-PDF Pipeline — ContentBuffer guide

Kodetra Technologies · 11 min read · Intermediate

Summary

Hand Gemini's new managed agent a screenshot. Get back a sandboxed PDF report — code, charts, files included.

Why this guide, right now

On May 19, 2026, Google flipped the switch on Managed Agents in the Gemini API. The headline product is the Antigravity Agent, a general-purpose, sandboxed Linux agent you can spin up in one API call. It runs the same harness as the Antigravity IDE, it's powered by Gemini 3.5 Flash, and it accepts multimodal input: text and images, today.

That last detail is the part nobody is showing you yet. You can hand the agent a chart screenshot, let it transcribe the data, write its own Python in a sandbox you never have to maintain, and walk away with a finished PDF report. This guide walks the entire pipeline end to end, with code you can paste into a file and run.

By the end you will: make your first Antigravity call, send an image alongside instructions, persist a real sandbox across multi-turn calls, stream the agent's intermediate steps, and download the artifacts it builds. We'll close with a reusable custom agent and the gotchas that will save you a few hundred wasted tokens.


Prerequisites

  • Python 3.10 or newer.
  • A Gemini API key from AI Studio. The Antigravity agent is in preview and billed pay-as-you-go on Gemini 3.5 Flash tokens.
  • google-genai SDK >= 1.0 (it exposes client.interactions).
  • About 5 minutes and one chart image you'd like analyzed. A PNG screenshot of any line/bar chart works.
# Python 3.10+ recommended
pip install --upgrade google-genai pillow requests

# Get an API key from https://aistudio.google.com/apikey
export GEMINI_API_KEY="your-key-here"

Step 1 — Make your first agent call

Before we send images, make sure the basics work. A single call to client.interactions.create provisions a Linux sandbox, runs the agent loop, and returns the result. Three parameters do all the work:

  • agent="antigravity-preview-05-2026": the current preview agent id.
  • environment="remote": provision a brand-new sandbox for this call.
  • input=: what you want the agent to do, as text or a list of typed parts.
# antigravity_quickstart.py
from google import genai

client = genai.Client()  # picks up GEMINI_API_KEY

interaction = client.interactions.create(
    agent="antigravity-preview-05-2026",
    input=(
        "Write a Python script that generates the first 20 Fibonacci "
        "numbers and saves them to fibonacci.txt. Then read the file "
        "and print its contents."
    ),
    environment="remote",
)

print("Interaction ID :", interaction.id)
print("Environment ID :", interaction.environment_id)
print("Output:\n", interaction.output_text)

Run it. After a few seconds you'll see something like this:

Interaction ID : int_2cAa9d3fLp...
Environment ID : env_8b9e2c14...
Output:
 I created `fibonacci.py`, ran it, and saved the first 20 numbers to
 `fibonacci.txt`. The file contents are:
 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987,
 1597, 2584, 4181

Two pieces of state matter for everything that follows: interaction.id identifies the conversation, and interaction.environment_id identifies the sandbox. Hold onto both. interaction.steps contains every reasoning step, tool call, and code run if you want to audit what the agent did.
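
Since both ids are needed in later steps, one option is to write them to disk right after the call returns. The following is a minimal sketch, assuming a local JSON file is an acceptable store; RunState, save_state, and load_state are hypothetical helper names, not part of the SDK.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class RunState:
    """The two ids worth keeping after each call (hypothetical helper)."""
    interaction_id: str
    environment_id: str


def save_state(state: RunState, path: str = "run_state.json") -> None:
    # Plain JSON so a later script can pick the run back up.
    Path(path).write_text(json.dumps(asdict(state), indent=2), encoding="utf-8")


def load_state(path: str = "run_state.json") -> RunState:
    # Rebuild the dataclass from the saved keys.
    return RunState(**json.loads(Path(path).read_text(encoding="utf-8")))
```

After the quickstart call you could run save_state(RunState(interaction.id, interaction.environment_id)) and call load_state() from a follow-up script.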


Step 2 — Send an image alongside instructions

The Antigravity agent accepts text and image input parts. Images must be supplied inline as base64 strings — there is no remote URL form during preview, and audio/video/document inputs are not supported yet. For our pipeline we'll combine a precise text prompt with a PNG of a chart we want analyzed.

Save this prompt to prompt.txt. It's the spec the agent will follow:

SYSTEM TASK: chart-to-report pipeline
========================================
You will receive a single chart image (PNG).

1. Look at the chart. Identify the X axis, Y axis, units, and title.
2. Estimate each data point as accurately as you can and list them as
   a Python list of (label, value) tuples. Be honest about uncertainty.
3. Save the points to /tmp/data.csv (columns: label,value).
4. Write a Python script /tmp/render.py that:
   - reads data.csv with the standard csv module (no pandas required),
   - replots the series cleanly with matplotlib at 1600x900,
   - saves the chart to /tmp/clean_chart.png,
   - computes a quick stat summary (count, mean, min, max, last vs first
     change, and 3-point rolling average if the series has >=6 points),
   - writes a 1-page PDF report /tmp/report.pdf with the title, your
     transcription notes, the cleaned chart, and the stats table.
   Use reportlab for the PDF.
5. Run /tmp/render.py end to end. Show me any errors and fix them
   until report.pdf exists.
6. Print the final file size of report.pdf and a 5-line summary.

Then send it together with your chart image. Drop any line or bar chart PNG next to your script and name it sales_chart.png.

# chart_to_pdf.py
import base64  # the only extra import this script needs
from google import genai

client = genai.Client()

with open("sales_chart.png", "rb") as f:
    image_bytes = f.read()

with open("prompt.txt", "r") as f:
    instructions = f.read()

interaction = client.interactions.create(
    agent="antigravity-preview-05-2026",
    input=[
        {"type": "text", "text": instructions},
        {
            "type": "image",
            "data": base64.b64encode(image_bytes).decode("utf-8"),
            "mime_type": "image/png",
        },
    ],
    environment="remote",
)

print("Interaction:", interaction.id)
print("Environment:", interaction.environment_id)
print()
print(interaction.output_text)

A few things to notice in interaction.output_text: the agent narrates its plan, runs Bash and Python in the sandbox, fixes its own errors, and reports the final file size. If you peek at interaction.steps, you'll see entries for code_execution calls (the matplotlib runs), filesystem writes for data.csv, render.py, clean_chart.png, and report.pdf, and a final natural-language wrap-up.
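
If you build these input lists in more than one place, a small helper keeps the part shape consistent. This is a sketch, assuming the part dictionaries look exactly like the ones in chart_to_pdf.py; image_part and text_part are hypothetical names, and the guide only confirms PNG, so treat other image types as untested.

```python
import base64
import mimetypes
from pathlib import Path


def text_part(text: str) -> dict:
    """Wrap a string as a text input part."""
    return {"type": "text", "text": text}


def image_part(path: str) -> dict:
    """Read an image file and wrap it as an inline base64 image part."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"{path!r} does not look like an image file")
    data = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return {"type": "image", "data": data, "mime_type": mime_type}
```

With these, the input argument becomes [text_part(instructions), image_part("sales_chart.png")].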


Step 3 — Download the artifacts the agent built

The PDF lives inside the sandbox, not on your machine. To pull it out you call the Files API, which returns the whole environment as a tar archive. This is the under-documented part — the SDK doesn't wrap it yet, so you go through requests directly.

# pull_files.py — download the whole sandbox as a tar snapshot
import os, requests, tarfile

env_id = "env_8b9e2c14..."  # interaction.environment_id from the previous call
api_key = os.environ["GEMINI_API_KEY"]

url = (
    "https://generativelanguage.googleapis.com/v1beta/files/"
    f"environment-{env_id}:download"
)

resp = requests.get(
    url,
    params={"alt": "media"},
    headers={"x-goog-api-key": api_key},
    allow_redirects=True,
    timeout=120,
)
resp.raise_for_status()

with open("snapshot.tar", "wb") as f:
    f.write(resp.content)

with tarfile.open("snapshot.tar") as tar:
    tar.extractall(path="extracted")

# Find the artifacts we asked the agent to create
for root, _, files in os.walk("extracted"):
    for name in files:
        if name in {"data.csv", "clean_chart.png", "report.pdf", "render.py"}:
            print(os.path.relpath(os.path.join(root, name), "extracted"))

You should see the four files we asked for:

tmp/data.csv
tmp/render.py
tmp/clean_chart.png
tmp/report.pdf

Open extracted/tmp/report.pdf in any viewer. You now have a self-contained PDF generated by a model that started with nothing but an image and an English prompt. No matplotlib install on your laptop, no reportlab, no leftover venv.
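
One caution on the extraction step: the archive comes from a sandbox where a model wrote arbitrary files, so it may be worth checking member paths before unpacking. Below is a sketch of a guarded extract using only the standard library; safe_extract is a hypothetical helper, and skipping links is a conservative assumption rather than anything the API requires.

```python
import os
import tarfile


def safe_extract(tar_path: str, dest: str) -> list[str]:
    """Extract a tar archive, refusing members that would land outside dest."""
    dest_root = os.path.realpath(dest)
    extracted = []
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(dest_root, member.name))
            # Absolute paths and "../" tricks resolve outside dest_root.
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {member.name!r}")
            if member.issym() or member.islnk():
                continue  # skip links; the report artifacts are regular files
            tar.extract(member, path=dest_root)
            extracted.append(member.name)
    return extracted
```

It can stand in for the tar.extractall call in pull_files.py: safe_extract("snapshot.tar", "extracted").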


Step 4 — Multi-turn: keep the conversation OR the workspace

Here is where Managed Agents get powerful and where most quickstarts stop. The API tracks two independent dimensions of state: the conversation (chat history, reasoning trace, tool use) and the environment (files, installed packages, sandbox state). You can mix and match them per call.

Goal                                | previous_interaction_id | environment
------------------------------------|-------------------------|---------------------------
Continue the chat and the workspace | interaction.id          | interaction.environment_id
Fresh chat, same files              | (omit)                  | interaction.environment_id
Same chat, new sandbox              | interaction.id          | "remote"
Start over completely               | (omit)                  | "remote"

# Continue the same conversation AND the same sandbox.
followup = client.interactions.create(
    agent="antigravity-preview-05-2026",
    previous_interaction_id=interaction.id,        # keep chat history
    environment=interaction.environment_id,        # keep files (data.csv, etc.)
    input=(
        "Now produce /tmp/exec_summary.md with three bullet points the "
        "CEO should care about, using the numbers in data.csv only. "
        "No new chart. Keep it under 120 words."
    ),
)
print(followup.output_text)

# Variant 1 — clear chat but keep the workspace:
fresh_chat = client.interactions.create(
    agent="antigravity-preview-05-2026",
    environment=interaction.environment_id,
    input="List every file under /tmp and how large it is.",
)

# Variant 2 — keep chat but get a fresh sandbox:
new_sandbox = client.interactions.create(
    agent="antigravity-preview-05-2026",
    previous_interaction_id=interaction.id,
    environment="remote",
    input="Recreate /tmp/data.csv from memory and verify your numbers.",
)

This is what unlocks a real pipeline. The expensive step — vision transcription plus matplotlib/reportlab installs in the sandbox — runs once. Every follow-up turn reuses the files and skips the setup. For long sessions, the agent also runs automatic context compaction around ~135k tokens, so the chat history won't blow up your token budget.
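
The four combinations in the table above reduce to two booleans, which can be handy when the choice is made at runtime. Here is a sketch of that mapping; state_kwargs is a hypothetical helper, and it only builds keyword arguments for client.interactions.create without calling the API.

```python
from typing import Optional


def state_kwargs(
    keep_chat: bool,
    keep_files: bool,
    interaction_id: Optional[str] = None,
    environment_id: Optional[str] = None,
) -> dict:
    """Translate 'keep chat?' and 'keep files?' into create() keyword arguments."""
    kwargs = {}
    if keep_chat:
        if not interaction_id:
            raise ValueError("keep_chat=True needs the previous interaction id")
        kwargs["previous_interaction_id"] = interaction_id
    if keep_files:
        if not environment_id:
            raise ValueError("keep_files=True needs an existing environment id")
        kwargs["environment"] = environment_id
    else:
        # "remote" asks for a brand-new sandbox.
        kwargs["environment"] = "remote"
    return kwargs
```

A follow-up call then reads client.interactions.create(agent=..., input=..., **state_kwargs(True, True, interaction.id, interaction.environment_id)).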


Step 5 — Stream the steps in real time

A vision + matplotlib + reportlab run can take 30-90 seconds. Watching it as a single blocking call is no fun. Set stream=True and you get an iterable of step deltas: reasoning chunks, tool calls, code stdout/stderr, and the final answer.

# stream_run.py — watch the agent think, in real time
from google import genai

client = genai.Client()

stream = client.interactions.create(
    agent="antigravity-preview-05-2026",
    input=(
        "Read Hacker News, summarize the top 5 stories about AI agents, "
        "and save the summary as /tmp/hn.pdf."
    ),
    environment="remote",
    stream=True,
)

for event in stream:
    # Each event is a step delta: reasoning text, tool call, code I/O,
    # or final output. Print whatever it carries.
    kind = getattr(event, "type", "delta")
    text = getattr(event, "delta", None) or getattr(event, "text", "")
    if text:
        print(f"[{kind}] {text}", flush=True)

On a chart run like the one in Step 2, you'll see lines like [reasoning] I'll start by reading the image…, then [tool_call] code_execution, then the raw stdout from the agent's Python, then the natural-language summary. Streaming is also how you safely cancel a runaway agent: tear down the iterator and the server stops billing tokens.
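
To make cancellation automatic, you can wrap the stream in a wall-clock budget. The sketch below works on any iterable; with_time_budget is a hypothetical helper, and whether the SDK's stream object exposes close() is an assumption, so the call is guarded. Note that the deadline is only checked between events, so a stream that stalls without emitting anything will still block.

```python
import time
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def with_time_budget(events: Iterable[T], max_seconds: float) -> Iterator[T]:
    """Yield events until the stream ends or the wall-clock budget is spent."""
    deadline = time.monotonic() + max_seconds
    iterator = iter(events)
    try:
        for event in iterator:
            yield event
            if time.monotonic() >= deadline:
                break  # budget spent: stop consuming the stream
    finally:
        # Generators expose close(); call it when the iterator has one.
        close = getattr(iterator, "close", None)
        if callable(close):
            close()
```

In stream_run.py this would read for event in with_time_budget(stream, 120): with the loop body unchanged.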


Step 6 — Restrict the toolset

By default the agent gets code_execution, google_search, and url_context. Filesystem access turns on automatically when you set the environment parameter. If your task doesn't need code execution, or you want to forbid web reads, pass an explicit allowlist. The example below keeps the two web tools and drops code_execution:

interaction = client.interactions.create(
    agent="antigravity-preview-05-2026",
    input="Pull the latest CVE notes and write a 1-page briefing.",
    environment="remote",
    tools=[
        {"type": "google_search"},
        {"type": "url_context"},
    ],
)

Smaller toolsets mean fewer surprises, fewer wasted tool calls, and lower bills. They also disable the agent's escape hatches — if the task genuinely requires Python and you don't allow it, the agent will tell you it can't continue rather than improvising.
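
A typo in a tool name is easy to miss, so it can help to build the allowlist through a small validator. This sketch assumes the three tool types named above are the full set during preview; tool_allowlist and KNOWN_TOOLS are hypothetical names.

```python
# Tool types named in this guide; assumed to be the complete preview set.
KNOWN_TOOLS = ("code_execution", "google_search", "url_context")


def tool_allowlist(*names: str) -> list[dict]:
    """Build a tools=[...] value, rejecting names this guide does not list."""
    unknown = [name for name in names if name not in KNOWN_TOOLS]
    if unknown:
        raise ValueError(f"unknown tool type(s): {', '.join(unknown)}")
    # dict.fromkeys drops duplicates while keeping the caller's order.
    return [{"type": name} for name in dict.fromkeys(names)]
```

For the briefing example, tools=tool_allowlist("google_search", "url_context") produces the same list as the literal above.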


Step 7 — Save it as a reusable custom agent

Copy-pasting the same system prompt into every call gets old. The Agents API lets you bake instructions and tools into a named agent you invoke by id. Each invocation forks the base environment, so every run starts from a clean copy of whatever you configured.

# Define a reusable, named agent. Invoke later by id.
agent = client.agents.create(
    id="chart-to-pdf-analyst",
    base_agent="antigravity-preview-05-2026",
    system_instruction=(
        "You are a careful chart-to-report analyst. Always extract data "
        "to CSV first, replot cleanly, and produce a 1-page PDF with a "
        "stats table. Cite any transcription uncertainty."
    ),
    base_environment={
        "type": "remote",
        "sources": [
            {
                "type": "inline",
                "target": ".agents/AGENTS.md",
                "content": (
                    "Always include: (1) cleaned chart, "
                    "(2) summary stats table, (3) transcription notes."
                ),
            },
        ],
    },
)

# Use it (reuses base64 and image_bytes from chart_to_pdf.py)
result = client.interactions.create(
    agent="chart-to-pdf-analyst",
    input=[
        {"type": "text", "text": "Build the report from this image."},
        {"type": "image", "data": base64.b64encode(image_bytes).decode("utf-8"),
         "mime_type": "image/png"},
    ],
    environment="remote",
)
print(result.output_text)

Drop SKILL.md files under .agents/skills/ (inline, from a Git repo, or from GCS) and the agent will pick them up as named capabilities. This is the same skills format the Antigravity IDE uses, which is convenient if you have already written skills for it.
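
For inline skills, the sources list can be generated from a plain mapping. This is a sketch that reuses the inline source shape from the agent definition above; skill_sources is a hypothetical helper, and the one-folder-per-skill layout (.agents/skills/<name>/SKILL.md) is an assumption, so check the official docs for the exact path convention.

```python
def skill_sources(skills: dict[str, str]) -> list[dict]:
    """Turn {skill_name: markdown} into inline source entries."""
    sources = []
    for name, content in skills.items():
        # Keep names to a single path segment so targets stay under skills/.
        if not name or "/" in name or name.startswith("."):
            raise ValueError(f"invalid skill name: {name!r}")
        sources.append(
            {
                "type": "inline",
                # Assumed layout: one folder per skill.
                "target": f".agents/skills/{name}/SKILL.md",
                "content": content,
            }
        )
    return sources
```

You would pass the result as base_environment={"type": "remote", "sources": skill_sources({...})}.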


Worked example: a realistic sales-chart pipeline

Here is the kind of run you can expect on a real chart. The input was a PNG screenshot of a quarterly revenue line chart pulled from a blog post. The text prompt was the one from Step 2. The agent's narrated output included:

  • Identified X axis as quarter (Q1 2023 → Q1 2026) and Y axis as revenue ($M).
  • Transcribed 13 data points with explicit uncertainty notes on Q3 2024 ("label partially occluded").
  • Wrote data.csv, render.py, and ran it.
  • Recovered from a missing reportlab on first try — installed it inline, re-ran, and produced report.pdf.
  • Final summary: "Report saved to /tmp/report.pdf (87 KB). Revenue grew 3.4x from Q1 2023 to Q1 2026; the strongest quarter-over-quarter jump was Q2 2024 at +28%."

Total wall-clock: 41 seconds. Total cost: about $0.42 in tokens. Zero environment compute was billed during the preview window.
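
Because the data points are estimated from pixels, it is worth recomputing the headline stats yourself from the downloaded data.csv. The following is a stdlib-only sketch of the summary the prompt asks for; summarize is a hypothetical helper, and it interprets "last vs first change" as a ratio and the rolling average as a 3-point window, both of which are assumptions about what the agent's render.py does.

```python
from statistics import mean


def summarize(values: list[float], window: int = 3) -> dict:
    """Count, mean, min, max, last/first ratio, and an optional rolling mean."""
    if not values:
        raise ValueError("need at least one value")
    summary = {
        "count": len(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        # Ratio of last to first point; undefined when the first point is 0.
        "change_ratio": values[-1] / values[0] if values[0] else None,
    }
    if len(values) >= 6:
        # One average per full window, starting at the window-th point.
        summary["rolling_mean"] = [
            mean(values[i - window + 1 : i + 1])
            for i in range(window - 1, len(values))
        ]
    return summary
```

Feeding it the value column of data.csv gives numbers you can compare against the stats table in report.pdf.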


Common pitfalls (the ones that bit me)

  • Don't try to set temperature, top_p, top_k, stop_sequences, or max_output_tokens. The Antigravity agent rejects them with a 400. The agent decides its own decoding settings — your only knob is the prompt.
  • Structured output isn't supported. If you need JSON, instruct the agent to write JSON to a file and read it back yourself; don't try to wire response_schemas through Interactions.
  • No background=True. Background mode is not available, and store must stay True (the default). For very long runs, use streaming and your own task queue.
  • Images are the only media input. Audio, video, and PDF inputs are not accepted yet. If you need PDF analysis, upload it as text via the Files API in a previous step, or have the agent fetch a URL with url_context.
  • Filesystem is enabled by the environment param, not a tool entry. If you pass a tools=[...] allowlist and forget to set environment, the agent silently has no disk and will fail any task that needs files.
  • Tokens add up fast. A single research run can hit 3-5M tokens. Stream, set hard time budgets in your client, and cancel aggressively. 50-70% of input is cached in practice, but the first run pays full price.
  • Watch your environment_id. Reusing it preserves files and installed packages, which is great for speed but can leak state between unrelated jobs. For tenant isolation, mint a fresh remote per customer.
  • Preview means breaking changes. The Interactions API already shipped one breaking change in May 2026 (see Google's migration notes). Pin the SDK version and read the changelog before every upgrade.
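
Several of these pitfalls can be caught before the request leaves your machine. Here is a sketch of a pre-flight check over the keyword arguments you plan to pass; preflight and REJECTED_PARAMS are hypothetical names, and the check assumes the decoding parameters would be passed as top-level keyword arguments, which may not match how your code sets them.

```python
# Decoding parameters the guide says are rejected with a 400.
REJECTED_PARAMS = {
    "temperature",
    "top_p",
    "top_k",
    "stop_sequences",
    "max_output_tokens",
}


def preflight(kwargs: dict) -> list[str]:
    """Return human-readable problems found in a planned create() call."""
    problems = []
    for name in sorted(REJECTED_PARAMS.intersection(kwargs)):
        problems.append(f"{name} is rejected by the agent; remove it")
    if "tools" in kwargs and "environment" not in kwargs:
        problems.append(
            "tools is set without environment; the agent will have no filesystem"
        )
    return problems
```

An empty list means none of the checked pitfalls apply; anything else is worth fixing before you spend tokens.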

Quick reference: the Antigravity agent in one table

Concept      | What it is               | How you set it
-------------|--------------------------|---------------------------------------------------------------
Agent id     | Which managed agent runs | agent="antigravity-preview-05-2026" or your custom agent id
Environment  | Linux sandbox lifecycle  | environment="remote", an env_id, or an EnvironmentConfig
Conversation | Multi-turn chat history  | previous_interaction_id=interaction.id
Tools        | Allowed capabilities     | tools=[{type: "code_execution"}, ...] (defaults to all 3)
Streaming    | Live step deltas         | stream=True, then iterate
Files in     | Image part inline        | {type: "image", data: <base64>, mime_type: "image/png"}
Files out    | Sandbox snapshot tar     | GET /v1beta/files/environment-<env_id>:download?alt=media
Cost driver  | Underlying tokens        | Gemini 3.5 Flash pricing; compute is free during preview

Where to take this next

  • Wire the download step into a webhook so a finished PDF lands in S3 or Drive automatically.
  • Swap the inline image input for an image generated earlier in your app and pipe both through one custom agent.
  • Layer in SKILL.md files under .agents/skills/ for repeatable patterns: quarterly-report, kpi-deck, anomaly-explainer.
  • Pair Antigravity Agent with a smaller Gemini Flash call up front to decide whether to invoke the agent at all — cheap routing in front of an expensive worker.
  • Read the official Antigravity Agent docs and the Managed Agents quickstart for the latest field reference, including environment sources and per-interaction overrides.

The interesting shift here isn't "another agent framework." It's that the sandbox, the model, the tools, and the file system arrive as one billable primitive. You stop thinking about LangGraph nodes or scaling worker containers; you think about what artifact you want back. The chart-to-PDF flow above is small, but the shape — image in, verified file out — is the shape a lot of production work is about to take.
