
Antigravity Agent + Vision: Build a Chart-to-PDF Pipeline
Summary
Hand Gemini's new managed agent a screenshot. Get back a sandboxed PDF report — code, charts, files included.
Why this guide, right now
On May 19, 2026 Google flipped the switch on Managed Agents in the Gemini API. The headline product is the Antigravity Agent — a general-purpose, sandboxed Linux agent you can spin up in one API call. It runs the same harness as the Antigravity IDE, it's powered by Gemini 3.5 Flash, and it accepts multimodal input: text and images, today.
That last detail is the part nobody is showing you yet. You can hand the agent a chart screenshot, let it transcribe the data, write its own Python in a sandbox you never have to maintain, and walk away with a finished PDF report. This guide walks the entire pipeline end to end, with code you can paste into a file and run.
By the end you will: make your first Antigravity call, send an image alongside instructions, persist a real sandbox across multi-turn calls, stream the agent's intermediate steps, and download the artifacts it builds. We'll close with a reusable custom agent and the gotchas that will save you a few hundred wasted tokens.
Prerequisites
- Python 3.10 or newer.
- A Gemini API key from AI Studio. The Antigravity agent is in preview and billed pay-as-you-go on Gemini 3.5 Flash tokens.
- google-genai SDK >= 1.0 (it exposes client.interactions).
- About 5 minutes and one chart image you'd like analyzed. A PNG screenshot of any line/bar chart works.
# Python 3.10+ recommended
pip install --upgrade google-genai pillow requests
# Get an API key from https://aistudio.google.com/apikey
export GEMINI_API_KEY="your-key-here"
Step 1 — Make your first agent call
Before we send images, make sure the basics work. A single call to client.interactions.create provisions a Linux sandbox, runs the agent loop, and returns the result. Three parameters do all the work:
- agent="antigravity-preview-05-2026": the current preview agent id.
- environment="remote": give me a brand-new sandbox.
- input=: what you want the agent to do, as text or a list of typed parts.
# antigravity_quickstart.py
from google import genai

client = genai.Client()  # picks up GEMINI_API_KEY

interaction = client.interactions.create(
    agent="antigravity-preview-05-2026",
    input=(
        "Write a Python script that generates the first 20 Fibonacci "
        "numbers and saves them to fibonacci.txt. Then read the file "
        "and print its contents."
    ),
    environment="remote",
)

print("Interaction ID :", interaction.id)
print("Environment ID :", interaction.environment_id)
print("Output:\n", interaction.output_text)
Run it. After a few seconds you'll see something like this:
Interaction ID : int_2cAa9d3fLp...
Environment ID : env_8b9e2c14...
Output:
I created `fibonacci.py`, ran it, and saved the first 20 numbers to
`fibonacci.txt`. The file contents are:
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987,
1597, 2584, 4181
Two pieces of state matter for everything that follows: interaction.id identifies the conversation, and interaction.environment_id identifies the sandbox. Hold onto both. interaction.steps contains every reasoning step, tool call, and code run if you want to audit what the agent did.
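For a quick audit, a small helper can tally what the agent did by step type. This is only a sketch: it assumes each entry in interaction.steps exposes a `type` attribute or key, which this guide does not spell out, so check the field name against a real response. The names `step_type` and `tally_steps` are made up for illustration.

```python
from collections import Counter
from typing import Any, Iterable


def step_type(step: Any) -> str:
    """Return a step's type for object-style or dict-style steps (assumed shapes)."""
    if isinstance(step, dict):
        return str(step.get("type", "unknown"))
    return str(getattr(step, "type", "unknown"))


def tally_steps(steps: Iterable[Any]) -> Counter:
    """Count steps by type so you can see at a glance what the agent ran."""
    return Counter(step_type(step) for step in steps)
```

Calling `tally_steps(interaction.steps).most_common()` after a run would then list the most frequent step types first.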
Step 2 — Send an image alongside instructions
The Antigravity agent accepts text and image input parts. Images must be supplied inline as base64 strings — there is no remote URL form during preview, and audio/video/document inputs are not supported yet. For our pipeline we'll combine a precise text prompt with a PNG of a chart we want analyzed.
Save this prompt to prompt.txt. It's the spec the agent will follow:
SYSTEM TASK: chart-to-report pipeline
========================================
You will receive a single chart image (PNG).
1. Look at the chart. Identify the X axis, Y axis, units, and title.
2. Estimate each data point as accurately as you can and list them as
   a Python list of (label, value) tuples. Be honest about uncertainty.
3. Save the points to /tmp/data.csv (columns: label,value).
4. Write a Python script /tmp/render.py that:
   - reads data.csv with the standard csv module (no pandas required),
   - replots the series cleanly with matplotlib at 1600x900,
   - saves the chart to /tmp/clean_chart.png,
   - computes a quick stat summary (count, mean, min, max, last vs first
     change, and 3-month rolling average if the series has >=6 points),
   - writes a 1-page PDF report /tmp/report.pdf with the title, your
     transcription notes, the cleaned chart, and the stats table.
     Use reportlab for the PDF.
5. Run /tmp/render.py end to end. Show me any errors and fix them
   until report.pdf exists.
6. Print the final file size of report.pdf and a 5-line summary.
Then send it together with your chart image. Drop any line or bar chart PNG next to your script and name it sales_chart.png.
# chart_to_pdf.py
import base64

from google import genai

client = genai.Client()

# Read the chart image as raw bytes; it is base64-encoded below.
with open("sales_chart.png", "rb") as f:
    image_bytes = f.read()

# Read the task spec saved in prompt.txt.
with open("prompt.txt", "r") as f:
    instructions = f.read()

interaction = client.interactions.create(
    agent="antigravity-preview-05-2026",
    input=[
        {"type": "text", "text": instructions},
        {
            "type": "image",
            "data": base64.b64encode(image_bytes).decode("utf-8"),
            "mime_type": "image/png",
        },
    ],
    environment="remote",
)

print("Interaction:", interaction.id)
print("Environment:", interaction.environment_id)
print()
print(interaction.output_text)
A few things to notice in interaction.output_text: the agent narrates its plan, runs Bash and Python in the sandbox, fixes its own errors, and reports the final file size. If you peek at interaction.steps, you'll see entries for code_execution calls (the matplotlib runs), filesystem writes for data.csv, render.py, clean_chart.png, and report.pdf, and a final natural-language wrap-up.
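If you send more than one kind of image, building the parts by hand gets repetitive. The sketch below wraps the same dict shape used in chart_to_pdf.py and guesses the MIME type from the file extension with the standard mimetypes module. The helper names `image_part` and `text_part` are made up for this guide, and which image formats the agent accepts beyond PNG is not confirmed here.

```python
import base64
import mimetypes
from pathlib import Path


def text_part(text: str) -> dict:
    """Build a text input part."""
    return {"type": "text", "text": text}


def image_part(path: str) -> dict:
    """Build an inline base64 image part, inferring the MIME type from the extension."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"{path!r} does not look like an image file")
    data = Path(path).read_bytes()
    return {
        "type": "image",
        "data": base64.b64encode(data).decode("utf-8"),
        "mime_type": mime_type,
    }
```

With these, the input list becomes `[text_part(instructions), image_part("sales_chart.png")]`, and a non-image path fails locally before any tokens are spent.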
Step 3 — Download the artifacts the agent built
The PDF lives inside the sandbox, not on your machine. To pull it out you call the Files API, which returns the whole environment as a tar archive. This is the under-documented part — the SDK doesn't wrap it yet, so you go through requests directly.
# pull_files.py: download the whole sandbox as a tar snapshot
import os
import tarfile

import requests

env_id = "env_8b9e2c14..."  # interaction.environment_id from the previous call
api_key = os.environ["GEMINI_API_KEY"]

url = (
    "https://generativelanguage.googleapis.com/v1beta/files/"
    f"environment-{env_id}:download"
)
resp = requests.get(
    url,
    params={"alt": "media"},
    headers={"x-goog-api-key": api_key},
    allow_redirects=True,
    timeout=120,
)
resp.raise_for_status()

with open("snapshot.tar", "wb") as f:
    f.write(resp.content)

with tarfile.open("snapshot.tar") as tar:
    tar.extractall(path="extracted")

# Find the artifacts we asked the agent to create
for root, _, files in os.walk("extracted"):
    for name in files:
        if name in {"data.csv", "clean_chart.png", "report.pdf", "render.py"}:
            print(os.path.relpath(os.path.join(root, name), "extracted"))
You should see the four files we asked for:
tmp/data.csv
tmp/render.py
tmp/clean_chart.png
tmp/report.pdf
Open extracted/tmp/report.pdf in any viewer. You now have a self-contained PDF generated by a model that started with nothing but an image and an English prompt. No matplotlib install on your laptop, no reportlab, no leftover venv.
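The snapshot contains the whole sandbox, and extractall trusts every path in the archive. If you only want the four artifacts, a more defensive sketch is to copy out just the named files and skip any member with an absolute path or a ".." component. The function name `extract_named` is made up for this guide; it uses only the standard tarfile module.

```python
import os
import tarfile


def extract_named(tar_path: str, wanted: set[str], dest: str) -> list[str]:
    """Extract only regular files whose basename is in `wanted`.

    Members with absolute paths or ".." components are skipped, so a
    malformed archive cannot write outside `dest`.
    """
    extracted = []
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            parts = member.name.split("/")
            if not member.isfile() or member.name.startswith("/") or ".." in parts:
                continue
            if os.path.basename(member.name) not in wanted:
                continue
            source = tar.extractfile(member)
            if source is None:
                continue
            target = os.path.join(dest, *parts)
            os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
            with open(target, "wb") as out:
                out.write(source.read())
            extracted.append(member.name)
    return sorted(extracted)
```

For this pipeline the call would be `extract_named("snapshot.tar", {"data.csv", "render.py", "clean_chart.png", "report.pdf"}, "extracted")`, which returns the archive paths it actually wrote.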
Step 4 — Multi-turn: keep the conversation OR the workspace
Here is where Managed Agents get powerful and where most quickstarts stop. The API tracks two independent dimensions of state: the conversation (chat history, reasoning trace, tool use) and the environment (files, installed packages, sandbox state). You can mix and match them per call.
| Goal | previous_interaction_id | environment |
|---|---|---|
| Continue the chat and the workspace | interaction.id | interaction.environment_id |
| Fresh chat, same files | (omit) | interaction.environment_id |
| Same chat, new sandbox | interaction.id | "remote" |
| Start over completely | (omit) | "remote" |
# Continue the same conversation AND the same sandbox.
followup = client.interactions.create(
    agent="antigravity-preview-05-2026",
    previous_interaction_id=interaction.id,  # keep chat history
    environment=interaction.environment_id,  # keep files (data.csv, etc.)
    input=(
        "Now produce /tmp/exec_summary.md with three bullet points the "
        "CEO should care about, using the numbers in data.csv only. "
        "No new chart. Keep it under 120 words."
    ),
)
print(followup.output_text)

# Variant 1: clear chat but keep the workspace.
fresh_chat = client.interactions.create(
    agent="antigravity-preview-05-2026",
    environment=interaction.environment_id,
    input="List every file under /tmp and how large it is.",
)

# Variant 2: keep chat but get a fresh sandbox.
new_sandbox = client.interactions.create(
    agent="antigravity-preview-05-2026",
    previous_interaction_id=interaction.id,
    environment="remote",
    input="Recreate /tmp/data.csv from memory and verify your numbers.",
)
This is what unlocks a real pipeline. The expensive step — vision transcription plus matplotlib/reportlab installs in the sandbox — runs once. Every follow-up turn reuses the files and skips the setup. For long sessions, the agent also runs automatic context compaction around ~135k tokens, so the chat history won't blow up your token budget.
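The four combinations in the Step 4 table reduce to two yes/no decisions, which can be sketched as one helper that returns the matching keyword arguments. The function name `state_kwargs` and its parameters are made up for this guide; it only builds a dict and never calls the API.

```python
from typing import Optional


def state_kwargs(
    keep_chat: bool,
    keep_files: bool,
    interaction_id: Optional[str] = None,
    environment_id: Optional[str] = None,
) -> dict:
    """Map the Step 4 table onto previous_interaction_id / environment."""
    kwargs = {}
    if keep_chat:
        if not interaction_id:
            raise ValueError("keep_chat=True needs an interaction_id")
        kwargs["previous_interaction_id"] = interaction_id
    if keep_files:
        if not environment_id:
            raise ValueError("keep_files=True needs an environment_id")
        kwargs["environment"] = environment_id
    else:
        # A fresh sandbox, as in the "remote" rows of the table.
        kwargs["environment"] = "remote"
    return kwargs
```

A follow-up turn would then pass `**state_kwargs(True, True, interaction.id, interaction.environment_id)` to client.interactions.create, which keeps the choice explicit at each call site.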
Step 5 — Stream the steps in real time
A vision + matplotlib + reportlab run can take 30-90 seconds. Watching it as a single blocking call is no fun. Set stream=True and you get an iterable of step deltas: reasoning chunks, tool calls, code stdout/stderr, and the final answer.
# stream_run.py: watch the agent think, in real time
from google import genai

client = genai.Client()

stream = client.interactions.create(
    agent="antigravity-preview-05-2026",
    input=(
        "Read Hacker News, summarize the top 5 stories about AI agents, "
        "and save the summary as /tmp/hn.pdf."
    ),
    environment="remote",
    stream=True,
)

for event in stream:
    # Each event is a step delta: reasoning text, tool call, code I/O,
    # or final output. Print whatever it carries.
    kind = getattr(event, "type", "delta")
    text = getattr(event, "delta", None) or getattr(event, "text", "")
    if text:
        print(f"[{kind}] {text}", flush=True)
With a chart prompt like the one from Step 2, you'll see lines like [reasoning] I'll start by reading the image…, then [tool_call] code_execution, then the raw stdout from the agent's Python, then the natural-language summary. Streaming is also how you safely cancel a runaway agent: tear down the iterator and the server stops billing tokens.
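One way to enforce a hard time budget on the client is to wrap the event iterator so it stops after a deadline and closes the underlying stream. This is a sketch with a made-up name, `with_time_budget`. It assumes the stream object exposes a close() method (called only if present), and it checks the clock between events, so it cannot interrupt a stream that has stalled without sending anything.

```python
import time
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def with_time_budget(events: Iterable[T], max_seconds: float) -> Iterator[T]:
    """Yield events until the budget is spent, then close the source iterator."""
    deadline = time.monotonic() + max_seconds
    iterator = iter(events)
    try:
        for event in iterator:
            yield event
            if time.monotonic() >= deadline:
                break
    finally:
        # Tear down the source whether we ran out of time, finished,
        # or the caller stopped consuming early.
        close = getattr(iterator, "close", None)
        if callable(close):
            close()
```

In stream_run.py the loop would become `for event in with_time_budget(stream, 120):`, leaving the printing logic unchanged.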
Step 6 — Restrict the toolset
By default the agent gets code_execution, google_search, and url_context. Filesystem access turns on automatically when you set the environment parameter. If your task doesn't need one of these tools, pass an explicit allowlist. The example below keeps the two web tools and drops code_execution:
interaction = client.interactions.create(
    agent="antigravity-preview-05-2026",
    input="Pull the latest CVE notes and write a 1-page briefing.",
    environment="remote",
    tools=[
        {"type": "google_search"},
        {"type": "url_context"},
    ],
)
Smaller toolsets mean fewer surprises, fewer wasted tool calls, and lower bills. They also disable the agent's escape hatches — if the task genuinely requires Python and you don't allow it, the agent will tell you it can't continue rather than improvising.
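A typo in a tool name is easy to make and expensive to discover mid-run. The sketch below builds the tools list from plain names and rejects anything outside the three defaults listed in this step. The helper name `tool_allowlist` is made up, and the set of valid names is taken from this guide rather than from a published schema.

```python
# The three default tools named in this guide.
DEFAULT_TOOLS = ("code_execution", "google_search", "url_context")


def tool_allowlist(*names: str) -> list[dict]:
    """Build a tools=[...] value from tool names, rejecting unknown ones."""
    unknown = [name for name in names if name not in DEFAULT_TOOLS]
    if unknown:
        raise ValueError(f"unknown tool(s): {', '.join(unknown)}")
    # dict.fromkeys drops duplicates while preserving the caller's order.
    return [{"type": name} for name in dict.fromkeys(names)]
```

The web-only call above would then pass `tools=tool_allowlist("google_search", "url_context")`.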
Step 7 — Save it as a reusable custom agent
Copy-pasting the same system prompt into every call gets old. The Agents API lets you bake instructions and tools into a named agent you invoke by id. Each invocation forks the base environment, so every run starts from a clean copy of whatever you configured.
# Define a reusable, named agent. Invoke later by id.
agent = client.agents.create(
    id="chart-to-pdf-analyst",
    base_agent="antigravity-preview-05-2026",
    system_instruction=(
        "You are a careful chart-to-report analyst. Always extract data "
        "to CSV first, replot cleanly, and produce a 1-page PDF with a "
        "stats table. Cite any transcription uncertainty."
    ),
    base_environment={
        "type": "remote",
        "sources": [
            {
                "type": "inline",
                "target": ".agents/AGENTS.md",
                "content": (
                    "Always include: (1) cleaned chart, "
                    "(2) summary stats table, (3) transcription notes."
                ),
            },
        ],
    },
)

# Use it. base64 and image_bytes are the import and PNG bytes from Step 2.
result = client.interactions.create(
    agent="chart-to-pdf-analyst",
    input=[
        {"type": "text", "text": "Build the report from this image."},
        {
            "type": "image",
            "data": base64.b64encode(image_bytes).decode("utf-8"),
            "mime_type": "image/png",
        },
    ],
    environment="remote",
)
print(result.output_text)
Drop SKILL.md files under .agents/skills/ (inline or from a Git repo or GCS) and the agent will pick them up as named capabilities. This is the same skills format Antigravity IDE uses, which is convenient if you've already invested in that muscle.
Worked example: a realistic sales-chart pipeline
Here is the kind of run you can expect on a real chart. The input was a PNG screenshot of a quarterly revenue line chart pulled from a blog post. The text prompt was the one from Step 2. The agent's narrated output included:
- Identified X axis as quarter (Q1 2023 → Q1 2026) and Y axis as revenue ($M).
- Transcribed 13 data points with explicit uncertainty notes on Q3 2024 ("label partially occluded").
- Wrote data.csv, render.py, and ran it.
- Recovered from a missing reportlab on first try: installed it inline, re-ran, and produced report.pdf.
- Final summary: "Report saved to /tmp/report.pdf (87 KB). Revenue grew 3.4x from Q1 2023 to Q1 2026; the strongest quarter-over-quarter jump was Q2 2024 at +28%."
Total wall-clock: 41 seconds. Total cost: about $0.42 in tokens. Zero environment compute was billed during the preview window.
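Narrated figures like a growth multiple or a largest quarter-over-quarter jump are cheap to double-check against the extracted data.csv. The sketch below recomputes them with the standard library only. It assumes the label,value columns requested in the Step 2 prompt, and the function names `load_points` and `summarize` are made up for this guide.

```python
import csv
from statistics import mean


def load_points(csv_path: str) -> list[tuple[str, float]]:
    """Read (label, value) rows from a CSV with label,value columns."""
    with open(csv_path, newline="") as f:
        return [(row["label"], float(row["value"])) for row in csv.DictReader(f)]


def summarize(points: list[tuple[str, float]]) -> dict:
    """Recompute headline stats to compare with the agent's summary."""
    if not points:
        raise ValueError("no data points")
    values = [value for _, value in points]
    # Relative change from each point to the next, keyed by the later label.
    changes = [
        (points[i][0], (values[i] - values[i - 1]) / values[i - 1])
        for i in range(1, len(values))
        if values[i - 1] != 0
    ]
    return {
        "count": len(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "growth_multiple": values[-1] / values[0] if values[0] != 0 else None,
        "largest_jump": max(changes, key=lambda change: change[1]) if changes else None,
    }
```

Running `summarize(load_points("extracted/tmp/data.csv"))` gives numbers to hold against the agent's own summary. A large mismatch usually points at a transcription problem rather than an arithmetic one.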
Common pitfalls (the ones that bit me)
- Don't try to set temperature, top_p, top_k, stop_sequences, or max_output_tokens. The Antigravity agent rejects them with a 400. The agent decides its own decoding settings; your only knob is the prompt.
- Structured output isn't supported. If you need JSON, instruct the agent to write JSON to a file and read it back yourself; don't try to wire response_schemas through Interactions.
- No background=True. You must set store=True (the default). For very long runs use streaming and your own task queue.
- Image inputs only. Audio, video, and PDF inputs are not accepted yet. If you need PDF analysis, upload it as text via the Files API in a previous step, or have the agent fetch a URL with url_context.
- Filesystem is enabled by the environment param, not a tool entry. If you pass a tools=[...] allowlist and forget to set environment, the agent silently has no disk and will fail any task that needs files.
- Tokens add up fast. A single research run can hit 3-5M tokens. Stream, set hard time budgets in your client, and cancel aggressively. 50-70% of input is cached in practice, but the first run pays full price.
- Watch your environment_id. Reusing it preserves files and installed packages, which is great for speed but can leak state between unrelated jobs. For tenant isolation, mint a fresh remote per customer.
- Preview means breaking changes. The Interactions API already shipped one breaking change in May 2026 (see Google's migration notes). Pin the SDK version and read the changelog before every upgrade.
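Several of these pitfalls can be caught before a request leaves your machine. The sketch below inspects the keyword arguments you are about to pass and returns a list of problems. The function name `preflight` is made up, and the guide does not say where decoding settings would be passed, so checking both top-level arguments and a nested `generation_config` dict is an assumption.

```python
# Decoding settings this guide says the agent rejects with a 400.
REJECTED_PARAMS = {
    "temperature", "top_p", "top_k", "stop_sequences", "max_output_tokens",
}


def preflight(kwargs: dict) -> list[str]:
    """Return human-readable problems found in interaction kwargs."""
    problems = []
    # Assumption: settings may arrive top-level or inside generation_config.
    nested = kwargs.get("generation_config")
    candidates = set(kwargs) | (set(nested) if isinstance(nested, dict) else set())
    for name in sorted(REJECTED_PARAMS.intersection(candidates)):
        problems.append(f"{name} is rejected by the agent; remove it")
    if kwargs.get("background"):
        problems.append("background=True is not supported")
    if kwargs.get("store") is False:
        problems.append("store must stay True")
    if "tools" in kwargs and "environment" not in kwargs:
        problems.append("tools is set without environment, so the agent has no filesystem")
    return problems
```

A caller might run `preflight(call_kwargs)` and raise if the list is non-empty, which turns a billed failure into a local one.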
Quick reference: the Antigravity agent in one table
| Concept | What it is | How you set it |
|---|---|---|
| Agent id | Which managed agent runs | agent="antigravity-preview-05-2026" or your custom agent id |
| Environment | Linux sandbox lifecycle | environment="remote", env_id, or EnvironmentConfig |
| Conversation | Multi-turn chat history | previous_interaction_id=interaction.id |
| Tools | Allowed capabilities | tools=[{type: "code_execution"}, ...] (defaults to all 3) |
| Streaming | Live step deltas | stream=True, then iterate |
| Files in | Image part inline | {type: "image", data: |
| Files out | Sandbox snapshot tar | GET /v1beta/files/environment- |
| Cost driver | Underlying tokens | Gemini 3.5 Flash pricing; compute is free during preview |
Where to take this next
- Wire the download step into a webhook so a finished PDF lands in S3 or Drive automatically.
- Swap the inline image input for an image generated earlier in your app and pipe both through one custom agent.
- Layer in SKILL.md files under .agents/skills/ for repeatable patterns: quarterly-report, kpi-deck, anomaly-explainer.
- Pair Antigravity Agent with a smaller Gemini Flash call up front to decide whether to invoke the agent at all: cheap routing in front of an expensive worker.
- Read the official Antigravity Agent docs and the Managed Agents quickstart for the latest field reference, including environment sources and per-interaction overrides.
The interesting shift here isn't "another agent framework." It's that the sandbox, the model, the tools, and the file system arrive as one billable primitive. You stop thinking about LangGraph nodes or scaling worker containers; you think about what artifact you want back. The chart-to-PDF flow above is small, but the shape — image in, verified file out — is the shape a lot of production work is about to take.