
Google Veo 3 Tutorial: AI Video Generation Guide (2026)
Summary
Learn Google Veo 3 from scratch — generate AI videos with text prompts, API code, and pro prompt tips in 2026.
What Is Google Veo 3?
Veo 3 is Google DeepMind's AI video generation model. It turns text descriptions into realistic video clips and generates the matching audio natively, in the same pass.
Key specs:
| Feature | Detail |
|---|---|
| Resolution | 720p, 1080p, 4K |
| Duration | Up to 8 seconds |
| Audio | Native — dialogue, SFX, ambient, music |
| Lip sync | Yes, built-in |
| Latest version | Veo 3.1 (March 2026) |
| Fast variant | Veo 3.1 Lite (low-cost, rapid iteration) |
How to Access Veo 3
Option 1: Google Vids (FREE)
Best for: Quick experiments, no coding required.
- Go to Google Vids
- Sign in with any Google account
- Click "Generate video"
- Type your prompt
- Wait 30–90 seconds
- Download your video
No paid subscription needed. Uses Veo 3.1 under the hood.
Option 2: Gemini Advanced (Google AI Ultra)
Best for: Higher quality, longer conversations, integrated workflow.
- Subscribe to Google AI Ultra ($249.99/month)
- Open gemini.google.com
- Type a video prompt in the chat
- Veo 3 generates the video inline
- Download or share directly
Option 3: Gemini API (Developers)
Best for: Automation, apps, batch generation.
- Get API key at Google AI Studio
- Install the SDK
- Call the video generation endpoint
- Poll for completion
- Download the result
Quick Start: Your First Video (API)
Step 1: Install the SDK
Python:
bash
pip install google-genai
JavaScript:
bash
npm install @google/genai
Step 2: Set Your API Key
Python:
python
import os
os.environ["GEMINI_API_KEY"] = "your-api-key-here"
JavaScript:
javascript
const { GoogleGenAI } = require("@google/genai");
const ai = new GoogleGenAI({ apiKey: "your-api-key-here" });
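Hardcoding a key in source files makes it easy to leak through version control. A minimal sketch of reading it from the environment instead, assuming the `GEMINI_API_KEY` variable name used in the Python snippet above; the helper name `load_api_key` is hypothetical and not part of the SDK:

```python
import os


def load_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message.

    Illustrative helper only; it is not part of the google-genai SDK.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Export it in your shell before running."
        )
    return key
```

You would then build the client with `genai.Client(api_key=load_api_key())` and keep the key itself out of the repository.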
Step 3: Generate a Video
Python — Full Example:
python
import time

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Start the generation job; the call returns a long-running operation.
# Audio is generated by default with Veo 3 on the Gemini API. The
# generate_audio flag appears to be accepted by Vertex AI clients only,
# so it is left out here.
operation = client.models.generate_videos(
    model="veo-3.0-generate-preview",
    prompt=(
        "A golden retriever running through a sunflower field at sunset. "
        "Warm golden light. Slow motion. Shallow depth of field. "
        "Sound of birds chirping and gentle wind."
    ),
    config=types.GenerateVideosConfig(
        number_of_videos=1,
        duration_seconds=8,
        negative_prompt="blurry, distorted, low quality",
    ),
)

# Poll until done
while not operation.done:
    print("Status: generating...")
    time.sleep(20)
    operation = client.operations.get(operation)
print("Video ready!")

# Download each clip through the Files API, then save it locally
for i, generated in enumerate(operation.response.generated_videos):
    client.files.download(file=generated.video)
    generated.video.save(f"output_{i}.mp4")
    print(f"Saved: output_{i}.mp4")
Output:
Status: generating...
Status: generating...
Video ready!
Saved: output_0.mp4
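The polling loop above never gives up, so a stalled job would block the script indefinitely. A minimal sketch of the same loop with a deadline; `wait_until_done` and its parameters are hypothetical names for illustration, and the 600-second default is an arbitrary assumption rather than a documented limit:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until_done(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    interval_s: float = 20.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() every interval_s seconds until is_done(state) is true.

    Raises TimeoutError once timeout_s has elapsed without completion.
    """
    deadline = time.monotonic() + timeout_s
    state = fetch()
    while not is_done(state):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job not finished after {timeout_s} seconds")
        sleep(interval_s)
        state = fetch()
    return state
```

With the client from the full example, a call might look like `operation = wait_until_done(lambda: client.operations.get(operation), lambda op: op.done)`. The `sleep` argument is injectable so the loop can be tested without real waiting.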
JavaScript — Full Example:
javascript
const { GoogleGenAI } = require("@google/genai");

const ai = new GoogleGenAI({ apiKey: "YOUR_API_KEY" });

async function generateVideo() {
  // Start the generation job; the call returns a long-running operation.
  // Audio is generated by default with Veo 3 on the Gemini API.
  let operation = await ai.models.generateVideos({
    model: "veo-3.0-generate-preview",
    prompt:
      "A golden retriever running through a sunflower field at sunset. " +
      "Warm golden light. Slow motion. Shallow depth of field.",
    config: {
      numberOfVideos: 1,
      durationSeconds: 8,
      negativePrompt: "blurry, distorted, low quality",
    },
  });

  // Poll until done
  while (!operation.done) {
    console.log("Status: generating...");
    await new Promise((r) => setTimeout(r, 20000));
    operation = await ai.operations.getVideosOperation({ operation });
  }

  // Download the first clip to disk
  await ai.files.download({
    file: operation.response.generatedVideos[0].video,
    downloadPath: "output.mp4",
  });
  console.log("Saved: output.mp4");
}

generateVideo().catch(console.error);
Step 4: Configure Audio Options
python
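# Assumption, based on the google-genai SDK at the time of writing:
# generate_audio is the only audio flag on GenerateVideosConfig, and it is
# accepted by Vertex AI clients only. With a Gemini API key, Veo 3 adds audio
# by default. There are no include_dialogue / include_ambient / include_music
# flags, so ask for dialogue, ambience, and music in the prompt text.
config = types.GenerateVideosConfig(
    negative_prompt="blurry, distorted, low quality",
)
prompt = (
    "A golden retriever running through a sunflower field at sunset. "
    "Audio: birds chirping, gentle wind, soft acoustic guitar."
)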
The 5-Part Prompt Formula

This is the formula that separates amateur results from cinematic quality:
[Shot Composition] + [Subject Details] + [Action] + [Setting/Environment] + [Aesthetics/Audio]
Template
A [shot type] of [subject with details] [performing action] in [setting].
The camera [movement]. Style is [visual style] with [lighting] and [color mood].
Audio includes [ambience, SFX, or dialogue].
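The template can also be filled in programmatically, which helps when you generate many prompts from structured data. A minimal sketch, assuming a hypothetical `VideoPrompt` dataclass whose fields mirror the bracketed slots above (the class and field names are illustrative, not part of any SDK):

```python
from dataclasses import dataclass


@dataclass
class VideoPrompt:
    shot: str     # shot composition, e.g. "close-up"
    subject: str  # subject with details
    action: str   # what the subject is doing
    setting: str  # environment
    camera: str   # camera movement, phrased as a verb
    style: str    # visual style, lighting, and color mood
    audio: str    # ambience, SFX, or dialogue

    def render(self) -> str:
        """Fill the three-sentence template with this prompt's fields."""
        return (
            f"A {self.shot} of {self.subject} {self.action} in {self.setting}. "
            f"The camera {self.camera}. Style is {self.style}. "
            f"Audio includes {self.audio}."
        )


prompt = VideoPrompt(
    shot="close-up",
    subject="a tabby cat",
    action="sitting",
    setting="a rainy window nook",
    camera="pushes in slowly",
    style="soft and gray with natural light",
    audio="rain pattering on glass",
).render()
```

The resulting string can be passed as the `prompt` argument of `generate_videos`. Shot types that start with a vowel ("aerial drone shot") need "An" instead of "A"; the sketch leaves that adjustment to the caller.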
Prompt Length Sweet Spot
- Minimum: 2–3 sentences (~50 words)
- Optimal: 3–6 sentences (~100–150 words)
- Too long: 200+ words (Veo ignores excess)
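A quick word count before submitting catches prompts that fall outside these ranges. A minimal sketch using the thresholds listed above; the function name and the labels for the in-between ranges ("ok", "long") are assumptions added for illustration:

```python
def classify_prompt_length(prompt: str) -> str:
    """Bucket a prompt by whitespace-separated word count."""
    words = len(prompt.split())
    if words < 50:
        return "short"      # below the ~50-word minimum
    if words < 100:
        return "ok"         # usable, but under the optimal range
    if words <= 150:
        return "optimal"    # the 100-150 word sweet spot
    if words < 200:
        return "long"       # above optimal, not yet over the limit
    return "too long"       # 200+ words
```

Counting by whitespace is only an approximation of how the model sees the text, but it is enough to flag a one-line prompt or a 300-word essay.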
Prompt Examples (Copy-Paste Ready)
Example 1: Product Commercial
A slow dolly-in shot of a sleek smartphone on a marble table.
Soft studio lighting with warm highlights and cool shadows.
The phone screen glows, showing a notification.
Camera moves from wide to close-up.
Audio: subtle electronic hum, soft chime notification sound.
Style: Apple-commercial aesthetic, shallow depth of field.
Example 2: Nature Documentary
An aerial drone shot of a whale breaching the ocean surface.
Golden hour lighting with scattered clouds.
Camera tracks the whale as it rises and splashes down.
Slow-motion feel, as if shot at 120 fps.
Audio: dramatic orchestral swell, ocean waves crashing,
whale song echo.
Example 3: Dialogue Scene
A medium two-shot of two friends sitting in a coffee shop.
Natural window lighting, bokeh background.
Person 1 (woman, 30s, brown hair) says: "Did you hear about the new project?"
Person 2 (man, 30s, glasses) responds: "Yeah, it's going to change everything."
Both laugh.
Audio: coffee shop ambient noise, espresso machine in background,
warm indie guitar music faintly playing.
Example 4: Tutorial/Explainer
A top-down close-up of hands typing on a mechanical keyboard.
Clean desk setup with monitor showing code.
Fingers move rapidly across keys.
Camera slowly pulls back to reveal the full workspace.
Audio: satisfying mechanical keyboard clicks, soft lo-fi music.
Camera Shots Cheat Sheet
| Shot Type | Use For | Prompt Keyword |
|---|---|---|
| Wide/establishing | Scene context | wide shot, establishing shot |
| Medium | Conversations | medium shot, waist-up |
| Close-up | Emotion, detail | close-up, tight shot |
| Extreme close-up | Texture, eyes | extreme close-up, macro |
| Aerial | Landscapes | aerial view, drone shot |
| POV | Immersion | POV shot, first-person |
| Low angle | Power, drama | low angle, worm's eye |
Camera Movements Cheat Sheet
| Movement | Effect | Prompt Keyword |
|---|---|---|
| Dolly in/out | Draw closer/reveal | dolly-in, dolly-out |
| Pan left/right | Survey scene | slow pan left |
| Tracking | Follow subject | tracking shot |
| Crane | Dramatic reveal | crane shot rising |
| Handheld | Raw, urgent | handheld camera shake |
| Whip pan | Fast transition | whip-pan |
| Static | Calm, observational | locked-off static camera |
Negative Prompts (What to Avoid)
Always include a negative_prompt to improve quality:
python
negative_prompt = (
    "blurry, distorted faces, extra fingers, "
    "low quality, watermark, text overlay, "
    "choppy motion, unrealistic physics"
)
Veo 3 vs Sora 2 vs Kling 3.0
| Feature | Veo 3.1 | Sora 2 | Kling 3.0 |
|---|---|---|---|
| Max resolution | 4K | 1080p | 4K/60fps |
| Max duration | 8 sec | 20 sec | 10 sec |
| Native audio | Yes | Yes | No |
| Lip sync | Yes | Yes | Partial |
| API available | Yes | Yes | Yes |
| Free tier | Google Vids | No | Free credits |
| Cost per second | ~$0.03 | ~$0.15 | ~$0.126 |
| Best for | Cinematic + audio | Long clips | High-volume ads |
Bottom line: Veo 3 wins on audio + cost. Sora 2 wins on clip length. Kling 3.0 wins on resolution + price entry ($6.99/mo).
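To turn the per-second figures into a budget, multiply by clip length and clip count. A minimal sketch that takes the approximate prices straight from the table above; the function and dictionary names are hypothetical, and actual billing may differ by resolution or plan:

```python
# Approximate per-second prices (USD) copied from the comparison table.
COST_PER_SECOND = {"Veo 3.1": 0.03, "Sora 2": 0.15, "Kling 3.0": 0.126}


def clip_cost(model: str, seconds: float, clips: int = 1) -> float:
    """Estimated cost in USD for `clips` clips of `seconds` seconds each."""
    return round(COST_PER_SECOND[model] * seconds * clips, 2)


for name in COST_PER_SECOND:
    print(f"{name}: ${clip_cost(name, 8):.2f} per 8-second clip")
```

At these rates an 8-second clip comes to roughly $0.24 on Veo 3.1, $1.20 on Sora 2, and $1.01 on Kling 3.0, so a batch of 100 test clips is about $24 versus $100 or more.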
Advanced Tips
1. Iterate from Simple to Complex
# Start simple
"A cat sitting on a windowsill"
# Add details gradually
"A tabby cat sitting on a wooden windowsill, rain outside"
# Full cinematic prompt
"A close-up of a tabby cat sitting on a wooden windowsill.
Rain drops streak down the glass behind it.
Soft gray natural light. Shallow depth of field.
The cat turns its head slowly toward camera.
Audio: rain pattering on glass, distant thunder rumble,
cat purring softly."
2. Use Reference Images (Veo 3.1)
Veo 3.1 supports "ingredients-to-video" — upload a reference image to maintain character appearance across scenes.
3. Extend Videos
Chain multiple 8-second clips for longer content:
python
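# Generate initial clip
operation = client.models.generate_videos(
    model="veo-3.0-generate-preview",
    prompt="Scene 1: ...",
    config=types.GenerateVideosConfig(duration_seconds=8),
)
# Poll until operation.done (see the Quick Start), then take the finished clip
previous_video = operation.response.generated_videos[0].video

# Extend with next scene (Veo 3.1). The earlier clip goes in the top-level
# `video` argument; GenerateVideosConfig has no `extend_video` field.
extend_operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",
    prompt="Continue the scene: ...",
    video=previous_video,  # Pass previous output
    config=types.GenerateVideosConfig(duration_seconds=8),
)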
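Before chaining, it helps to work out how many clips a target runtime needs and to prepare one prompt per clip. A minimal sketch, assuming each call contributes a full 8 seconds; both helper names are hypothetical, and the "Continue the scene:" prefix simply follows the example above:

```python
import math


def clips_needed(target_seconds: float, clip_seconds: int = 8) -> int:
    """Number of clips required to cover target_seconds."""
    if target_seconds <= 0:
        raise ValueError("target_seconds must be positive")
    return math.ceil(target_seconds / clip_seconds)


def scene_prompts(scenes: list[str]) -> list[str]:
    """Prefix every scene after the first with a continuation cue."""
    return [
        scene if i == 0 else f"Continue the scene: {scene}"
        for i, scene in enumerate(scenes)
    ]
```

A 30-second video therefore needs `clips_needed(30)`, which is 4 clips, and `scene_prompts` gives you the matching list of prompts to feed into the extend loop one at a time.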
4. Batch Generation for A/B Testing
python
# Generate 4 variations of the same concept
operation = client.models.generate_videos(
    model="veo-3.0-generate-preview",
    prompt="Your prompt here",
    config=types.GenerateVideosConfig(
        number_of_videos=4,  # Up to 4 variants; the allowed maximum may vary by model
    ),
)
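For A/B tests that vary the camera work rather than relying on random variation, you can generate the prompt variants yourself and submit one request per variant. A minimal sketch using keywords from the cheat sheets; `prompt_variants` is a hypothetical helper, not an SDK function:

```python
from itertools import product


def prompt_variants(base: str, shots: list[str], movements: list[str]) -> list[str]:
    """Combine one base description with every shot/movement pairing."""
    return [
        f"{shot.capitalize()} of {base}. Camera: {move}."
        for shot, move in product(shots, movements)
    ]


variants = prompt_variants(
    "a sleek smartphone on a marble table",
    shots=["wide shot", "close-up"],
    movements=["dolly-in", "slow pan left"],
)
```

Two shot types and two movements give four prompts, each differing in exactly one controlled dimension, which makes it easier to tell which change improved the result.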
Common Mistakes to Avoid
| Mistake | Fix |
|---|---|
| Vague prompts ("a cool video") | Be specific: subject + action + setting |
| No camera direction | Always specify shot type + movement |
| Ignoring audio | Add audio cues — it's Veo 3's superpower |
| Prompts over 200 words | Keep to 100–150 words max |
| No negative prompt | Always exclude unwanted elements |
| Expecting perfect first try | Iterate: simple → detailed |
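Several of these mistakes can be caught mechanically before a prompt is sent. A minimal sketch of a prompt linter, assuming simple substring checks against keywords from the cheat sheets; the function name, keyword lists, and warning strings are all illustrative, and substring matching will produce occasional false matches (for example "pan" inside "company"):

```python
SHOT_KEYWORDS = (
    "wide shot", "establishing shot", "medium shot", "close-up",
    "aerial", "drone shot", "pov", "low angle",
)
MOVE_KEYWORDS = ("dolly", "pan", "tracking", "crane", "handheld", "static")


def lint_prompt(prompt: str) -> list[str]:
    """Return a list of warnings; an empty list means no issue was detected."""
    text = prompt.lower()
    warnings = []
    if not any(keyword in text for keyword in SHOT_KEYWORDS):
        warnings.append("no shot type")
    if not any(keyword in text for keyword in MOVE_KEYWORDS):
        warnings.append("no camera movement")
    if "audio" not in text and "sound" not in text:
        warnings.append("no audio cue")
    if len(prompt.split()) > 200:
        warnings.append("over 200 words")
    return warnings
```

Running it on "a cool video" flags the missing shot type, camera movement, and audio cue, while a prompt such as "A close-up of a cat. Slow dolly-in. Audio: rain." passes cleanly.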
Real-World Use Cases
- YouTube Thumbnails/Intros — Generate 8-sec cinematic intros
- Product Demos — Showcase products with studio lighting
- Social Media Reels — Quick vertical video content
- Ad Creatives — A/B test multiple ad variations fast
- Explainer Videos — Visual aids for tutorials
- Podcasts — Add visual scenes to audio content
- Storyboarding — Visualize film concepts before shooting