Google Veo 3 Tutorial: AI Video Generation Guide (2026)

Kodetra Technologies · April 11, 2026 · 5 min read · Intermediate

Summary

Learn Google Veo 3 from scratch — generate AI videos with text prompts, API code, and pro prompt tips in 2026.

What Is Google Veo 3?

Veo 3 is Google DeepMind's AI video generation model. It turns text descriptions into realistic video clips with native audio generation.

Key specs:

| Feature | Detail |
|---|---|
| Resolution | 720p, 1080p, 4K |
| Duration | Up to 8 seconds |
| Audio | Native: dialogue, SFX, ambient, music |
| Lip sync | Yes, built-in |
| Latest version | Veo 3.1 (March 2026) |
| Fast variant | Veo 3.1 Lite (low-cost, rapid iteration) |

How to Access Veo 3

Option 1: Google Vids (FREE)

Best for: Quick experiments, no coding required.

  1. Go to Google Vids
  2. Sign in with any Google account
  3. Click "Generate video"
  4. Type your prompt
  5. Wait 30–90 seconds
  6. Download your video

No paid subscription needed. Uses Veo 3.1 under the hood.

Option 2: Gemini Advanced (Google AI Ultra)

Best for: Higher quality, longer conversations, integrated workflow.

  1. Subscribe to Google AI Ultra ($249.99/month)
  2. Open gemini.google.com
  3. Type a video prompt in the chat
  4. Veo 3 generates the video inline
  5. Download or share directly

Option 3: Gemini API (Developers)

Best for: Automation, apps, batch generation.

  1. Get an API key from Google AI Studio
  2. Install the SDK
  3. Call the video generation endpoint
  4. Poll for completion
  5. Download the result
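Step 4 ("Poll for completion") is where scripts tend to hang if a job stalls. Below is a minimal, SDK-agnostic polling loop with a timeout. `wait_until_done` and its parameters are hypothetical names for illustration, not part of the google-genai SDK.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until_done(
    op: T,
    refresh: Callable[[T], T],
    is_done: Callable[[T], bool],
    interval_s: float = 20.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Re-fetch `op` every `interval_s` seconds until `is_done(op)` is true.

    Raises TimeoutError once `timeout_s` seconds of waiting have elapsed.
    The `sleep` argument exists so tests can skip real waiting.
    """
    waited = 0.0
    while not is_done(op):
        if waited >= timeout_s:
            raise TimeoutError(f"operation still running after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
        op = refresh(op)
    return op
```

With the Python SDK you would pass `client.operations.get` as `refresh` and `lambda op: op.done` as `is_done`.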

Quick Start: Your First Video (API)

Step 1: Install the SDK

Python:

bash

pip install google-genai

JavaScript:

bash

npm install @google/genai

Step 2: Set Your API Key

Python:

python

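import os
# genai.Client() reads GEMINI_API_KEY automatically. Exporting it in your
# shell (export GEMINI_API_KEY=...) keeps the key out of your source code.
os.environ["GEMINI_API_KEY"] = "your-api-key-here"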

JavaScript:

javascript

const { GoogleGenAI } = require("@google/genai");
const ai = new GoogleGenAI({ apiKey: "your-api-key-here" });
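Hard-coding the key in a script makes it easy to commit by accident. A minimal Python sketch that reads it from the environment instead and fails early with a readable message; `require_api_key` is a hypothetical helper, not part of the SDK.

```python
import os


def require_api_key(var: str = "GEMINI_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message."""
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(
            f"{var} is not set; export it in your shell instead of hard-coding it"
        )
    return key
```

You could then create the client in Step 3 with `genai.Client(api_key=require_api_key())`.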

Step 3: Generate a Video

Python — Full Example:

python

from google import genai
from google.genai import types
import time

# With no api_key argument, the client reads GEMINI_API_KEY (set in Step 2)
client = genai.Client()

# Start generation; this returns a long-running operation, not the video.
# Model IDs change between releases, so confirm the current one in the docs.
operation = client.models.generate_videos(
    model="veo-3.0-generate-preview",
    prompt="A golden retriever running through a sunflower field at sunset. "
           "Warm golden light. Slow motion. Shallow depth of field. "
           "Sound of birds chirping and gentle wind.",
    config=types.GenerateVideosConfig(
        number_of_videos=1,
        duration_seconds=8,
        negative_prompt="blurry, distorted, low quality",
        # No generate_audio here: Veo 3 on the Gemini API adds audio by
        # default, and that flag is documented for Vertex AI.
    ),
)

# Poll until done
while not operation.done:
    time.sleep(20)
    operation = client.operations.get(operation)
    print("Status: generating...")

print("Video ready!")

# Download: the response holds file references, so fetch the bytes first
for i, generated in enumerate(operation.response.generated_videos):
    client.files.download(file=generated.video)
    generated.video.save(f"output_{i}.mp4")
    print(f"Saved: output_{i}.mp4")

Output:

Status: generating...
Status: generating...
Video ready!
Saved: output_0.mp4

JavaScript — Full Example:

javascript

const { GoogleGenAI } = require("@google/genai");

const ai = new GoogleGenAI({ apiKey: "YOUR_API_KEY" });

async function generateVideo() {
  let operation = await ai.models.generateVideos({
    model: "veo-3.0-generate-preview",
    prompt:
      "A golden retriever running through a sunflower field at sunset. " +
      "Warm golden light. Slow motion. Shallow depth of field.",
    config: {
      numberOfVideos: 1,
      durationSeconds: 8,
      negativePrompt: "blurry, distorted, low quality",
      // Audio is generated by default on the Gemini API (no generateAudio flag)
    },
  });

  // Poll until done
  while (!operation.done) {
    await new Promise((r) => setTimeout(r, 20000));
    operation = await ai.operations.getVideosOperation({ operation });
    console.log("Status: generating...");
  }

  // Download the first generated video to disk
  const video = operation.response.generatedVideos[0];
  await ai.files.download({ file: video.video, downloadPath: "output.mp4" });
  console.log("Saved: output.mp4");
}

generateVideo().catch(console.error);

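Step 4: Configure Audio

GenerateVideosConfig has no separate switches for dialogue, ambience, or music. Veo 3 takes those cues from the prompt text, so describe the soundtrack there. The generate_audio flag turns audio on or off on Vertex AI; the Gemini API generates audio by default.

python

prompt = (
    "A barista pours latte art in a busy cafe. "
    "She says: \"One oat latte, coming up.\" "        # spoken dialogue
    "Audio: espresso machine hiss, cafe chatter, "    # ambient sounds + SFX
    "soft jazz playing in the background."            # background music
)
config = types.GenerateVideosConfig(
    generate_audio=True,  # Vertex AI option; the Gemini API may reject it
)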
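Because audio direction lives in the prompt text, a tiny helper can append it consistently. This is a sketch: `with_audio` is a hypothetical name, and the trailing "Audio:" line simply copies the convention used in this guide's prompt examples; to the API it is ordinary prompt text.

```python
from typing import Optional, Sequence


def with_audio(prompt: str, cues: Sequence[str], dialogue: Optional[str] = None) -> str:
    """Append an optional spoken line and an 'Audio:' cue list to a visual prompt."""
    parts = [prompt.strip()]
    if dialogue:
        parts.append(f'The character says: "{dialogue}"')
    if cues:
        parts.append("Audio: " + ", ".join(cues) + ".")
    return " ".join(parts)
```

For example, `with_audio("A cat on a windowsill.", ["rain on glass", "distant thunder"])` yields `A cat on a windowsill. Audio: rain on glass, distant thunder.`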

The 5-Part Prompt Formula

This is the formula that separates amateur results from cinematic quality:

[Shot Composition] + [Subject Details] + [Action] + [Setting/Environment] + [Aesthetics/Audio]

Template

A [shot type] of [subject with details] [performing action] in [setting].
The camera [movement]. Style is [visual style] with [lighting] and [color mood].
Audio includes [ambience, SFX, or dialogue].
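The template is plain string formatting, so it is easy to fill programmatically when you generate many variants. A minimal sketch; `ShotPrompt` and its field names are hypothetical and simply mirror the template's slots.

```python
from dataclasses import dataclass


@dataclass
class ShotPrompt:
    """One field per slot in the template above (names are illustrative)."""
    shot: str        # shot composition, e.g. "close-up"
    subject: str     # subject with details
    action: str      # what the subject does
    setting: str     # setting / environment
    camera: str      # camera movement
    style: str       # visual style
    lighting: str
    color_mood: str
    audio: str       # ambience, SFX, or dialogue

    def render(self) -> str:
        # Fill the template sentence by sentence
        return (
            f"A {self.shot} of {self.subject} {self.action} in {self.setting}. "
            f"The camera {self.camera}. "
            f"Style is {self.style} with {self.lighting} and {self.color_mood}. "
            f"Audio includes {self.audio}."
        )
```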

Prompt Length Sweet Spot

  • Minimum: 2–3 sentences (~50 words)
  • Optimal: 3–6 sentences (~100–150 words)
  • Too long: 200+ words (details past this point tend to be ignored)
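A quick way to see where a draft lands in those bands is to count words. A sketch; `length_band` is a hypothetical helper, words are counted by splitting on whitespace, and the 150–200 range, which the list leaves open, is labeled "long" here as an assumption.

```python
def length_band(prompt: str) -> str:
    """Classify a prompt against the word-count rules of thumb above."""
    words = len(prompt.split())
    if words < 50:
        return "below minimum"
    if words < 100:
        return "minimum"
    if words <= 150:
        return "optimal"
    if words < 200:
        return "long"
    return "too long"
```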

Prompt Examples (Copy-Paste Ready)

Example 1: Product Commercial

A slow dolly-in shot of a sleek smartphone on a marble table.
Soft studio lighting with warm highlights and cool shadows.
The phone screen glows, showing a notification.
Camera moves from wide to close-up.
Audio: subtle electronic hum, soft chime notification sound.
Style: Apple-commercial aesthetic, shallow depth of field.

Example 2: Nature Documentary

An aerial drone shot of a whale breaching the ocean surface.
Golden hour lighting with scattered clouds.
Camera tracks the whale as it rises and splashes down.
Slow motion at 120fps feel.
Audio: dramatic orchestral swell, ocean waves crashing,
whale song echo.

Example 3: Dialogue Scene

A medium two-shot of two friends sitting in a coffee shop.
Natural window lighting, bokeh background.
Person 1 (woman, 30s, brown hair) says: "Did you hear about the new project?"
Person 2 (man, 30s, glasses) responds: "Yeah, it's going to change everything."
Both laugh.
Audio: coffee shop ambient noise, espresso machine in background,
warm indie guitar music faintly playing.

Example 4: Tutorial/Explainer

A top-down close-up of hands typing on a mechanical keyboard.
Clean desk setup with monitor showing code.
Fingers move rapidly across keys.
Camera slowly pulls back to reveal the full workspace.
Audio: satisfying mechanical keyboard clicks, soft lo-fi music.

Camera Shots Cheat Sheet

| Shot type | Use for | Prompt keywords |
|---|---|---|
| Wide/establishing | Scene context | wide shot, establishing shot |
| Medium | Conversations | medium shot, waist-up |
| Close-up | Emotion, detail | close-up, tight shot |
| Extreme close-up | Texture, eyes | extreme close-up, macro |
| Aerial | Landscapes | aerial view, drone shot |
| POV | Immersion | POV shot, first-person |
| Low angle | Power, drama | low angle, worm's eye |

Camera Movements Cheat Sheet

| Movement | Effect | Prompt keywords |
|---|---|---|
| Dolly in/out | Draw closer/reveal | dolly-in, dolly-out |
| Pan left/right | Survey scene | slow pan left |
| Tracking | Follow subject | tracking shot |
| Crane | Dramatic reveal | crane shot rising |
| Handheld | Raw, urgent | handheld camera shake |
| Whip pan | Fast transition | whip-pan |
| Static | Calm, observational | locked-off static camera |
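When assembling prompts in code, a lookup over the two cheat sheets keeps keyword choice consistent. A sketch; the dictionaries copy one keyword per row from the tables above, and `camera_phrase` is a hypothetical helper.

```python
# One keyword per row of the shot-type and movement cheat sheets
SHOT_KEYWORDS = {
    "wide": "wide shot",
    "medium": "medium shot",
    "close-up": "close-up",
    "extreme close-up": "extreme close-up",
    "aerial": "aerial view",
    "pov": "POV shot",
    "low angle": "low angle",
}
MOVE_KEYWORDS = {
    "dolly in": "dolly-in",
    "dolly out": "dolly-out",
    "pan": "slow pan left",
    "tracking": "tracking shot",
    "crane": "crane shot rising",
    "handheld": "handheld camera shake",
    "whip pan": "whip-pan",
    "static": "locked-off static camera",
}


def camera_phrase(shot: str, move: str) -> str:
    """Map cheat-sheet names (case-insensitive) to their prompt keywords."""
    try:
        return f"{SHOT_KEYWORDS[shot.lower()]}, {MOVE_KEYWORDS[move.lower()]}"
    except KeyError as err:
        raise ValueError(f"not in the cheat sheet: {err.args[0]}") from None
```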

Negative Prompts (What to Avoid)

Add a negative_prompt that lists the artifacts you want Veo to avoid:

python

negative_prompt=("blurry, distorted faces, extra fingers, "
                 "low quality, watermark, text overlay, "
                 "choppy motion, unrealistic physics")

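If you reuse negative prompts across projects, keeping the terms as lists and merging them avoids duplicates. A minimal sketch; `merge_negative_prompt`, `BASE`, and `CLEAN_FRAME` are hypothetical names.

```python
from typing import Iterable


def merge_negative_prompt(*groups: Iterable[str]) -> str:
    """Merge lists of unwanted artifacts into one comma-separated string.

    Keeps the first spelling of each term and skips later duplicates
    (case-insensitive) and empty entries.
    """
    seen = set()
    terms = []
    for group in groups:
        for term in group:
            cleaned = term.strip()
            key = cleaned.lower()
            if cleaned and key not in seen:
                seen.add(key)
                terms.append(cleaned)
    return ", ".join(terms)


# A shared list plus per-project extras
BASE = ["blurry", "distorted faces", "extra fingers", "low quality"]
CLEAN_FRAME = ["watermark", "text overlay"]
negative_prompt = merge_negative_prompt(BASE, CLEAN_FRAME, ["Watermark"])
```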
Veo 3 vs Sora 2 vs Kling 3.0

| Feature | Veo 3.1 | Sora 2 | Kling 3.0 |
|---|---|---|---|
| Max resolution | 4K | 1080p | 4K/60fps |
| Max duration | 8 sec | 20 sec | 10 sec |
| Native audio | Yes | Yes | No |
| Lip sync | Yes | Yes | Partial |
| API available | Yes | Yes | Yes |
| Free tier | Google Vids | No | Free credits |
| Cost per second | ~$0.03 | ~$0.15 | ~$0.126 |
| Best for | Cinematic + audio | Long clips | High-volume ads |

Bottom line: Veo 3 wins on cost and matches Sora 2 on native audio. Sora 2 wins on clip length. Kling 3.0 wins on resolution + price entry ($6.99/mo).
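To compare cost per clip rather than per second, multiply by clip length and the number of variants. A sketch that uses the approximate rates from the table above as placeholders (check current pricing before budgeting); `PRICE_PER_SECOND` and `clip_cost` are hypothetical names.

```python
# Approximate per-second rates from the comparison table; placeholders only
PRICE_PER_SECOND = {"Veo 3.1": 0.03, "Sora 2": 0.15, "Kling 3.0": 0.126}


def clip_cost(model: str, seconds: float, variants: int = 1) -> float:
    """Estimated USD cost for `variants` clips of `seconds` each, rounded to cents."""
    return round(PRICE_PER_SECOND[model] * seconds * variants, 2)
```

At those rates an 8-second clip comes to about $0.24 on Veo 3.1, $1.20 on Sora 2, and $1.01 on Kling 3.0.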


Advanced Tips

1. Iterate from Simple to Complex

# Start simple
"A cat sitting on a windowsill"

# Add details gradually
"A tabby cat sitting on a wooden windowsill, rain outside"

# Full cinematic prompt
"A close-up of a tabby cat sitting on a wooden windowsill.
Raindrops streak down the glass behind it.
Soft gray natural light. Shallow depth of field.
The cat turns its head slowly toward camera.
Audio: rain pattering on glass, distant thunder rumble,
cat purring softly."
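The same progression can be scripted: keep a base prompt plus a list of detail layers, and generate one clip per step so you can see which addition changed the result. A sketch; `progressive_prompts` is a hypothetical helper.

```python
from typing import List


def progressive_prompts(base: str, layers: List[str]) -> List[str]:
    """Return one prompt per iteration, each adding the next detail layer."""
    prompts = [base]
    for layer in layers:
        prompts.append(f"{prompts[-1]} {layer}")
    return prompts


steps = progressive_prompts(
    "A cat sitting on a windowsill.",
    ["Rain outside.", "Audio: rain pattering on glass."],
)
```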

2. Use Reference Images (Veo 3.1)

Veo 3.1 supports "ingredients-to-video" — upload a reference image to maintain character appearance across scenes.

3. Extend Videos

Chain multiple 8-second clips for longer content:

python

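# Generate the initial clip with Veo 3.1 (extension expects a Veo-generated clip)
operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",
    prompt="Scene 1: ...",
    config=types.GenerateVideosConfig(duration_seconds=8),
)
while not operation.done:
    time.sleep(20)
    operation = client.operations.get(operation)

# Extend it: pass the previous output through the `video` argument.
# Extension limits (resolution, added length) vary; check the Veo docs.
previous_video = operation.response.generated_videos[0].video
extend_operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",
    video=previous_video,
    prompt="Continue the scene: ...",
)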
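To plan how many chained clips a target runtime needs, divide and round up. A sketch that assumes a fixed length per clip (8 seconds by default, matching the clips above); if an extension step adds a different amount, pass that as `clip_seconds`. `clips_needed` is a hypothetical helper.

```python
import math


def clips_needed(target_seconds: float, clip_seconds: float = 8.0) -> int:
    """Number of fixed-length clips needed to cover `target_seconds` when chained."""
    if target_seconds <= 0:
        return 0
    return math.ceil(target_seconds / clip_seconds)
```

For a 30-second spot, `clips_needed(30)` returns 4, which covers 32 seconds.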

4. Batch Generation for A/B Testing

python

# Generate 4 variations of the same concept
operation = client.models.generate_videos(
    model="veo-3.0-generate-preview",
    prompt="Your prompt here",
    config=types.GenerateVideosConfig(
        number_of_videos=4,  # Up to 4 variants; the per-request cap can vary by model
    ),
)
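Before sending a batch, a quick local check catches requests that exceed the limits stated in this guide (up to 4 variants, up to 8 seconds). A sketch; `VideoRequest` and `check_request` are hypothetical, and the real limits depend on the model, so treat the defaults as assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VideoRequest:
    prompt: str
    number_of_videos: int = 1
    duration_seconds: int = 8


def check_request(req: VideoRequest, max_videos: int = 4, max_seconds: int = 8) -> List[str]:
    """Return a list of problems; an empty list means the request fits the limits."""
    problems = []
    if not req.prompt.strip():
        problems.append("prompt is empty")
    if not 1 <= req.number_of_videos <= max_videos:
        problems.append(f"number_of_videos must be 1-{max_videos}")
    if not 1 <= req.duration_seconds <= max_seconds:
        problems.append(f"duration_seconds must be between 1 and {max_seconds}")
    return problems
```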

Common Mistakes to Avoid

| Mistake | Fix |
|---|---|
| Vague prompts ("a cool video") | Be specific: subject + action + setting |
| No camera direction | Always specify shot type + movement |
| Ignoring audio | Add audio cues; audio is Veo 3's superpower |
| Prompts over 200 words | Keep to 100–150 words |
| No negative prompt | Exclude unwanted elements with negative_prompt |
| Expecting a perfect first try | Iterate: simple → detailed |
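Three rows of that table can be checked mechanically before you spend credits: camera direction, audio cues, and a negative prompt. A rough keyword-based sketch; `lint_prompt` and the keyword lists are hypothetical, and a keyword match is a heuristic, not a quality guarantee.

```python
import re
from typing import List

# Word boundaries keep "pan" from matching inside words like "panda"
CAMERA_WORDS = re.compile(
    r"\b(shot|close-up|aerial|pov|angle|dolly|pan|tracking|crane|handheld|static|camera)\b"
)
AUDIO_WORDS = re.compile(r"\b(audio|sound|sounds|music|says)\b")


def lint_prompt(prompt: str, negative_prompt: str = "") -> List[str]:
    """Flag missing camera direction, audio cues, and negative prompt."""
    text = prompt.lower()
    issues = []
    if not CAMERA_WORDS.search(text):
        issues.append("no camera direction")
    if not AUDIO_WORDS.search(text):
        issues.append("no audio cues")
    if not negative_prompt.strip():
        issues.append("no negative prompt")
    return issues
```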

Real-World Use Cases

  1. YouTube Thumbnails/Intros — Generate 8-sec cinematic intros
  2. Product Demos — Showcase products with studio lighting
  3. Social Media Reels — Quick vertical video content
  4. Ad Creatives — A/B test multiple ad variations fast
  5. Explainer Videos — Visual aids for tutorials
  6. Podcasts — Add visual scenes to audio content
  7. Storyboarding — Visualize film concepts before shooting
