I Built a Tool to Generate Video Ads with AI

So I run globalpetsitter.com, which connects pet owners with sitters around the world. Like any startup, I need promo content, specifically video ads for social media. The problem is that making even a simple 30-second ad takes forever.

The Problem

Every time I wanted to make a video ad the process was something like:

Write a script or storyboard
Find or create images for each scene
Generate or record voiceover
Convert images to video clips with motion
Sync audio with video
Edit everything together in CapCut

Each step needs different tools, different logins, and tons of context switching. For a 6-scene ad we're talking hours. And if I didn't like the result? Start over.

Ad Forge

I built Ad Forge to collapse all of this into one pipeline: describe your ad in plain text and let AI do the heavy lifting.


How It Works

7 stages:

1. Sketch - describe your ad concept, target audience, tone, and duration. For the globalpetsitter ad I wrote something like "woman leaving for travel, worried about her pet, then feeling relieved knowing they're cared for".

2. Scenes - Gemini breaks your sketch into individual scenes with descriptions, settings, mood, and suggested durations. It structures the narrative arc automatically.

3. Style - AI generates a style guide: color palette, lighting, visual mood, character descriptions, and location details. This keeps everything visually consistent.

4. Images - fal.ai generates an image for each scene. The system uses reference images from previous scenes and character portraits to maintain consistency. This was the hardest part to get right.

5. Videos - each image becomes a video clip with camera movement (pan, zoom, dolly, etc). fal.ai's image-to-video is pretty good at this.

6. Audio - for scenes with dialogue it generates a voiceover with TTS. You can assign different voices to different characters.

7. Merge - combines video and audio, with optional lip-sync for talking characters. FFmpeg handles this.
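
To make the seven stages a bit more concrete, here is a minimal TypeScript sketch of how the stage order and a scene might be modeled. The type names, field names, and helper functions are all made up for illustration and are not Ad Forge's actual code.

```typescript
// The seven stages from the list above; the identifiers are hypothetical.
const STAGES = ["sketch", "scenes", "style", "images", "videos", "audio", "merge"] as const;
type Stage = (typeof STAGES)[number];

// Hypothetical shape of one scene produced by the Scenes stage.
interface Scene {
  id: number;
  description: string;
  durationSeconds: number;
  dialogue?: string; // only scenes with dialogue get a voiceover
}

// Returns the stage after `current`, or null once the pipeline is finished.
function nextStage(current: Stage): Stage | null {
  const index = STAGES.indexOf(current);
  return STAGES[index + 1] ?? null;
}

// Scenes that would be sent to TTS in the Audio stage.
function scenesNeedingAudio(scenes: Scene[]): Scene[] {
  return scenes.filter((scene) => (scene.dialogue ?? "").trim().length > 0);
}

// Total runtime if every scene keeps its suggested duration.
function totalDuration(scenes: Scene[]): number {
  return scenes.reduce((sum, scene) => sum + scene.durationSeconds, 0);
}
```

Keeping the stage order in one array means the UI and the pipeline can both ask "what comes next?" without hard-coding the sequence twice.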

Tech Stack

Next.js 16 with React 19 for the UI
Google Gemini for script generation and scene breakdown
fal.ai for image generation and image-to-video
OpenAI for some text generation
FFmpeg compiled to WebAssembly for video processing in the browser
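
For the FFmpeg piece, the merge step boils down to muxing one video clip with one audio track. Below is a hedged sketch of the argument list for that. The function name and file names are hypothetical, and how the arguments get handed to the WebAssembly build depends on the wrapper in use, which this post doesn't specify. The flags themselves are standard FFmpeg options.

```typescript
// Builds an FFmpeg argument list that combines one video clip with one audio track.
// Function name and file names are illustrative assumptions.
function buildMergeArgs(videoFile: string, audioFile: string, outputFile: string): string[] {
  return [
    "-i", videoFile,   // input 0: the generated video clip
    "-i", audioFile,   // input 1: the voiceover
    "-map", "0:v:0",   // take the first video stream from input 0
    "-map", "1:a:0",   // take the first audio stream from input 1
    "-c:v", "copy",    // keep the video stream as-is, no re-encode
    "-c:a", "aac",     // encode the audio as AAC
    "-shortest",       // stop when the shorter of the two inputs ends
    outputFile,
  ];
}
```

The same array works with the command-line `ffmpeg` binary, which makes it easy to test the command outside the browser first.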

Some Design Decisions

Campaign persistence - everything saves to localStorage automatically. You can close the browser and pick up later.
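
A minimal sketch of what saving and restoring a campaign could look like, assuming the campaign state is plain JSON. The `Campaign` shape, the key prefix, and the function names are hypothetical. The store is passed in as a parameter so the same code runs against `window.localStorage` in the browser or against an in-memory stand-in elsewhere.

```typescript
// The subset of the Web Storage API used here; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical campaign shape; the real one likely holds much more.
interface Campaign {
  id: string;
  sketch: string;
  currentStage: string;
}

// Hypothetical key prefix to keep campaigns apart from other stored data.
function keyFor(id: string): string {
  return "adforge:campaign:" + id;
}

function saveCampaign(store: KeyValueStore, campaign: Campaign): void {
  store.setItem(keyFor(campaign.id), JSON.stringify(campaign));
}

// Returns null when nothing is stored or the stored JSON is corrupt.
function loadCampaign(store: KeyValueStore, id: string): Campaign | null {
  const raw = store.getItem(keyFor(id));
  if (raw === null) return null;
  try {
    return JSON.parse(raw) as Campaign;
  } catch {
    return null;
  }
}

// In-memory stand-in for localStorage, handy outside a browser.
function createMemoryStore(): KeyValueStore {
  const data: Record<string, string> = Object.create(null);
  return {
    getItem: (key) => data[key] ?? null,
    setItem: (key, value) => {
      data[key] = value;
    },
  };
}
```

In the browser, the call would be something like `saveCampaign(window.localStorage, campaign)` after each stage completes.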

Reference images - this was crucial. When generating scene 3 you can reference the location image, character portraits, and previous scenes. The AI uses these as style anchors.
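
One way to picture the reference selection: always include the character portraits and location images, then add the most recent earlier scenes. This is a sketch under assumptions. The `ReferenceImage` shape, the `maxPrevious` cap, and the ordering are invented for illustration, and how the references are actually passed to the image model isn't covered here.

```typescript
// Hypothetical description of an image that can serve as a style anchor.
interface ReferenceImage {
  url: string;
  kind: "character" | "location" | "scene";
  sceneIndex?: number; // set only for kind === "scene"
}

// Picks references for `targetScene`: every character and location image,
// plus up to `maxPrevious` of the most recent earlier scenes (newest first).
function pickReferences(
  all: ReferenceImage[],
  targetScene: number,
  maxPrevious: number = 2
): ReferenceImage[] {
  const anchors = all.filter((ref) => ref.kind !== "scene");
  const previous = all
    .filter((ref) => ref.kind === "scene" && ref.sceneIndex !== undefined && ref.sceneIndex < targetScene)
    .sort((a, b) => (b.sceneIndex ?? 0) - (a.sceneIndex ?? 0))
    .slice(0, maxPrevious);
  return anchors.concat(previous);
}
```

Capping the number of previous scenes is a design choice in this sketch: it keeps the request small and favors the scenes closest to the one being generated.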

Stage-based workflow - each stage produces output you can review. Don't like the scenes? Regenerate before moving on. This gives you control without overwhelming you with options.
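
A rough sketch of the bookkeeping a stage-based workflow like this needs. It assumes that regenerating a stage sends that stage and everything after it back for review, which is a guess at the behavior rather than something stated above. All names are hypothetical.

```typescript
// Stage order as described in "How It Works"; identifiers are hypothetical.
const STAGE_ORDER = ["sketch", "scenes", "style", "images", "videos", "audio", "merge"] as const;
type StageName = (typeof STAGE_ORDER)[number];
type StageStatus = "pending" | "approved";
type Progress = Record<StageName, StageStatus>;

// Assumption: regenerating a stage resets it and all later stages to "pending".
// Returns a new object and leaves the input untouched.
function regenerateFrom(progress: Progress, stage: StageName): Progress {
  const start = STAGE_ORDER.indexOf(stage);
  const next: Progress = { ...progress };
  STAGE_ORDER.forEach((name, index) => {
    if (index >= start) next[name] = "pending";
  });
  return next;
}

// A stage can be opened only when every earlier stage has been approved.
function canOpen(progress: Progress, stage: StageName): boolean {
  const index = STAGE_ORDER.indexOf(stage);
  return STAGE_ORDER.slice(0, index).every((name) => progress[name] === "approved");
}
```

With this model, regenerating the images leaves the sketch, scenes, and style untouched, so only the downstream work is redone.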

Results

The globalpetsitter ad that would've taken me a full day now takes about 30 minutes of active work (plus generation time). More importantly, I can iterate fast: try different tones, swap scenes, or regenerate individual images without starting over.

What's Next

Ad Forge is still rough. I want to add:

background music selection
more camera movement options
direct export to social formats (9:16 for TikTok/Reels, 16:9 for YouTube)
templates for common ad formats
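
For the social format export, most of the work is arithmetic: finding the largest centered crop that matches the target aspect ratio. Here is a sketch of that calculation, which is not an existing Ad Forge feature. The function names are hypothetical. The output string uses FFmpeg's `crop=w:h:x:y` filter syntax, and dimensions are rounded down to even numbers because common H.264 settings require them.

```typescript
interface CropRect {
  width: number;
  height: number;
  x: number;
  y: number;
}

// Rounds down to the nearest even integer.
function toEven(value: number): number {
  const floored = Math.floor(value);
  return floored - (floored % 2);
}

// Largest centered crop of a srcW x srcH frame with aspect ratio ratioW:ratioH.
function centerCrop(srcW: number, srcH: number, ratioW: number, ratioH: number): CropRect {
  // Cross-multiply to compare aspect ratios without floating-point division.
  const tooWide = srcW * ratioH > srcH * ratioW;
  const width = tooWide ? toEven((srcH * ratioW) / ratioH) : toEven(srcW);
  const height = tooWide ? toEven(srcH) : toEven((srcW * ratioH) / ratioW);
  return {
    width,
    height,
    x: Math.floor((srcW - width) / 2),
    y: Math.floor((srcH - height) / 2),
  };
}

// Formats the rectangle as an FFmpeg crop filter.
function cropFilter(rect: CropRect): string {
  return "crop=" + rect.width + ":" + rect.height + ":" + rect.x + ":" + rect.y;
}
```

For a 1920x1080 clip going to 9:16, `cropFilter(centerCrop(1920, 1080, 9, 16))` gives `crop=606:1080:657:0`. A center crop can cut off the subject, so a real export would probably want a per-scene offset.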

For now it's solving my problem: making video ads for globalpetsitter without the time sink. Sometimes that's enough :)
