Text-to-Video API: Build Your First AI Video in Under 5 Minutes
You type a sentence. An API returns a video. That's text-to-video in 2026 — and it's no longer experimental. Production teams use it daily for social content, ads, and product demos.
This guide gets you from zero to rendered video in under 5 minutes. No video editing experience needed.
What a Text-to-Video API Actually Does
A text-to-video API accepts a text prompt (like "a drone shot over a tropical island at golden hour") and returns a rendered video clip — typically 3-10 seconds at 720p or 1080p.
Under the hood, a generative AI model (like Kling 3.0, Sora 2, or Veo 3.1) interprets your prompt and generates frames that are stitched into a video. The API handles all the compute — you don't need GPUs, model weights, or machine learning expertise.
Through the SamAutomation AI API, you access 29+ video generation models from a single endpoint. Same authentication, same response format, different models.
Step 1: Get Your API Key (1 minute)
Sign up at samautomation.work. The free tier includes enough credits to generate your first videos — no credit card required.
Navigate to your dashboard and copy your API key from the settings page. You'll need this for every API call.
Step 2: Choose Your Model (30 seconds)
For your first video, use Kling 3.0. It offers the best balance of quality, speed, and cost:
- Generation time: 15-30 seconds
- Quality: 8.5/10
- Cost: ~50 credits per 5-second clip
For a full comparison of all available models, check our AI video model comparison.
Step 3: Write Your Prompt (1 minute)
The prompt determines everything. Here's the structure that produces consistent results:
[Subject] + [Action] + [Setting] + [Style] + [Camera]
Example prompts that work:
- "A coffee cup steaming on a wooden desk, morning sunlight through a window, cinematic, shallow depth of field"
- "A woman jogging through a city park at sunrise, slow motion, golden hour lighting, shot on 35mm film"
- "Product packaging rotating 360 degrees on a white background, studio lighting, commercial quality"
- "Abstract liquid metal flowing and morphing, iridescent colors, macro lens, 4K quality"
Prompts that don't work well:
- "Make a cool video" (too vague)
- "A video of everything happening in a busy city street with 50 people and cars and bikes and..." (too complex)
- "Text saying 'SALE 50% OFF' appearing on screen" (AI models struggle with text rendering)
Pro tip: Keep prompts under 100 words. Be specific about the subject and style, but don't over-describe every detail. Let the model fill in the gaps — it's often more creative than prescriptive prompts.
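The [Subject] + [Action] + [Setting] + [Style] + [Camera] structure can be sketched as a small helper. This is just an illustrative convenience function, not part of the SamAutomation API; the field names are my own.

```python
def build_prompt(subject, action="", setting="", style="", camera=""):
    """Join the non-empty prompt parts into a comma-separated string."""
    parts = [subject, action, setting, style, camera]
    return ", ".join(p for p in parts if p)

# Recreates the first example prompt from the list above
prompt = build_prompt(
    subject="A coffee cup steaming on a wooden desk",
    setting="morning sunlight through a window",
    style="cinematic",
    camera="shallow depth of field",
)
```

Keeping the parts separate makes it easy to swap one slot (say, the camera direction) while holding the rest of the prompt constant.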
Step 4: Make the API Call (1 minute)
Here's a simple Python example:
```python
import requests

response = requests.post(
    "https://samautomation.work/api/ai/video/generate/",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "kling-3.0",
        "prompt": "A coffee cup steaming on a wooden desk, morning sunlight through a window, cinematic",
        "duration": 5,
        "aspect_ratio": "16:9"
    }
)

result = response.json()
print(f"Video URL: {result['video_url']}")
```
Or use cURL if you prefer the command line:
```bash
curl -X POST https://samautomation.work/api/ai/video/generate/ \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kling-3.0",
    "prompt": "A coffee cup steaming on a wooden desk, cinematic",
    "duration": 5,
    "aspect_ratio": "16:9"
  }'
```
The API returns a job ID immediately. Poll for the result, or set up a webhook to get notified when rendering completes.
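If you poll rather than use a webhook, the loop can look like the sketch below. The `status`/`video_url` field names and the idea of a status-fetching callable are assumptions for illustration; check the API docs for the exact response shape of your account. With `requests`, `fetch_status` could be something like `lambda: requests.get(status_url, headers=headers).json()`.

```python
import time

def wait_for_video(fetch_status, poll_interval=5, timeout=180):
    """Call fetch_status() until the job reports 'completed' or we time out."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        job = fetch_status()
        if job.get("status") == "completed":
            return job["video_url"]
        if job.get("status") == "failed":
            raise RuntimeError(job.get("error", "generation failed"))
        time.sleep(poll_interval)
    raise TimeoutError("video not ready within timeout")
```

A generous timeout matters here: renders usually finish in under two minutes, but server load can stretch that.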
Step 5: Get Your Video (1-2 minutes wait)
Depending on the model and server load, your video renders in 15-90 seconds. The response includes:
- `video_url` — Direct download link for the rendered MP4
- `thumbnail_url` — Preview frame from the video
- `credits_used` — How many credits this generation consumed
- `generation_time` — How long it took
Download the video, review it, and iterate on your prompt if needed.
Making It Production-Ready
Your first AI-generated clip is a building block. To turn it into publishable content, add these layers:
Add Captions
Pass the video through AutoCaptions to add burned-in subtitles. Essential for social media where 85% of video is watched without sound.
Compose with JSON Templates
Combine your AI clip with text overlays, logos, music, and transitions using the JSON Video API. This gives you a polished final product instead of a raw AI clip.
Build a Workflow
Connect the entire pipeline in n8n:
1. Schedule trigger → runs daily
2. AI generates a script from your content calendar
3. Text-to-video API creates the visual
4. JSON Video API adds branding and captions
5. Automated posting to your social channels
Optimizing Your Results
Image-to-Video: More Control
If you need the video to look like a specific image (your product, your logo, a specific style), use image-to-video instead of text-to-video:
```json
{
  "model": "kling-3.0",
  "image_url": "https://your-site.com/product-photo.jpg",
  "prompt": "gentle zoom in with subtle camera movement",
  "duration": 5
}
```
This animates your existing image, giving you much more predictable output than generating from text alone.
Negative Prompts
Some models support negative prompts — telling the AI what to avoid:
```json
{
  "prompt": "a modern kitchen, bright and clean",
  "negative_prompt": "people, text, watermarks, blurry"
}
```
Seed Values
For reproducible results, pass a seed value. Same prompt + same seed = same (or very similar) output. Useful for iterating: keep the seed, adjust the prompt.
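In practice that means pinning the seed in a base payload and only varying the prompt. A minimal sketch, assuming the model accepts a `seed` parameter (not every model does):

```python
# Fixed seed keeps successive generations visually comparable while you
# refine the wording of the prompt.
BASE = {"model": "kling-3.0", "duration": 5, "seed": 42}

def payload_for(prompt):
    """Merge a prompt into the shared base request payload."""
    return {**BASE, "prompt": prompt}

variants = [payload_for(p) for p in (
    "a modern kitchen, bright and clean",
    "a modern kitchen, bright and clean, morning light",
)]
```

Each entry in `variants` can be sent as the `json=` body of the generation request shown earlier.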
Cost Breakdown
| Model | 5-second clip | 10-second clip | Credits/month (Basic) |
|---|---|---|---|
| Kling 3.0 | ~50 credits | ~90 credits | 1,450 included |
| Veo 3.1 | ~80 credits | ~150 credits | 1,450 included |
| Sora 2 | ~120 credits | ~220 credits | 1,450 included |
| Pixverse V4.5 | ~30 credits | ~55 credits | 1,450 included |
On the Basic plan (€29.95/month), you get 1,450 AI credits — enough for roughly 29 Kling clips or 18 Veo clips per month. The Pro plan (€49.95/month) gives you 2,450 credits.
For high-volume needs, the BYOK (Bring Your Own Key) option lets you connect your own model API keys and pay the model providers directly, which is often cheaper at scale.
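A quick back-of-the-envelope check on those numbers, using the approximate per-clip credit figures from the table above (actual billing may vary):

```python
# Approximate credits per ~5-second clip, per the cost table
CREDITS_PER_5S = {"kling-3.0": 50, "veo-3.1": 80, "sora-2": 120, "pixverse-v4.5": 30}

def clips_per_month(model, monthly_credits=1450):
    """How many ~5-second clips a monthly credit allowance covers."""
    return monthly_credits // CREDITS_PER_5S[model]

clips_per_month("kling-3.0")  # 29 on the Basic plan
clips_per_month("veo-3.1")    # 18
```

Pass `monthly_credits=2450` to see the Pro-plan figures instead.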
What's Next
Now that you've generated your first AI video, explore these next steps:
- Try different models — Each model has a distinct style. Compare them all.
- Build a pipeline — Automate daily content creation with n8n workflows.
- Add personalization — Use JSON templates to create data-driven video at scale.
- Repurpose content — Turn your blog posts into 50+ videos automatically.
The API documentation has everything you need: samautomation.work/api/ai/docs. Start experimenting — the best way to learn what these models can do is to use them.
Related Articles
JSON to Video: The Complete Developer Reference Guide
Complete developer guide for JSON to video APIs. Schema reference, code examples in Python, JavaScr…

How to Build a Telegram Video Bot with n8n and a Video API
Build a Telegram bot that generates videos on demand using n8n and a JSON Video API. Complete tutor…

n8n Video Automation: The Complete Guide to No-Code Video Workflows
Build automated video workflows with n8n. Step-by-step guide with templates, webhook triggers, and …