API Integrations for AI Video
Why APIs Are the Foundation of Automation
Every AI tool you use through a web interface has an API (Application Programming Interface) behind it. APIs let your code talk directly to AI services — sending prompts, receiving outputs, and chaining services together without opening a single browser tab.
In an automated video pipeline, APIs replace manual interactions: instead of typing a prompt into ChatGPT's web interface, your code sends a POST request to the OpenAI API. Instead of dragging an image into Runway's upload box, your code sends the image URL to Runway's API endpoint.
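As a concrete sketch, the "typing a prompt into ChatGPT" step collapses into a single HTTP request. The helper name `buildChatRequest` is ours, not part of any SDK; the endpoint and payload shape follow OpenAI's chat completions API:

```javascript
// Build the same chat request you would type into a web UI,
// ready to send as a raw HTTP POST (no SDK required).
const buildChatRequest = (prompt) => ({
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: prompt }]
  })
});

// const res = await fetch('https://api.openai.com/v1/chat/completions',
//   buildChatRequest('Write a hook for a 30-second video'));
```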
Authentication and API Keys
Every API requires authentication — proof that you have permission to use it. The most common method is an API key: a long string of characters you include in each request's headers.
| Auth Method | How It Works | Used By |
|---|---|---|
| API Key (Bearer Token) | Send key in Authorization header | OpenAI, Runway, Stability AI, ElevenLabs |
| OAuth 2.0 | Token exchange flow, requires user consent | YouTube, Instagram, TikTok, Google services |
| API Key (Query Param) | Append key to URL as parameter | Some legacy APIs, Google Maps |
| JWT (JSON Web Token) | Self-signed token with claims | Firebase, custom backends |
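The mechanism differs less than the placement — a quick sketch of where each style puts the credential (the key values and helper names here are placeholders for illustration):

```javascript
// Where each auth style places the credential in a request
const bearerHeaders = (key) => ({ 'Authorization': `Bearer ${key}` });          // OpenAI, Runway, Stability AI
const customHeaderAuth = (key) => ({ 'xi-api-key': key });                      // ElevenLabs uses a custom header
const queryParamUrl = (base, key) => `${base}?key=${encodeURIComponent(key)}`;  // legacy / Google Maps style
```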
// NEVER hardcode API keys in your source code.
// Use environment variables loaded from a .env file.
// .env file (add to .gitignore!):
// OPENAI_API_KEY=sk-proj-abc123...
// RUNWAY_API_KEY=rw_key_xyz789...
// ELEVENLABS_API_KEY=el_key_def456...
// STABILITY_API_KEY=sk-stab-ghi012...
// Load in Node.js:
import 'dotenv/config';
const openaiKey = process.env.OPENAI_API_KEY;
const runwayKey = process.env.RUNWAY_API_KEY;
// Use in requests:
const headers = {
'Authorization': `Bearer ${openaiKey}`,
'Content-Type': 'application/json'
};

OpenAI API (GPT + DALL-E)
The OpenAI API is typically the first service in any video pipeline. GPT handles script generation, metadata creation, and content planning, while DALL-E handles image generation.
| Endpoint | Purpose | Model | Cost (approx) |
|---|---|---|---|
| /v1/chat/completions | Script generation, metadata, planning | gpt-4, gpt-4o, gpt-3.5-turbo | $0.01-$0.03 per 1K tokens |
| /v1/images/generations | Scene image creation | dall-e-3, dall-e-2 | $0.04-$0.08 per image (HD) |
| /v1/audio/speech | Text-to-speech narration | tts-1, tts-1-hd | $0.015 per 1K chars |
| /v1/audio/transcriptions | Transcribe audio for captions | whisper-1 | $0.006 per minute |
| /v1/embeddings | Content similarity matching | text-embedding-3-small | $0.00002 per 1K tokens |
import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const generateScript = async (topic, duration = 60) => {
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'system',
content: `You are a professional video scriptwriter. Write a ${duration}-second video script. Return a JSON object with: title, total_duration, and scenes array. Each scene has: scene_number, narration, visual_prompt (for AI image generation), duration (seconds), transition (fade/cut/dissolve).`
},
{
role: 'user',
content: `Write a script about: ${topic}`
}
],
response_format: { type: 'json_object' },
temperature: 0.8,
max_tokens: 2000
});
return JSON.parse(response.choices[0].message.content);
};
const script = await generateScript('How Runway Gen-3 is changing video production');

const generateSceneImage = async (visualPrompt) => {
const response = await openai.images.generate({
model: 'dall-e-3',
prompt: `${visualPrompt}. Cinematic, 4K, photorealistic, 16:9 aspect ratio, no text or watermarks.`,
size: '1792x1024', // Closest to 16:9
quality: 'hd', // Higher detail ($0.080 vs $0.040)
style: 'natural', // 'vivid' for more dramatic, 'natural' for realistic
n: 1
});
return {
url: response.data[0].url,
revised_prompt: response.data[0].revised_prompt
};
};

Runway API (Image-to-Video & Text-to-Video)
Runway's API provides image-to-video and text-to-video generation — the core of turning static AI images into dynamic video clips. Gen-3 Alpha Turbo is the recommended model for automated pipelines due to its balance of quality and speed.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

const generateRunwayVideo = async (imageUrl, motionPrompt) => {
// Step 1: Create generation task
const createResponse = await fetch('https://api.runwayml.com/v1/image_to_video', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.RUNWAY_API_KEY}`,
'X-Runway-Version': '2024-11-06',
'Content-Type': 'application/json'
},
body: JSON.stringify({
model: 'gen3a_turbo',
promptImage: imageUrl,
promptText: motionPrompt,
duration: 5, // 5 or 10 seconds
ratio: '16:9',
watermark: false
})
});
const { id: taskId } = await createResponse.json();
// Step 2: Poll for completion
let task;
do {
await sleep(10000); // Check every 10s
const statusResponse = await fetch(
`https://api.runwayml.com/v1/tasks/${taskId}`,
{ headers: { 'Authorization': `Bearer ${process.env.RUNWAY_API_KEY}` } }
);
task = await statusResponse.json();
} while (task.status !== 'SUCCEEDED' && task.status !== 'FAILED'); // keep polling through PENDING/RUNNING
if (task.status === 'SUCCEEDED') {
return task.output[0]; // Video URL
}
throw new Error(`Runway generation failed: ${task.failure}`);
};

ElevenLabs API (Voice & Sound Effects)
ElevenLabs provides some of the highest-quality text-to-speech available via API, along with voice cloning and AI sound effects. In video pipelines, it generates narration audio from scripts.
| Endpoint | Purpose | Key Parameters |
|---|---|---|
| /v1/text-to-speech/{voice_id} | Generate speech from text | voice_id, model_id, text, voice_settings |
| /v1/text-to-speech/{voice_id}/stream | Stream audio in real-time | Same + output_format |
| /v1/sound-generation | AI sound effects | text (description of sound), duration_seconds |
| /v1/voices | List available voices | None (GET request) |
| /v1/voices/add | Clone a voice from audio samples | name, files (audio samples) |
import fs from 'fs';

const generateVoiceover = async (text, voiceId = 'pNInz6obpgDQGcFmaJgB') => {
// voiceId 'pNInz6obpgDQGcFmaJgB' = "Adam" (deep, narrator voice)
const response = await fetch(
`https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`,
{
method: 'POST',
headers: {
'xi-api-key': process.env.ELEVENLABS_API_KEY,
'Content-Type': 'application/json'
},
body: JSON.stringify({
text: text,
model_id: 'eleven_multilingual_v2', // Best quality
voice_settings: {
stability: 0.5, // 0-1: lower = more expressive
similarity_boost: 0.75, // 0-1: higher = closer to original voice
style: 0.3, // 0-1: style exaggeration
use_speaker_boost: true
},
output_format: 'mp3_44100_128' // High quality MP3
})
}
);
// Response is raw audio bytes
const audioBuffer = await response.arrayBuffer();
fs.writeFileSync('voiceover.mp3', Buffer.from(audioBuffer));
return 'voiceover.mp3';
};

const generateSoundEffect = async (description, duration = 5) => {
const response = await fetch(
'https://api.elevenlabs.io/v1/sound-generation',
{
method: 'POST',
headers: {
'xi-api-key': process.env.ELEVENLABS_API_KEY,
'Content-Type': 'application/json'
},
body: JSON.stringify({
text: description, // e.g., "dramatic cinematic whoosh transition"
duration_seconds: duration,
prompt_influence: 0.5 // 0-1: how closely to follow the description
})
}
);
const audioBuffer = await response.arrayBuffer();
fs.writeFileSync('sfx.mp3', Buffer.from(audioBuffer));
return 'sfx.mp3';
};
// Generate transition sound effects
await generateSoundEffect('cinematic whoosh transition sound', 2);
await generateSoundEffect('gentle ambient background music, technology theme', 60);

Stability AI API
Stability AI offers image generation through Stable Diffusion models, image upscaling, inpainting, and outpainting. Their API is significantly cheaper than DALL-E for bulk generation.
const generateStabilityImage = async (prompt) => {
// Build the multipart form body first, then send it
const formData = new FormData();
formData.append('prompt', prompt);
formData.append('output_format', 'png');
formData.append('aspect_ratio', '16:9');
formData.append('model', 'sd3.5-large'); // Best quality
formData.append('negative_prompt', 'text, watermark, blurry, low quality');
const response = await fetch(
'https://api.stability.ai/v2beta/stable-image/generate/sd3',
{
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.STABILITY_API_KEY}`,
'Accept': 'image/*'
},
body: formData
}
);
const imageBuffer = await response.arrayBuffer();
return Buffer.from(imageBuffer);
};

Google Cloud Video AI
Google Cloud Video Intelligence API does not generate videos — instead, it analyzes existing videos. In automation pipelines, it is used for quality control: detecting scene changes, transcribing speech, identifying objects, and flagging inappropriate content.
| Feature | Use in Pipeline | API Method |
|---|---|---|
| Label Detection | Verify video content matches intent | annotate (LABEL_DETECTION) |
| Shot Detection | Validate transitions and scene changes | annotate (SHOT_CHANGE_DETECTION) |
| Speech Transcription | Generate captions automatically | annotate (SPEECH_TRANSCRIPTION) |
| Explicit Content Detection | Quality gate before publishing | annotate (EXPLICIT_CONTENT_DETECTION) |
| Object Tracking | Verify visual consistency across scenes | annotate (OBJECT_TRACKING) |
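As a sketch of wiring a quality gate through the REST API: this assumes `GOOGLE_ACCESS_TOKEN` holds an OAuth 2.0 access token (for example from `gcloud auth print-access-token`) and that the video already sits in Cloud Storage. `videos:annotate` returns a long-running operation, which is polled much like the Runway example:

```javascript
// Build the annotate request body (pure helper, ours for illustration)
const buildAnnotateRequest = (gcsUri, features) => ({
  inputUri: gcsUri, // must be a gs:// URI
  features          // e.g. ['EXPLICIT_CONTENT_DETECTION', 'SHOT_CHANGE_DETECTION']
});

const annotateVideo = async (gcsUri, features) => {
  const headers = {
    'Authorization': `Bearer ${process.env.GOOGLE_ACCESS_TOKEN}`,
    'Content-Type': 'application/json'
  };
  const res = await fetch('https://videointelligence.googleapis.com/v1/videos:annotate', {
    method: 'POST',
    headers,
    body: JSON.stringify(buildAnnotateRequest(gcsUri, features))
  });
  const { name } = await res.json(); // long-running operation name
  // Poll the operation until it reports done
  let op;
  do {
    await new Promise((resolve) => setTimeout(resolve, 10000));
    const poll = await fetch(`https://videointelligence.googleapis.com/v1/${name}`, { headers });
    op = await poll.json();
  } while (!op.done);
  return op.response; // annotation results, one entry per requested feature
};

// await annotateVideo('gs://my-bucket/final.mp4', ['EXPLICIT_CONTENT_DETECTION']);
```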
API Comparison Table
Choosing the right API for each pipeline stage depends on quality requirements, speed, cost, and rate limits. Here is a comprehensive comparison.
| Category | API | Quality | Speed | Cost (Low End) | Rate Limit |
|---|---|---|---|---|---|
| Text/Script | OpenAI GPT-4o | Excellent | Fast (2-5s) | $0.005/1K tokens | 500 RPM |
| Text/Script | Anthropic Claude | Excellent | Fast (2-5s) | $0.003/1K tokens | 50 RPM (varies) |
| Images | OpenAI DALL-E 3 | Very Good | 10-15s | $0.04/image | 7-15 RPM |
| Images | Stability AI SD3.5 | Good | 5-10s | $0.002/image | 150 RPM |
| Images | Midjourney (unofficial) | Excellent | 30-60s | $0.01/image | Varies |
| Video | Runway Gen-3 | Excellent | 60-120s | $0.05/clip | 10 concurrent |
| Video | Kling AI | Very Good | 90-180s | $0.03/clip | 3 concurrent |
| Voice | ElevenLabs | Excellent | 2-5s | $0.18/1K chars | 100 RPM |
| Voice | OpenAI TTS | Good | 1-3s | $0.015/1K chars | 50 RPM |
| Analysis | Google Video AI | Excellent | 30-120s | $0.10/minute | 600 RPM |
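Note that the video APIs above cap concurrent tasks rather than requests per minute, so a small promise-based semaphore is usually more useful than a classic rate limiter. A minimal sketch (`createLimiter` is our helper, not a library call):

```javascript
// Limit how many async tasks run at once; extra tasks wait in a queue.
const createLimiter = (maxConcurrent) => {
  let active = 0;
  const queue = [];
  const next = () => {
    if (active < maxConcurrent && queue.length > 0) {
      active++;
      queue.shift()(); // start the next queued task
    }
  };
  return (task) => new Promise((resolve, reject) => {
    queue.push(() =>
      task().then(resolve, reject).finally(() => { active--; next(); })
    );
    next();
  });
};

// Usage: allow at most 10 concurrent Runway generations
// const limit = createLimiter(10);
// const clips = await Promise.all(
//   scenes.map(s => limit(() => generateRunwayVideo(s.imageUrl, s.motionPrompt)))
// );
```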
Rate Limiting and Error Handling
Robust API integration requires handling failures gracefully. The three most common failures are: rate limit errors (429), server errors (500/503), and timeout errors.
const callAPIWithRetry = async (apiCall, options = {}) => {
const { maxRetries = 3, baseDelay = 1000, maxDelay = 30000 } = options;
for (let attempt = 0; attempt <= maxRetries; attempt++) {
try {
const result = await apiCall();
return result;
} catch (error) {
const isRateLimit = error.status === 429;
const isServerError = error.status >= 500;
const isRetryable = isRateLimit || isServerError;
if (!isRetryable || attempt === maxRetries) {
throw error;
}
// Use Retry-After header if available, otherwise exponential backoff
let delay;
if (isRateLimit && error.headers?.['retry-after']) {
delay = parseInt(error.headers['retry-after'], 10) * 1000;
} else {
delay = Math.min(baseDelay * Math.pow(2, attempt), maxDelay);
}
console.warn(
`API call failed (${error.status}). ` +
`Retry ${attempt + 1}/${maxRetries} in ${delay}ms...`
);
await sleep(delay);
}
}
};
// Usage:
const script = await callAPIWithRetry(
() => openai.chat.completions.create({ model: 'gpt-4o', messages: [...] }),
{ maxRetries: 3, baseDelay: 2000 }
);

Webhook Integrations
Webhooks let an API push a notification to your server when an event occurs, instead of your code polling for status updates. For long-running tasks like video generation, this is both faster and more efficient.
import express from 'express';
const app = express();
// Webhook endpoint that Runway calls when video generation completes
app.post('/webhooks/runway', express.json(), async (req, res) => {
const { task_id, status, output } = req.body;
if (status === 'SUCCEEDED') {
console.log(`Video ready: ${output[0]}`);
// Continue pipeline: download video, assemble, publish
await downloadVideo(output[0], `clips/${task_id}.mp4`);
await triggerAssemblyStage(task_id);
} else if (status === 'FAILED') {
console.error(`Task ${task_id} failed: ${req.body.failure}`);
await retryOrNotify(task_id);
}
res.status(200).json({ received: true });
});
app.listen(3000, () => console.log('Webhook server listening on port 3000'));
// When creating a Runway task, specify the webhook URL:
// { ...params, webhook: 'https://your-server.com/webhooks/runway' }
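One caveat the handler above glosses over: anyone who discovers your webhook URL can POST fake events to it, so payloads should be verified. The exact header name and scheme vary by provider (check each provider's docs), but HMAC-SHA256 over the raw request body with a shared secret is the common pattern — a sketch:

```javascript
import crypto from 'crypto';

// Verify an HMAC-SHA256 webhook signature against the raw request body.
// The secret comes from the provider's webhook settings; compare in
// constant time so the check doesn't leak information through timing.
const verifySignature = (rawBody, signatureHex, secret) => {
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
  if (typeof signatureHex !== 'string' || signatureHex.length !== expected.length) return false;
  return crypto.timingSafeEqual(Buffer.from(signatureHex), Buffer.from(expected));
};

// In the handler, use express.raw() so you verify the exact bytes that were signed:
// if (!verifySignature(req.body, req.headers['x-signature'], process.env.WEBHOOK_SECRET)) {
//   return res.status(401).end();
// }
```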