
API Integrations for AI Video

[Image: Network diagram showing multiple AI APIs connected through a central orchestration layer]
API integrations connect specialized AI services into a unified video production system

Why APIs Are the Foundation of Automation

Every AI tool you use through a web interface has an API (Application Programming Interface) behind it. APIs let your code talk directly to AI services — sending prompts, receiving outputs, and chaining services together without opening a single browser tab.

In an automated video pipeline, APIs replace manual interactions: instead of typing a prompt into ChatGPT's web interface, your code sends a POST request to the OpenAI API. Instead of dragging an image into Runway's upload box, your code sends the image URL to Runway's API endpoint.
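
The swap from web UI to API is mechanical: every prompt becomes an HTTP request with a JSON body. Here is a minimal sketch of that request; the endpoint and header shapes follow OpenAI's chat completions API, while `buildChatRequest` is a hypothetical helper introduced for illustration:

```javascript
// Build the POST request that replaces typing into ChatGPT's web UI.
// buildChatRequest is a hypothetical helper; the endpoint and header
// shapes follow OpenAI's chat completions API.
const buildChatRequest = (prompt, apiKey) => ({
  url: 'https://api.openai.com/v1/chat/completions',
  options: {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: prompt }]
    })
  }
});

// Sending it is one line:
// const req = buildChatRequest('Write a video hook', process.env.OPENAI_API_KEY);
// const res = await fetch(req.url, req.options);
```

Every service in this lesson follows this same pattern; only the endpoint, headers, and body fields change.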

📝 Note: API access often requires a separate subscription from the web interface. For example, having a ChatGPT Plus subscription does not give you OpenAI API access — you need to set up an API account at platform.openai.com with separate billing.

Authentication and API Keys

Every API requires authentication — proof that you have permission to use it. The most common method is an API key: a long string of characters you include in each request's headers.

| Auth Method | How It Works | Used By |
|---|---|---|
| API Key (Bearer Token) | Send key in Authorization header | OpenAI, Runway, Stability AI, ElevenLabs |
| OAuth 2.0 | Token exchange flow, requires user consent | YouTube, Instagram, TikTok, Google services |
| API Key (Query Param) | Append key to URL as parameter | Some legacy APIs, Google Maps |
| JWT (JSON Web Token) | Self-signed token with claims | Firebase, custom backends |
Secure API Key Management
// NEVER hardcode API keys in your source code.
// Use environment variables loaded from a .env file.

// .env file (add to .gitignore!):
// OPENAI_API_KEY=sk-proj-abc123...
// RUNWAY_API_KEY=rw_key_xyz789...
// ELEVENLABS_API_KEY=el_key_def456...
// STABILITY_API_KEY=sk-stab-ghi012...

// Load in Node.js:
import 'dotenv/config';

const openaiKey = process.env.OPENAI_API_KEY;
const runwayKey = process.env.RUNWAY_API_KEY;

// Use in requests:
const headers = {
  'Authorization': `Bearer ${openaiKey}`,
  'Content-Type': 'application/json'
};

OpenAI API (GPT + DALL-E)

The OpenAI API is typically the first service in any video pipeline. GPT handles script generation, metadata creation, and content planning, while DALL-E handles image generation.

| Endpoint | Purpose | Model | Cost (approx) |
|---|---|---|---|
| /v1/chat/completions | Script generation, metadata, planning | gpt-4, gpt-4o, gpt-3.5-turbo | $0.01-$0.03 per 1K tokens |
| /v1/images/generations | Scene image creation | dall-e-3, dall-e-2 | $0.04-$0.08 per image (HD) |
| /v1/audio/speech | Text-to-speech narration | tts-1, tts-1-hd | $0.015 per 1K chars |
| /v1/audio/transcriptions | Transcribe audio for captions | whisper-1 | $0.006 per minute |
| /v1/embeddings | Content similarity matching | text-embedding-3-small | $0.00002 per 1K tokens |
OpenAI GPT: Script Generation
import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const generateScript = async (topic, duration = 60) => {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      {
        role: 'system',
        content: `You are a professional video scriptwriter. Write a ${duration}-second video script. Return a JSON object with: title, total_duration, and scenes array. Each scene has: scene_number, narration, visual_prompt (for AI image generation), duration (seconds), transition (fade/cut/dissolve).`
      },
      {
        role: 'user',
        content: `Write a script about: ${topic}`
      }
    ],
    response_format: { type: 'json_object' },
    temperature: 0.8,
    max_tokens: 2000
  });
  return JSON.parse(response.choices[0].message.content);
};

const script = await generateScript('How Runway Gen-3 is changing video production');
OpenAI DALL-E: Image Generation
const generateSceneImage = async (visualPrompt) => {
  const response = await openai.images.generate({
    model: 'dall-e-3',
    prompt: `${visualPrompt}. Cinematic, 4K, photorealistic, 16:9 aspect ratio, no text or watermarks.`,
    size: '1792x1024',    // Closest to 16:9
    quality: 'hd',         // Higher detail ($0.080 vs $0.040)
    style: 'natural',      // 'vivid' for more dramatic, 'natural' for realistic
    n: 1
  });
  return {
    url: response.data[0].url,
    revised_prompt: response.data[0].revised_prompt
  };
};

Runway API (Image-to-Video & Text-to-Video)

Runway's API provides image-to-video and text-to-video generation — the core of turning static AI images into dynamic video clips. Gen-3 Alpha Turbo is the recommended model for automated pipelines due to its balance of quality and speed.

Runway Gen-3: Image-to-Video
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

const generateRunwayVideo = async (imageUrl, motionPrompt) => {
  // Step 1: Create generation task
  const createResponse = await fetch('https://api.runwayml.com/v1/image_to_video', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.RUNWAY_API_KEY}`,
      'X-Runway-Version': '2024-11-06',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gen3a_turbo',
      promptImage: imageUrl,
      promptText: motionPrompt,
      duration: 5,          // 5 or 10 seconds
      ratio: '16:9',
      watermark: false
    })
  });
  const { id: taskId } = await createResponse.json();

  // Step 2: Poll for completion
  let task;
  do {
    await sleep(10000); // Check every 10s
    const statusResponse = await fetch(
      `https://api.runwayml.com/v1/tasks/${taskId}`,
      { headers: { 'Authorization': `Bearer ${process.env.RUNWAY_API_KEY}` } }
    );
    task = await statusResponse.json();
  } while (task.status === 'PENDING' || task.status === 'RUNNING');

  if (task.status === 'SUCCEEDED') {
    return task.output[0]; // Video URL
  }
  throw new Error(`Runway generation failed: ${task.failure}`);
};

ElevenLabs API (Voice & Sound Effects)

ElevenLabs provides some of the highest-quality text-to-speech available via API, along with voice cloning and AI sound effects. In video pipelines, it generates narration audio from scripts.

| Endpoint | Purpose | Key Parameters |
|---|---|---|
| /v1/text-to-speech/{voice_id} | Generate speech from text | voice_id, model_id, text, voice_settings |
| /v1/text-to-speech/{voice_id}/stream | Stream audio in real time | Same + output_format |
| /v1/sound-generation | AI sound effects | text (description of sound), duration_seconds |
| /v1/voices | List available voices | None (GET request) |
| /v1/voices/add | Clone a voice from audio samples | name, files (audio samples) |
ElevenLabs: Text-to-Speech for Narration
import fs from 'node:fs';

const generateVoiceover = async (text, voiceId = 'pNInz6obpgDQGcFmaJgB') => {
  // voiceId 'pNInz6obpgDQGcFmaJgB' = "Adam" (deep, narrator voice)
  // output_format is passed as a query parameter, not in the request body
  const response = await fetch(
    `https://api.elevenlabs.io/v1/text-to-speech/${voiceId}?output_format=mp3_44100_128`,
    {
      method: 'POST',
      headers: {
        'xi-api-key': process.env.ELEVENLABS_API_KEY,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        text: text,
        model_id: 'eleven_multilingual_v2', // Best quality
        voice_settings: {
          stability: 0.5,         // 0-1: lower = more expressive
          similarity_boost: 0.75, // 0-1: higher = closer to original voice
          style: 0.3,             // 0-1: style exaggeration
          use_speaker_boost: true
        }
      })
    }
  );

  // Response is raw audio bytes
  const audioBuffer = await response.arrayBuffer();
  fs.writeFileSync('voiceover.mp3', Buffer.from(audioBuffer));
  return 'voiceover.mp3';
};
ElevenLabs: AI Sound Effects
const generateSoundEffect = async (description, duration = 5) => {
  const response = await fetch(
    'https://api.elevenlabs.io/v1/sound-generation',
    {
      method: 'POST',
      headers: {
        'xi-api-key': process.env.ELEVENLABS_API_KEY,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        text: description,         // e.g., "dramatic cinematic whoosh transition"
        duration_seconds: duration,
        prompt_influence: 0.5      // 0-1: how closely to follow the description
      })
    }
  );
  const audioBuffer = await response.arrayBuffer();
  fs.writeFileSync('sfx.mp3', Buffer.from(audioBuffer));
  return 'sfx.mp3';
};

// Generate transition sound effects
await generateSoundEffect('cinematic whoosh transition sound', 2);
await generateSoundEffect('gentle ambient background music, technology theme', 60);

Stability AI API

Stability AI offers image generation through Stable Diffusion models, image upscaling, inpainting, and outpainting. Their API is significantly cheaper than DALL-E for bulk generation.

Stability AI: Image Generation
const generateStabilityImage = async (prompt) => {
  const formData = new FormData();
  formData.append('prompt', prompt);
  formData.append('output_format', 'png');
  formData.append('aspect_ratio', '16:9');
  formData.append('model', 'sd3.5-large');  // Best quality
  formData.append('negative_prompt', 'text, watermark, blurry, low quality');

  const response = await fetch(
    'https://api.stability.ai/v2beta/stable-image/generate/sd3',
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.STABILITY_API_KEY}`,
        'Accept': 'image/*'
        // Content-Type is set automatically for FormData (multipart/form-data)
      },
      body: formData
    }
  );
  const imageBuffer = await response.arrayBuffer();
  return Buffer.from(imageBuffer);
};

Google Cloud Video AI

Google Cloud Video Intelligence API does not generate videos — instead, it analyzes existing videos. In automation pipelines, it is used for quality control: detecting scene changes, transcribing speech, identifying objects, and flagging inappropriate content.

| Feature | Use in Pipeline | API Method |
|---|---|---|
| Label Detection | Verify video content matches intent | annotate (LABEL_DETECTION) |
| Shot Detection | Validate transitions and scene changes | annotate (SHOT_CHANGE_DETECTION) |
| Speech Transcription | Generate captions automatically | annotate (SPEECH_TRANSCRIPTION) |
| Explicit Content Detection | Quality gate before publishing | annotate (EXPLICIT_CONTENT_DETECTION) |
| Object Tracking | Verify visual consistency across scenes | annotate (OBJECT_TRACKING) |
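
All five features go through the same `videos:annotate` call. As a sketch of the request shape (the endpoint and field names follow the Video Intelligence v1 REST API; `buildAnnotateRequest` is a hypothetical helper introduced here for illustration):

```javascript
// Build the JSON body for POST https://videointelligence.googleapis.com/v1/videos:annotate
// buildAnnotateRequest is a hypothetical helper; field names follow the v1 REST API.
const buildAnnotateRequest = (gcsUri, features) => ({
  inputUri: gcsUri,   // e.g. 'gs://my-bucket/final-video.mp4'
  features: features  // e.g. ['SHOT_CHANGE_DETECTION', 'EXPLICIT_CONTENT_DETECTION']
});

// Usage as a pre-publish quality gate:
const annotateBody = buildAnnotateRequest('gs://my-bucket/final.mp4', [
  'SHOT_CHANGE_DETECTION',
  'EXPLICIT_CONTENT_DETECTION'
]);
// POST this body with OAuth credentials; the response is a long-running
// operation name that you poll for the analysis results.
```

Note that this API uses OAuth 2.0 (Google service credentials) rather than a simple API key, consistent with the authentication table earlier in this lesson.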

API Comparison Table

Choosing the right API for each pipeline stage depends on quality requirements, speed, cost, and rate limits. Here is a comprehensive comparison.

| Category | API | Quality | Speed | Cost (Low End) | Rate Limit |
|---|---|---|---|---|---|
| Text/Script | OpenAI GPT-4o | Excellent | Fast (2-5s) | $0.005/1K tokens | 500 RPM |
| Text/Script | Anthropic Claude | Excellent | Fast (2-5s) | $0.003/1K tokens | 50 RPM (varies) |
| Images | OpenAI DALL-E 3 | Very Good | 10-15s | $0.04/image | 7-15 RPM |
| Images | Stability AI SD3.5 | Good | 5-10s | $0.002/image | 150 RPM |
| Images | Midjourney (unofficial) | Excellent | 30-60s | $0.01/image | Varies |
| Video | Runway Gen-3 | Excellent | 60-120s | $0.05/clip | 10 concurrent |
| Video | Kling AI | Very Good | 90-180s | $0.03/clip | 3 concurrent |
| Voice | ElevenLabs | Excellent | 2-5s | $0.18/1K chars | 100 RPM |
| Voice | OpenAI TTS | Good | 1-3s | $0.015/1K chars | 50 RPM |
| Analysis | Google Video AI | Excellent | 30-120s | $0.10/minute | 600 RPM |
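
To make the cost column concrete, here is a back-of-the-envelope estimate for one 60-second video built from six 10-second clips, using the low-end figures from the table above. The numbers are illustrative only; real costs vary with usage volume and provider pricing changes:

```javascript
// Rough per-video cost using the table's low-end figures (illustrative only).
const estimateVideoCost = ({ scriptTokens, images, clips, narrationChars }) => {
  const script = (scriptTokens / 1000) * 0.005;   // GPT-4o
  const imageCost = images * 0.002;               // Stability SD3.5
  const videoCost = clips * 0.05;                 // Runway Gen-3
  const voice = (narrationChars / 1000) * 0.18;   // ElevenLabs
  return +(script + imageCost + videoCost + voice).toFixed(3);
};

const cost = estimateVideoCost({
  scriptTokens: 2000,    // ~1 script generation
  images: 6,             // one image per scene
  clips: 6,              // six 10-second clips
  narrationChars: 900    // ~150 words of narration
});
// ≈ $0.48 per video at these rates — video generation dominates the total
```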

Rate Limiting and Error Handling

Robust API integration requires handling failures gracefully. The three most common failures are: rate limit errors (429), server errors (500/503), and timeout errors.

Resilient API Caller with Retry Logic
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

const callAPIWithRetry = async (apiCall, options = {}) => {
  const { maxRetries = 3, baseDelay = 1000, maxDelay = 30000 } = options;

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const result = await apiCall();
      return result;
    } catch (error) {
      const isRateLimit = error.status === 429;
      const isServerError = error.status >= 500;
      const isRetryable = isRateLimit || isServerError;

      if (!isRetryable || attempt === maxRetries) {
        throw error;
      }

      // Use Retry-After header if available, otherwise exponential backoff
      let delay;
      if (isRateLimit && error.headers?.['retry-after']) {
        delay = parseInt(error.headers['retry-after'], 10) * 1000;
      } else {
        delay = Math.min(baseDelay * Math.pow(2, attempt), maxDelay);
      }

      console.warn(
        `API call failed (${error.status}). ` +
        `Retry ${attempt + 1}/${maxRetries} in ${delay}ms...`
      );
      await sleep(delay);
    }
  }
};

// Usage:
const script = await callAPIWithRetry(
  () => openai.chat.completions.create({ model: 'gpt-4o', messages: [...] }),
  { maxRetries: 3, baseDelay: 2000 }
);
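
Retries handle failures after the fact; to avoid tripping limits like Runway's 10 concurrent tasks in the first place, cap concurrency proactively. A minimal limiter sketch with no external dependencies (libraries such as `p-limit` on npm implement the same pattern):

```javascript
// Run async task functions with at most `limit` in flight at once.
const runWithConcurrencyLimit = async (tasks, limit) => {
  const results = new Array(tasks.length);
  let next = 0;
  // Each worker pulls the next unstarted task until none remain.
  const worker = async () => {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  };
  await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, worker));
  return results;
};

// Usage: generate 20 clips but never more than 10 Runway tasks at once:
// const clips = await runWithConcurrencyLimit(
//   scenes.map((s) => () => generateRunwayVideo(s.imageUrl, s.motionPrompt)),
//   10
// );
```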

Webhook Integrations

Webhooks allow APIs to push notifications to your server when an event occurs, instead of you polling for status updates. This is more efficient and faster for long-running tasks like video generation.

Webhook Receiver for Runway Completion
import express from 'express';
const app = express();

// Webhook endpoint that Runway calls when video generation completes
app.post('/webhooks/runway', express.json(), async (req, res) => {
  const { task_id, status, output } = req.body;

  if (status === 'SUCCEEDED') {
    console.log(`Video ready: ${output[0]}`);
    // Continue pipeline: download video, assemble, publish
    // (downloadVideo, triggerAssemblyStage, retryOrNotify are your own pipeline helpers)
    await downloadVideo(output[0], `clips/${task_id}.mp4`);
    await triggerAssemblyStage(task_id);
  } else if (status === 'FAILED') {
    console.error(`Task ${task_id} failed: ${req.body.failure}`);
    await retryOrNotify(task_id);
  }

  res.status(200).json({ received: true });
});

app.listen(3000, () => console.log('Webhook server listening on port 3000'));

// When creating a Runway task, specify the webhook URL:
// { ...params, webhook: 'https://your-server.com/webhooks/runway' }
📝 Note: For webhook-based architectures, use a tool like ngrok during development to expose your local server to the internet. In production, deploy your webhook receiver to a cloud service with HTTPS (AWS Lambda, Vercel, Railway).
Exercise:
Which authentication method requires a token exchange flow with user consent?
Exercise:
Which API is best suited for bulk image generation when cost is the primary concern?
Exercise:
What is the advantage of webhooks over polling for long-running API tasks?