
API Integrations for AI Video

[Image: Network diagram showing multiple AI APIs connected through a central orchestration layer]
API integrations connect specialized AI services into a unified video production system

Why APIs Are the Foundation of Automation

Every AI tool you use through a web interface has an API (Application Programming Interface) behind it. APIs let your code talk directly to AI services — sending prompts, receiving outputs, and chaining services together without opening a single browser tab.

In an automated video pipeline, APIs replace manual interactions: instead of typing a prompt into ChatGPT's web interface, your code sends a POST request to the OpenAI API. Instead of dragging an image into Runway's upload box, your code sends the image URL to Runway's API endpoint.
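
The swap from web UI to API is mechanical: every prompt becomes an HTTP request with a JSON body. Here is a minimal sketch of that request; the endpoint and header shapes follow OpenAI's chat completions API, while `buildChatRequest` is a hypothetical helper introduced for illustration:

```javascript
// Build the POST request that replaces typing into ChatGPT's web UI.
// buildChatRequest is a hypothetical helper; the endpoint and header
// shapes follow OpenAI's chat completions API.
const buildChatRequest = (prompt, apiKey) => ({
  url: 'https://api.openai.com/v1/chat/completions',
  options: {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: prompt }]
    })
  }
});

// Sending it is one line:
// const req = buildChatRequest('Write a video hook', process.env.OPENAI_API_KEY);
// const res = await fetch(req.url, req.options);
```

Every service in this lesson follows this same pattern; only the endpoint, headers, and body fields change.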

📝 Note: API access often requires a separate subscription from the web interface. For example, having a ChatGPT Plus subscription does not give you OpenAI API access — you need to set up an API account at platform.openai.com with separate billing.

Authentication and API Keys

Every API requires authentication — proof that you have permission to use it. The most common method is an API key: a long string of characters you include in each request's headers.

| Auth Method | How It Works | Used By |
|---|---|---|
| API Key (Bearer Token) | Send key in Authorization header | OpenAI, Runway, Stability AI, ElevenLabs |
| OAuth 2.0 | Token exchange flow, requires user consent | YouTube, Instagram, TikTok, Google services |
| API Key (Query Param) | Append key to URL as parameter | Some legacy APIs, Google Maps |
| JWT (JSON Web Token) | Self-signed token with claims | Firebase, custom backends |
Secure API Key Management
// NEVER hardcode API keys in your source code.
// Use environment variables loaded from a .env file.

// .env file (add to .gitignore!):
// OPENAI_API_KEY=sk-proj-abc123...
// RUNWAY_API_KEY=rw_key_xyz789...
// ELEVENLABS_API_KEY=el_key_def456...
// STABILITY_API_KEY=sk-stab-ghi012...

// Load in Node.js:
import 'dotenv/config';

const openaiKey = process.env.OPENAI_API_KEY;
const runwayKey = process.env.RUNWAY_API_KEY;

// Use in requests:
const headers = {
  'Authorization': `Bearer ${openaiKey}`,
  'Content-Type': 'application/json'
};

OpenAI API (GPT + DALL-E)

The OpenAI API is typically the first service in any video pipeline. GPT handles script generation, metadata creation, and content planning, while DALL-E handles image generation.

| Endpoint | Purpose | Model | Cost (approx) |
|---|---|---|---|
| /v1/chat/completions | Script generation, metadata, planning | gpt-4, gpt-4o, gpt-3.5-turbo | $0.01-$0.03 per 1K tokens |
| /v1/images/generations | Scene image creation | dall-e-3, dall-e-2 | $0.04-$0.08 per image (HD) |
| /v1/audio/speech | Text-to-speech narration | tts-1, tts-1-hd | $0.015 per 1K chars |
| /v1/audio/transcriptions | Transcribe audio for captions | whisper-1 | $0.006 per minute |
| /v1/embeddings | Content similarity matching | text-embedding-3-small | $0.00002 per 1K tokens |
OpenAI GPT: Script Generation
import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const generateScript = async (topic, duration = 60) => {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      {
        role: 'system',
        content: `You are a professional video scriptwriter. Write a ${duration}-second video script. Return a JSON object with: title, total_duration, and scenes array. Each scene has: scene_number, narration, visual_prompt (for AI image generation), duration (seconds), transition (fade/cut/dissolve).`
      },
      {
        role: 'user',
        content: `Write a script about: ${topic}`
      }
    ],
    response_format: { type: 'json_object' },
    temperature: 0.8,
    max_tokens: 2000
  });
  return JSON.parse(response.choices[0].message.content);
};

const script = await generateScript('How Runway Gen-3 is changing video production');
OpenAI DALL-E: Image Generation
const generateSceneImage = async (visualPrompt) => {
  const response = await openai.images.generate({
    model: 'dall-e-3',
    prompt: `${visualPrompt}. Cinematic, 4K, photorealistic, 16:9 aspect ratio, no text or watermarks.`,
    size: '1792x1024',    // Closest to 16:9
    quality: 'hd',         // Higher detail ($0.080 vs $0.040)
    style: 'natural',      // 'vivid' for more dramatic, 'natural' for realistic
    n: 1
  });
  return {
    url: response.data[0].url,
    revised_prompt: response.data[0].revised_prompt
  };
};

Runway API (Image-to-Video & Text-to-Video)

Runway's API provides image-to-video and text-to-video generation — the core of turning static AI images into dynamic video clips. Gen-3 Alpha Turbo is the recommended model for automated pipelines due to its balance of quality and speed.

Runway Gen-3: Image-to-Video
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

const generateRunwayVideo = async (imageUrl, motionPrompt) => {
  // Step 1: Create generation task
  const createResponse = await fetch('https://api.runwayml.com/v1/image_to_video', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.RUNWAY_API_KEY}`,
      'X-Runway-Version': '2024-11-06',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gen3a_turbo',
      promptImage: imageUrl,
      promptText: motionPrompt,
      duration: 5,          // 5 or 10 seconds
      ratio: '16:9',
      watermark: false
    })
  });
  const { id: taskId } = await createResponse.json();

  // Step 2: Poll for completion
  let task;
  do {
    await sleep(10000); // Check every 10s
    const statusResponse = await fetch(
      `https://api.runwayml.com/v1/tasks/${taskId}`,
      { headers: { 'Authorization': `Bearer ${process.env.RUNWAY_API_KEY}` } }
    );
    task = await statusResponse.json();
  } while (task.status === 'PENDING' || task.status === 'RUNNING');

  if (task.status === 'SUCCEEDED') {
    return task.output[0]; // Video URL
  }
  throw new Error(`Runway generation failed: ${task.failure}`);
};

ElevenLabs API (Voice & Sound Effects)

ElevenLabs provides some of the highest-quality text-to-speech available via API, along with voice cloning and AI sound effects. In video pipelines, it generates narration audio from scripts.

| Endpoint | Purpose | Key Parameters |
|---|---|---|
| /v1/text-to-speech/{voice_id} | Generate speech from text | voice_id, model_id, text, voice_settings |
| /v1/text-to-speech/{voice_id}/stream | Stream audio in real time | Same + output_format |
| /v1/sound-generation | AI sound effects | text (description of sound), duration_seconds |
| /v1/voices | List available voices | None (GET request) |
| /v1/voices/add | Clone a voice from audio samples | name, files (audio samples) |
ElevenLabs: Text-to-Speech for Narration
import fs from 'node:fs';

const generateVoiceover = async (text, voiceId = 'pNInz6obpgDQGcFmaJgB') => {
  // voiceId 'pNInz6obpgDQGcFmaJgB' = "Adam" (deep, narrator voice)
  // output_format is passed as a query parameter, not in the request body
  const response = await fetch(
    `https://api.elevenlabs.io/v1/text-to-speech/${voiceId}?output_format=mp3_44100_128`,
    {
      method: 'POST',
      headers: {
        'xi-api-key': process.env.ELEVENLABS_API_KEY,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        text: text,
        model_id: 'eleven_multilingual_v2', // Best quality
        voice_settings: {
          stability: 0.5,         // 0-1: lower = more expressive
          similarity_boost: 0.75, // 0-1: higher = closer to original voice
          style: 0.3,             // 0-1: style exaggeration
          use_speaker_boost: true
        }
      })
    }
  );

  // Response is raw audio bytes
  const audioBuffer = await response.arrayBuffer();
  fs.writeFileSync('voiceover.mp3', Buffer.from(audioBuffer));
  return 'voiceover.mp3';
};
ElevenLabs: AI Sound Effects
const generateSoundEffect = async (description, duration = 5) => {
  const response = await fetch(
    'https://api.elevenlabs.io/v1/sound-generation',
    {
      method: 'POST',
      headers: {
        'xi-api-key': process.env.ELEVENLABS_API_KEY,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        text: description,         // e.g., "dramatic cinematic whoosh transition"
        duration_seconds: duration,
        prompt_influence: 0.5      // 0-1: how closely to follow the description
      })
    }
  );
  const audioBuffer = await response.arrayBuffer();
  fs.writeFileSync('sfx.mp3', Buffer.from(audioBuffer));
  return 'sfx.mp3';
};

// Generate transition sound effects
await generateSoundEffect('cinematic whoosh transition sound', 2);
await generateSoundEffect('gentle ambient background music, technology theme', 60);

Stability AI API

Stability AI offers image generation through Stable Diffusion models, image upscaling, inpainting, and outpainting. Their API is significantly cheaper than DALL-E for bulk generation.

Stability AI: Image Generation
const generateStabilityImage = async (prompt) => {
  const formData = new FormData();
  formData.append('prompt', prompt);
  formData.append('output_format', 'png');
  formData.append('aspect_ratio', '16:9');
  formData.append('model', 'sd3.5-large');  // Best quality
  formData.append('negative_prompt', 'text, watermark, blurry, low quality');

  const response = await fetch(
    'https://api.stability.ai/v2beta/stable-image/generate/sd3',
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.STABILITY_API_KEY}`,
        'Accept': 'image/*'
        // Content-Type is set automatically for FormData (multipart/form-data)
      },
      body: formData
    }
  );
  const imageBuffer = await response.arrayBuffer();
  return Buffer.from(imageBuffer);
};

Google Cloud Video AI

Google Cloud Video Intelligence API does not generate videos — instead, it analyzes existing videos. In automation pipelines, it is used for quality control: detecting scene changes, transcribing speech, identifying objects, and flagging inappropriate content.

| Feature | Use in Pipeline | API Method |
|---|---|---|
| Label Detection | Verify video content matches intent | annotate (LABEL_DETECTION) |
| Shot Detection | Validate transitions and scene changes | annotate (SHOT_CHANGE_DETECTION) |
| Speech Transcription | Generate captions automatically | annotate (SPEECH_TRANSCRIPTION) |
| Explicit Content Detection | Quality gate before publishing | annotate (EXPLICIT_CONTENT_DETECTION) |
| Object Tracking | Verify visual consistency across scenes | annotate (OBJECT_TRACKING) |
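
All five features go through the same `videos:annotate` call. As a sketch of the request shape (the endpoint and field names follow the Video Intelligence v1 REST API; `buildAnnotateRequest` is a hypothetical helper introduced here for illustration):

```javascript
// Build the JSON body for POST https://videointelligence.googleapis.com/v1/videos:annotate
// buildAnnotateRequest is a hypothetical helper; field names follow the v1 REST API.
const buildAnnotateRequest = (gcsUri, features) => ({
  inputUri: gcsUri,   // e.g. 'gs://my-bucket/final-video.mp4'
  features: features  // e.g. ['SHOT_CHANGE_DETECTION', 'EXPLICIT_CONTENT_DETECTION']
});

// Usage as a pre-publish quality gate:
const annotateBody = buildAnnotateRequest('gs://my-bucket/final.mp4', [
  'SHOT_CHANGE_DETECTION',
  'EXPLICIT_CONTENT_DETECTION'
]);
// POST this body with OAuth credentials; the response is a long-running
// operation name that you poll for the analysis results.
```

Note that this API uses OAuth 2.0 (Google service credentials) rather than a simple API key, consistent with the authentication table earlier in this lesson.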

API Comparison Table

Choosing the right API for each pipeline stage depends on quality requirements, speed, cost, and rate limits. Here is a comprehensive comparison.

| Category | API | Quality | Speed | Cost (Low End) | Rate Limit |
|---|---|---|---|---|---|
| Text/Script | OpenAI GPT-4o | Excellent | Fast (2-5s) | $0.005/1K tokens | 500 RPM |
| Text/Script | Anthropic Claude | Excellent | Fast (2-5s) | $0.003/1K tokens | 50 RPM (varies) |
| Images | OpenAI DALL-E 3 | Very Good | 10-15s | $0.04/image | 7-15 RPM |
| Images | Stability AI SD3.5 | Good | 5-10s | $0.002/image | 150 RPM |
| Images | Midjourney (unofficial) | Excellent | 30-60s | $0.01/image | Varies |
| Video | Runway Gen-3 | Excellent | 60-120s | $0.05/clip | 10 concurrent |
| Video | Kling AI | Very Good | 90-180s | $0.03/clip | 3 concurrent |
| Voice | ElevenLabs | Excellent | 2-5s | $0.18/1K chars | 100 RPM |
| Voice | OpenAI TTS | Good | 1-3s | $0.015/1K chars | 50 RPM |
| Analysis | Google Video AI | Excellent | 30-120s | $0.10/minute | 600 RPM |
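
To make the cost column concrete, here is a back-of-the-envelope estimate for one 60-second video built from six 10-second clips, using the low-end figures from the table above. The numbers are illustrative only; real costs vary with usage volume and provider pricing changes:

```javascript
// Rough per-video cost using the table's low-end figures (illustrative only).
const estimateVideoCost = ({ scriptTokens, images, clips, narrationChars }) => {
  const script = (scriptTokens / 1000) * 0.005;   // GPT-4o
  const imageCost = images * 0.002;               // Stability SD3.5
  const videoCost = clips * 0.05;                 // Runway Gen-3
  const voice = (narrationChars / 1000) * 0.18;   // ElevenLabs
  return +(script + imageCost + videoCost + voice).toFixed(3);
};

const cost = estimateVideoCost({
  scriptTokens: 2000,    // ~1 script generation
  images: 6,             // one image per scene
  clips: 6,              // six 10-second clips
  narrationChars: 900    // ~150 words of narration
});
// ≈ $0.48 per video at these rates — video generation dominates the total
```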

Rate Limiting and Error Handling

Robust API integration requires handling failures gracefully. The three most common failures are: rate limit errors (429), server errors (500/503), and timeout errors.

Resilient API Caller with Retry Logic
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

const callAPIWithRetry = async (apiCall, options = {}) => {
  const { maxRetries = 3, baseDelay = 1000, maxDelay = 30000 } = options;

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const result = await apiCall();
      return result;
    } catch (error) {
      const isRateLimit = error.status === 429;
      const isServerError = error.status >= 500;
      const isRetryable = isRateLimit || isServerError;

      if (!isRetryable || attempt === maxRetries) {
        throw error;
      }

      // Use Retry-After header if available, otherwise exponential backoff
      let delay;
      if (isRateLimit && error.headers?.['retry-after']) {
        delay = parseInt(error.headers['retry-after'], 10) * 1000;
      } else {
        delay = Math.min(baseDelay * Math.pow(2, attempt), maxDelay);
      }

      console.warn(
        `API call failed (${error.status}). ` +
        `Retry ${attempt + 1}/${maxRetries} in ${delay}ms...`
      );
      await sleep(delay);
    }
  }
};

// Usage:
const script = await callAPIWithRetry(
  () => openai.chat.completions.create({ model: 'gpt-4o', messages: [...] }),
  { maxRetries: 3, baseDelay: 2000 }
);
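
Retries handle failures after the fact; to avoid tripping limits like Runway's 10 concurrent tasks in the first place, cap concurrency proactively. A minimal limiter sketch with no external dependencies (libraries such as `p-limit` on npm implement the same pattern):

```javascript
// Run async task functions with at most `limit` in flight at once.
const runWithConcurrencyLimit = async (tasks, limit) => {
  const results = new Array(tasks.length);
  let next = 0;
  // Each worker pulls the next unstarted task until none remain.
  const worker = async () => {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  };
  await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, worker));
  return results;
};

// Usage: generate 20 clips but never more than 10 Runway tasks at once:
// const clips = await runWithConcurrencyLimit(
//   scenes.map((s) => () => generateRunwayVideo(s.imageUrl, s.motionPrompt)),
//   10
// );
```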

Webhook Integrations

Webhooks allow APIs to push notifications to your server when an event occurs, instead of you polling for status updates. This is more efficient and faster for long-running tasks like video generation.

Webhook Receiver for Runway Completion
import express from 'express';
const app = express();

// Webhook endpoint that Runway calls when video generation completes
app.post('/webhooks/runway', express.json(), async (req, res) => {
  const { task_id, status, output } = req.body;

  if (status === 'SUCCEEDED') {
    console.log(`Video ready: ${output[0]}`);
    // Continue pipeline: download video, assemble, publish
    // (downloadVideo, triggerAssemblyStage, retryOrNotify are your own pipeline helpers)
    await downloadVideo(output[0], `clips/${task_id}.mp4`);
    await triggerAssemblyStage(task_id);
  } else if (status === 'FAILED') {
    console.error(`Task ${task_id} failed: ${req.body.failure}`);
    await retryOrNotify(task_id);
  }

  res.status(200).json({ received: true });
});

app.listen(3000, () => console.log('Webhook server listening on port 3000'));

// When creating a Runway task, specify the webhook URL:
// { ...params, webhook: 'https://your-server.com/webhooks/runway' }
📝 Note: For webhook-based architectures, use a tool like ngrok during development to expose your local server to the internet. In production, deploy your webhook receiver to a cloud service with HTTPS (AWS Lambda, Vercel, Railway).
Exercise:
Which authentication method requires a token exchange flow with user consent?
Exercise:
Which API is best suited for bulk image generation when cost is the primary concern?
Exercise:
What is the advantage of webhooks over polling for long-running API tasks?