
Batch Processing for AI Video

[Figure: batch processing diagram showing multiple videos being generated simultaneously. Batch processing generates multiple pieces of content in parallel, dramatically increasing throughput.]

What Is Batch Processing?

Batch processing is the technique of submitting multiple generation requests at once instead of processing them one at a time. Rather than generating images, video clips, or voiceovers sequentially, you queue up all requests and let them run in parallel or in optimized batches.

In the context of AI video production, batch processing applies to every stage of the pipeline: generating 50 scripts at once, creating 200 images in a single run, producing voiceovers for an entire week of content, or rendering 30 video clips simultaneously.
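As a minimal sketch, the difference between sequential and batched submission looks like this. The `generateAsset` function is a hypothetical stand-in for any AI generation call (image, clip, or voiceover); the timing comments assume each call's latency dominates total runtime:

```javascript
// Hypothetical stand-in for any AI generation call
const generateAsset = async (prompt) => {
  await new Promise(r => setTimeout(r, 50)); // simulate API latency
  return `asset-for-${prompt}`;
};

// Sequential: total time is roughly N x latency
const sequential = async (prompts) => {
  const results = [];
  for (const p of prompts) results.push(await generateAsset(p));
  return results;
};

// Batched: all requests in flight at once, total time is roughly 1 x latency
const batched = (prompts) => Promise.all(prompts.map(p => generateAsset(p)));
```

With 50 prompts and a 10-second API call, the sequential version takes over 8 minutes while the fully batched version finishes in roughly the time of one call, rate limits permitting.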

📝 Note: Batch processing is not just about speed — it is also about cost. Many AI APIs offer significant discounts for batch requests (OpenAI's Batch API charges 50% less than real-time requests).

Batch Image Generation

Images are typically the easiest asset to batch-generate because image APIs are fast and relatively inexpensive. The key challenge is maintaining visual consistency across a large batch.

| Tool | Batch Method | Max Batch Size | Avg Time Per Image | Cost Per Image |
|------|--------------|----------------|--------------------|----------------|
| DALL-E 3 API | Loop API calls with async/await | Unlimited (rate limited) | 10-15 seconds | $0.04 - $0.08 |
| Midjourney API | /imagine batch with --repeat flag | Up to 40 per batch | 30-60 seconds | $0.01 - $0.03 |
| Stability AI API | POST /v1/generation with batch param | 10 per request | 5-10 seconds | $0.002 - $0.006 |
| Leonardo AI API | Batch generation endpoint | 8 per request | 8-15 seconds | $0.01 - $0.02 |
| Flux (Replicate) | Prediction batch endpoint | Unlimited (queued) | 5-15 seconds | $0.003 - $0.01 |

Batch Image Generation with DALL-E (Parallel)
// Assumes an initialized client (const openai = new OpenAI())
// and a sleep helper: const sleep = ms => new Promise(r => setTimeout(r, ms));
const generateBatchImages = async (prompts, concurrency = 5) => {
  const results = [];
  // Process in chunks to respect rate limits
  for (let i = 0; i < prompts.length; i += concurrency) {
    const chunk = prompts.slice(i, i + concurrency);
    const promises = chunk.map(prompt =>
      openai.images.generate({
        model: 'dall-e-3',
        prompt: prompt,
        size: '1792x1024',
        quality: 'hd',
        n: 1
      })
    );
    const chunkResults = await Promise.allSettled(promises);
    results.push(...chunkResults);
    // Respect rate limits: pause between chunks
    if (i + concurrency < prompts.length) {
      await sleep(2000);
    }
  }
  return results;
};

// Generate 50 images in batches of 5
const allPrompts = scenes.map(s => s.visual_prompt);
const images = await generateBatchImages(allPrompts, 5);

Batch Video Generation

Video generation is the most time-consuming and expensive stage. Batch video generation requires careful orchestration because each clip can take 1-5 minutes to generate, and API rate limits are stricter than for images.

| Tool | Batch Method | Max Concurrent | Avg Time Per Clip | Cost Per 5s Clip |
|------|--------------|----------------|-------------------|------------------|
| Runway Gen-3 | Async task submission + polling | 5 concurrent | 60-120 seconds | $0.05 - $0.10 |
| Kling AI API | Batch task queue | 3 concurrent | 90-180 seconds | $0.03 - $0.08 |
| Pika API | Sequential with webhook callbacks | 2 concurrent | 60-90 seconds | $0.04 - $0.07 |
| Luma Dream Machine | Async generation with status polling | 3 concurrent | 45-90 seconds | $0.03 - $0.06 |
| Haiper API | Batch submission endpoint | 5 concurrent | 30-60 seconds | $0.02 - $0.05 |

Batch Video Generation with Polling
const batchGenerateVideos = async (imageUrls, prompts) => {
  // submitVideoTask and checkTaskStatus are thin wrappers around your
  // provider's task-submission and status endpoints
  // Step 1: Submit all generation tasks
  const tasks = [];
  for (let i = 0; i < imageUrls.length; i++) {
    const task = await submitVideoTask(imageUrls[i], prompts[i]);
    tasks.push({ id: task.id, scene: i, status: 'processing' });
    await sleep(1000); // Stagger submissions
  }

  // Step 2: Poll for completion
  const completed = [];
  while (completed.length < tasks.length) {
    for (const task of tasks) {
      if (task.status === 'processing') {
        const status = await checkTaskStatus(task.id);
        if (status.state === 'completed') {
          task.status = 'completed';
          task.videoUrl = status.output_url;
          completed.push(task);
          console.log(`Scene ${task.scene} complete (${completed.length}/${tasks.length})`);
        } else if (status.state === 'failed') {
          task.status = 'failed';
          console.error(`Scene ${task.scene} failed: ${status.error}`);
          // Resubmit failed task (in production, cap retries so a
          // permanently failing task cannot loop forever)
          const retry = await submitVideoTask(imageUrls[task.scene], prompts[task.scene]);
          task.id = retry.id;
          task.status = 'processing';
        }
      }
    }
    await sleep(10000); // Poll every 10 seconds
  }
  return completed;
};

Parallel Processing Strategies

[Figure: diagram showing sequential vs parallel vs pipelined processing strategies. Three approaches to batch processing: sequential (slow), parallel (fast but expensive), and pipelined (balanced).]

There are three main strategies for processing batches, each with different tradeoffs (a fourth, chunked parallel, combines the two and is what the image-generation example above implements):

1. Sequential: Process one item at a time. Slowest but simplest, uses minimal API quota. Best when rate limits are very strict or you need to use the output of one item as input for the next.

2. Parallel: Process all items simultaneously. Fastest but can hit rate limits quickly and costs more due to burst pricing. Best for small batches with generous rate limits.

3. Pipelined: Start the next item as soon as the previous one moves to the next stage. For example, while Scene 3 images are generating, Scene 2 is already in video generation, and Scene 1 is already in voiceover. This is the most efficient approach for end-to-end pipelines.
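A pipelined runner can be sketched as a chain of async stages that each item flows through independently, so one scene can be in video generation while another is still generating images. The stage functions below are simulated placeholders; in a real pipeline they would call the image, video, and voiceover APIs:

```javascript
// Simulated stages; real versions would call image, video, and voiceover APIs
const stages = [
  async (item) => ({ ...item, image: `img-${item.id}` }),
  async (item) => ({ ...item, video: `vid-${item.id}` }),
  async (item) => ({ ...item, audio: `aud-${item.id}` }),
];

// Each item advances to the next stage as soon as the previous one
// finishes, without waiting for other items to catch up
const runPipeline = (items) =>
  Promise.all(items.map(async (item) => {
    let current = item;
    for (const stage of stages) current = await stage(current);
    return current;
  }));
```

In production you would typically add per-stage concurrency limits (e.g. a rate limiter per API) rather than letting every item hit every stage at once.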

| Strategy | Speed | API Usage | Complexity | Best For |
|----------|-------|-----------|------------|----------|
| Sequential | Slow | Minimal | Low | Strict rate limits, dependent tasks |
| Parallel | Fast | Burst (high) | Medium | Small batches, generous quotas |
| Pipelined | Optimized | Steady | High | Full pipeline runs, production systems |
| Chunked Parallel | Balanced | Controlled | Medium | Large batches with moderate rate limits |

Managing API Rate Limits

Every AI API enforces rate limits — restrictions on how many requests you can make per minute, per hour, or per day. Batch processing must respect these limits or your requests will be rejected with 429 Too Many Requests errors.

| API | Requests/Min (RPM) | Tokens/Min (TPM) | Images/Min | Strategy |
|-----|--------------------|------------------|------------|----------|
| OpenAI GPT-4 | 500 RPM (Tier 3) | 80,000 TPM | N/A | Token-aware batching |
| OpenAI DALL-E 3 | 7 RPM (Tier 1) / 15 RPM (Tier 3) | N/A | 7/15 per min | Staggered with delays |
| Runway | ~10 concurrent tasks | N/A | N/A | Queue-based polling |
| ElevenLabs | 100 RPM (Starter) | N/A | N/A | Chunk by character count |
| Stability AI | 150 RPM | N/A | 150 per min | Generous — parallel safe |

Rate Limiter Implementation
class RateLimiter {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.requests = [];
  }

  async waitForSlot() {
    // Loop until a slot frees up; re-check the window after each wait
    // so concurrent callers cannot all push through at once
    while (true) {
      const now = Date.now();
      // Drop timestamps that have fallen outside the window
      this.requests = this.requests.filter(t => now - t < this.windowMs);
      if (this.requests.length < this.maxRequests) break;
      const oldestRequest = this.requests[0];
      const waitTime = this.windowMs - (now - oldestRequest) + 100;
      console.log(`Rate limit reached. Waiting ${waitTime}ms...`);
      await sleep(waitTime);
    }
    this.requests.push(Date.now());
  }
}

// Usage: Max 7 DALL-E requests per 60 seconds
const dalleRateLimiter = new RateLimiter(7, 60000);

for (const prompt of imagePrompts) {
  await dalleRateLimiter.waitForSlot();
  const image = await generateImage(prompt);
  results.push(image);
}

Cost Optimization for Bulk Generation

When processing hundreds of items, small cost differences per item compound quickly. A $0.03 difference per image across 1,000 images is $30. Optimizing costs involves choosing the right model tier, using batch APIs, and caching results.

| Optimization | Savings | How It Works |
|--------------|---------|--------------|
| OpenAI Batch API | 50% off standard pricing | Submit batch files, results within 24 hours |
| Lower quality tiers | 30-60% off | Use 'standard' instead of 'hd' for non-hero images |
| Caching duplicates | 100% for cached items | Store and reuse identical prompt results |
| Off-peak generation | Varies | Some APIs have lower costs during off-peak hours |
| Smaller models | 40-70% off | Use GPT-3.5 for simple tasks instead of GPT-4 |
| Batch voiceover | 20-30% off | Send longer text in fewer API calls |

OpenAI Batch API Usage
// Step 1: Create a JSONL batch file
const batchRequests = topics.map((topic, i) => ({
  custom_id: `script-${i}`,
  method: 'POST',
  url: '/v1/chat/completions',
  body: {
    model: 'gpt-4',
    messages: [
      { role: 'system', content: 'Write a 60-second video script...' },
      { role: 'user', content: `Topic: ${topic}` }
    ]
  }
}));

// Step 2: Upload batch file
const file = await openai.files.create({
  file: createBatchFile(batchRequests),
  purpose: 'batch'
});

// Step 3: Create batch job (50% cheaper than real-time)
const batch = await openai.batches.create({
  input_file_id: file.id,
  endpoint: '/v1/chat/completions',
  completion_window: '24h'
});

// Step 4: Poll for results
// Results are available within 24 hours at half the cost
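The "caching duplicates" optimization from the table above can be sketched as a thin wrapper that keys results by prompt, so an identical prompt never triggers a second paid API call. `generateFn` here is whatever generation call you choose to wrap:

```javascript
// Wrap any async generation function with an in-memory prompt cache
const makeCachedGenerator = (generateFn) => {
  const cache = new Map();
  return async (prompt) => {
    if (cache.has(prompt)) return cache.get(prompt); // free: reuse stored result
    const result = await generateFn(prompt);
    cache.set(prompt, result);
    return result;
  };
};
```

A production version would persist the cache (Redis or disk) and key on a hash of the prompt plus the model name and generation parameters, since changing either should invalidate the cached result.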

Quality Control at Scale

Generating at scale means more opportunities for errors. Quality control must be automated and systematic — you cannot manually review every image when you are generating 500 per day.

Automated quality checks include:

- Image resolution validation: Confirm output dimensions match expected aspect ratio

- Content safety filtering: Run outputs through content moderation APIs to catch inappropriate content

- Similarity scoring: Compare generated images against a style reference using CLIP embeddings to ensure visual consistency

- Audio quality checks: Validate voiceover duration matches script timing, check for silence gaps or clipping

- Video integrity: Verify clip duration, file size, codec compatibility before assembly

Automated Quality Check Function
// Assumes `sharp` and an ffprobe wrapper (e.g. the `ffprobe` npm package)
// are imported at the top of the module
const qualityCheck = async (asset, type) => {
  const checks = {
    image: async (img) => {
      const metadata = await sharp(img).metadata();
      const passed = metadata.width >= 1792 && metadata.height >= 1024;
      return {
        passed,
        reason: passed ? 'OK' : 'Resolution too low',
        dimensions: `${metadata.width}x${metadata.height}`
      };
    },
    video: async (vid) => {
      const probe = await ffprobe(vid);
      const duration = parseFloat(probe.streams[0].duration);
      return {
        passed: duration >= 4.5 && duration <= 10.5,
        reason: duration < 4.5 ? 'Too short' : duration > 10.5 ? 'Too long' : 'OK',
        duration: `${duration}s`
      };
    },
    audio: async (aud) => {
      const probe = await ffprobe(aud);
      const duration = parseFloat(probe.streams[0].duration);
      const bitrate = parseInt(probe.streams[0].bit_rate);
      return {
        passed: bitrate >= 128000,
        reason: bitrate < 128000 ? 'Bitrate too low' : 'OK',
        duration: `${duration}s`
      };
    }
  };
  return checks[type](asset);
};
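Content safety filtering from the checklist above follows the same pattern: run each asset through a moderation endpoint and drop anything flagged. The sketch below assumes a response shape like OpenAI's moderation API (`flagged: boolean`); the `moderate` function is a hypothetical wrapper around whichever moderation service you use:

```javascript
// `moderate` is a hypothetical wrapper around a moderation API;
// it is assumed to resolve to an object like { flagged: boolean }
const filterFlaggedAssets = async (assets, moderate) => {
  const verdicts = await Promise.all(assets.map(a => moderate(a.text)));
  // Keep only the assets whose moderation verdict was not flagged
  return assets.filter((_, i) => !verdicts[i].flagged);
};
```

Log the dropped items alongside their prompts so you can see whether a particular prompt template is repeatedly producing flagged output.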
📝 Note: Set up a dashboard (Grafana, a simple web page, or even a Google Sheet) that tracks batch job metrics: success rate, average generation time, cost per item, and quality check pass rate. This visibility is essential for optimizing at scale.
Exercise:
What is the primary advantage of using OpenAI's Batch API over real-time API calls?
Exercise:
Which processing strategy starts the next pipeline stage for an item as soon as the current stage completes, even while other items are still processing?
Exercise:
What HTTP status code indicates you have exceeded an API's rate limit?