
Batch Processing for AI Video

[Figure: batch processing diagram showing multiple videos being generated simultaneously. Batch processing generates multiple pieces of content in parallel, dramatically increasing throughput.]

What Is Batch Processing?

Batch processing is the technique of submitting multiple generation requests at once instead of processing them one at a time. Rather than generating images, video clips, or voiceovers sequentially, you queue up all requests and let them run in parallel or in optimized batches.

In the context of AI video production, batch processing applies to every stage of the pipeline: generating 50 scripts at once, creating 200 images in a single run, producing voiceovers for an entire week of content, or rendering 30 video clips simultaneously.
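As a minimal sketch, the difference between sequential and batched submission looks like this. The `generateAsset` function is a hypothetical stand-in for any AI generation call (image, clip, or voiceover); the timing comments assume each call's latency dominates total runtime:

```javascript
// Hypothetical stand-in for any AI generation call
const generateAsset = async (prompt) => {
  await new Promise(r => setTimeout(r, 50)); // simulate API latency
  return `asset-for-${prompt}`;
};

// Sequential: total time is roughly N x latency
const sequential = async (prompts) => {
  const results = [];
  for (const p of prompts) results.push(await generateAsset(p));
  return results;
};

// Batched: all requests in flight at once, total time is roughly 1 x latency
const batched = (prompts) => Promise.all(prompts.map(p => generateAsset(p)));
```

With 50 prompts and a 10-second API call, the sequential version takes over 8 minutes while the fully batched version finishes in roughly the time of one call, rate limits permitting.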

📝 Note: Batch processing is not just about speed — it is also about cost. Many AI APIs offer significant discounts for batch requests (OpenAI's Batch API charges 50% less than real-time requests).

Batch Image Generation

Images are typically the easiest asset to batch-generate because image APIs are fast and relatively inexpensive. The key challenge is maintaining visual consistency across a large batch.

| Tool | Batch Method | Max Batch Size | Avg Time Per Image | Cost Per Image |
|------|--------------|----------------|--------------------|----------------|
| DALL-E 3 API | Loop API calls with async/await | Unlimited (rate limited) | 10-15 seconds | $0.04 - $0.08 |
| Midjourney API | /imagine batch with --repeat flag | Up to 40 per batch | 30-60 seconds | $0.01 - $0.03 |
| Stability AI API | POST /v1/generation with batch param | 10 per request | 5-10 seconds | $0.002 - $0.006 |
| Leonardo AI API | Batch generation endpoint | 8 per request | 8-15 seconds | $0.01 - $0.02 |
| Flux (Replicate) | Prediction batch endpoint | Unlimited (queued) | 5-15 seconds | $0.003 - $0.01 |

Batch Image Generation with DALL-E (Parallel)
// Assumes an initialized client (const openai = new OpenAI())
// and a sleep helper: const sleep = ms => new Promise(r => setTimeout(r, ms));
const generateBatchImages = async (prompts, concurrency = 5) => {
  const results = [];
  // Process in chunks to respect rate limits
  for (let i = 0; i < prompts.length; i += concurrency) {
    const chunk = prompts.slice(i, i + concurrency);
    const promises = chunk.map(prompt =>
      openai.images.generate({
        model: 'dall-e-3',
        prompt: prompt,
        size: '1792x1024',
        quality: 'hd',
        n: 1
      })
    );
    const chunkResults = await Promise.allSettled(promises);
    results.push(...chunkResults);
    // Respect rate limits: pause between chunks
    if (i + concurrency < prompts.length) {
      await sleep(2000);
    }
  }
  return results;
};

// Generate 50 images in batches of 5
const allPrompts = scenes.map(s => s.visual_prompt);
const images = await generateBatchImages(allPrompts, 5);

Batch Video Generation

Video generation is the most time-consuming and expensive stage. Batch video generation requires careful orchestration because each clip can take 1-5 minutes to generate, and API rate limits are stricter than for images.

| Tool | Batch Method | Max Concurrent | Avg Time Per Clip | Cost Per 5s Clip |
|------|--------------|----------------|-------------------|------------------|
| Runway Gen-3 | Async task submission + polling | 5 concurrent | 60-120 seconds | $0.05 - $0.10 |
| Kling AI API | Batch task queue | 3 concurrent | 90-180 seconds | $0.03 - $0.08 |
| Pika API | Sequential with webhook callbacks | 2 concurrent | 60-90 seconds | $0.04 - $0.07 |
| Luma Dream Machine | Async generation with status polling | 3 concurrent | 45-90 seconds | $0.03 - $0.06 |
| Haiper API | Batch submission endpoint | 5 concurrent | 30-60 seconds | $0.02 - $0.05 |

Batch Video Generation with Polling
const batchGenerateVideos = async (imageUrls, prompts) => {
  // submitVideoTask and checkTaskStatus are thin wrappers around your
  // provider's task-submission and status endpoints
  // Step 1: Submit all generation tasks
  const tasks = [];
  for (let i = 0; i < imageUrls.length; i++) {
    const task = await submitVideoTask(imageUrls[i], prompts[i]);
    tasks.push({ id: task.id, scene: i, status: 'processing' });
    await sleep(1000); // Stagger submissions
  }

  // Step 2: Poll for completion
  const completed = [];
  while (completed.length < tasks.length) {
    for (const task of tasks) {
      if (task.status === 'processing') {
        const status = await checkTaskStatus(task.id);
        if (status.state === 'completed') {
          task.status = 'completed';
          task.videoUrl = status.output_url;
          completed.push(task);
          console.log(`Scene ${task.scene} complete (${completed.length}/${tasks.length})`);
        } else if (status.state === 'failed') {
          task.status = 'failed';
          console.error(`Scene ${task.scene} failed: ${status.error}`);
          // Resubmit failed task (in production, cap retries so a
          // permanently failing task cannot loop forever)
          const retry = await submitVideoTask(imageUrls[task.scene], prompts[task.scene]);
          task.id = retry.id;
          task.status = 'processing';
        }
      }
    }
    await sleep(10000); // Poll every 10 seconds
  }
  return completed;
};

Parallel Processing Strategies

[Figure: diagram showing sequential vs parallel vs pipelined processing strategies. Three approaches to batch processing: sequential (slow), parallel (fast but expensive), and pipelined (balanced).]

There are three main strategies for processing batches, each with different tradeoffs (a fourth, chunked parallel, combines the two and is what the image-generation example above implements):

1. Sequential: Process one item at a time. Slowest but simplest, uses minimal API quota. Best when rate limits are very strict or you need to use the output of one item as input for the next.

2. Parallel: Process all items simultaneously. Fastest but can hit rate limits quickly and costs more due to burst pricing. Best for small batches with generous rate limits.

3. Pipelined: Start the next item as soon as the previous one moves to the next stage. For example, while Scene 3 images are generating, Scene 2 is already in video generation, and Scene 1 is already in voiceover. This is the most efficient approach for end-to-end pipelines.
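A pipelined runner can be sketched as a chain of async stages that each item flows through independently, so one scene can be in video generation while another is still generating images. The stage functions below are simulated placeholders; in a real pipeline they would call the image, video, and voiceover APIs:

```javascript
// Simulated stages; real versions would call image, video, and voiceover APIs
const stages = [
  async (item) => ({ ...item, image: `img-${item.id}` }),
  async (item) => ({ ...item, video: `vid-${item.id}` }),
  async (item) => ({ ...item, audio: `aud-${item.id}` }),
];

// Each item advances to the next stage as soon as the previous one
// finishes, without waiting for other items to catch up
const runPipeline = (items) =>
  Promise.all(items.map(async (item) => {
    let current = item;
    for (const stage of stages) current = await stage(current);
    return current;
  }));
```

In production you would typically add per-stage concurrency limits (e.g. a rate limiter per API) rather than letting every item hit every stage at once.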

| Strategy | Speed | API Usage | Complexity | Best For |
|----------|-------|-----------|------------|----------|
| Sequential | Slow | Minimal | Low | Strict rate limits, dependent tasks |
| Parallel | Fast | Burst (high) | Medium | Small batches, generous quotas |
| Pipelined | Optimized | Steady | High | Full pipeline runs, production systems |
| Chunked Parallel | Balanced | Controlled | Medium | Large batches with moderate rate limits |

Managing API Rate Limits

Every AI API enforces rate limits — restrictions on how many requests you can make per minute, per hour, or per day. Batch processing must respect these limits or your requests will be rejected with 429 Too Many Requests errors.

| API | Requests/Min (RPM) | Tokens/Min (TPM) | Images/Min | Strategy |
|-----|--------------------|------------------|------------|----------|
| OpenAI GPT-4 | 500 RPM (Tier 3) | 80,000 TPM | N/A | Token-aware batching |
| OpenAI DALL-E 3 | 7 RPM (Tier 1) / 15 RPM (Tier 3) | N/A | 7/15 per min | Staggered with delays |
| Runway | ~10 concurrent tasks | N/A | N/A | Queue-based polling |
| ElevenLabs | 100 RPM (Starter) | N/A | N/A | Chunk by character count |
| Stability AI | 150 RPM | N/A | 150 per min | Generous — parallel safe |

Rate Limiter Implementation
class RateLimiter {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.requests = [];
  }

  async waitForSlot() {
    // Loop until a slot frees up; re-check the window after each wait
    // so concurrent callers cannot all push through at once
    while (true) {
      const now = Date.now();
      // Drop timestamps that have fallen outside the window
      this.requests = this.requests.filter(t => now - t < this.windowMs);
      if (this.requests.length < this.maxRequests) break;
      const oldestRequest = this.requests[0];
      const waitTime = this.windowMs - (now - oldestRequest) + 100;
      console.log(`Rate limit reached. Waiting ${waitTime}ms...`);
      await sleep(waitTime);
    }
    this.requests.push(Date.now());
  }
}

// Usage: Max 7 DALL-E requests per 60 seconds
const dalleRateLimiter = new RateLimiter(7, 60000);

for (const prompt of imagePrompts) {
  await dalleRateLimiter.waitForSlot();
  const image = await generateImage(prompt);
  results.push(image);
}

Cost Optimization for Bulk Generation

When processing hundreds of items, small cost differences per item compound quickly. A $0.03 difference per image across 1,000 images is $30. Optimizing costs involves choosing the right model tier, using batch APIs, and caching results.

| Optimization | Savings | How It Works |
|--------------|---------|--------------|
| OpenAI Batch API | 50% off standard pricing | Submit batch files, results within 24 hours |
| Lower quality tiers | 30-60% off | Use 'standard' instead of 'hd' for non-hero images |
| Caching duplicates | 100% for cached items | Store and reuse identical prompt results |
| Off-peak generation | Varies | Some APIs have lower costs during off-peak hours |
| Smaller models | 40-70% off | Use GPT-3.5 for simple tasks instead of GPT-4 |
| Batch voiceover | 20-30% off | Send longer text in fewer API calls |

OpenAI Batch API Usage
// Step 1: Create a JSONL batch file
const batchRequests = topics.map((topic, i) => ({
  custom_id: `script-${i}`,
  method: 'POST',
  url: '/v1/chat/completions',
  body: {
    model: 'gpt-4',
    messages: [
      { role: 'system', content: 'Write a 60-second video script...' },
      { role: 'user', content: `Topic: ${topic}` }
    ]
  }
}));

// Step 2: Upload batch file
const file = await openai.files.create({
  file: createBatchFile(batchRequests),
  purpose: 'batch'
});

// Step 3: Create batch job (50% cheaper than real-time)
const batch = await openai.batches.create({
  input_file_id: file.id,
  endpoint: '/v1/chat/completions',
  completion_window: '24h'
});

// Step 4: Poll for results
// Results are available within 24 hours at half the cost
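The "caching duplicates" optimization from the table above can be sketched as a thin wrapper that keys results by prompt, so an identical prompt never triggers a second paid API call. `generateFn` here is whatever generation call you choose to wrap:

```javascript
// Wrap any async generation function with an in-memory prompt cache
const makeCachedGenerator = (generateFn) => {
  const cache = new Map();
  return async (prompt) => {
    if (cache.has(prompt)) return cache.get(prompt); // free: reuse stored result
    const result = await generateFn(prompt);
    cache.set(prompt, result);
    return result;
  };
};
```

A production version would persist the cache (Redis or disk) and key on a hash of the prompt plus the model name and generation parameters, since changing either should invalidate the cached result.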

Quality Control at Scale

Generating at scale means more opportunities for errors. Quality control must be automated and systematic — you cannot manually review every image when you are generating 500 per day.

Automated quality checks include:

- Image resolution validation: Confirm output dimensions match expected aspect ratio

- Content safety filtering: Run outputs through content moderation APIs to catch inappropriate content

- Similarity scoring: Compare generated images against a style reference using CLIP embeddings to ensure visual consistency

- Audio quality checks: Validate voiceover duration matches script timing, check for silence gaps or clipping

- Video integrity: Verify clip duration, file size, codec compatibility before assembly

Automated Quality Check Function
// Assumes `sharp` and an ffprobe wrapper (e.g. the `ffprobe` npm package)
// are imported at the top of the module
const qualityCheck = async (asset, type) => {
  const checks = {
    image: async (img) => {
      const metadata = await sharp(img).metadata();
      const passed = metadata.width >= 1792 && metadata.height >= 1024;
      return {
        passed,
        reason: passed ? 'OK' : 'Resolution too low',
        dimensions: `${metadata.width}x${metadata.height}`
      };
    },
    video: async (vid) => {
      const probe = await ffprobe(vid);
      const duration = parseFloat(probe.streams[0].duration);
      return {
        passed: duration >= 4.5 && duration <= 10.5,
        reason: duration < 4.5 ? 'Too short' : duration > 10.5 ? 'Too long' : 'OK',
        duration: `${duration}s`
      };
    },
    audio: async (aud) => {
      const probe = await ffprobe(aud);
      const duration = parseFloat(probe.streams[0].duration);
      const bitrate = parseInt(probe.streams[0].bit_rate);
      return {
        passed: bitrate >= 128000,
        reason: bitrate < 128000 ? 'Bitrate too low' : 'OK',
        duration: `${duration}s`
      };
    }
  };
  return checks[type](asset);
};
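Content safety filtering from the checklist above follows the same pattern: run each asset through a moderation endpoint and drop anything flagged. The sketch below assumes a response shape like OpenAI's moderation API (`flagged: boolean`); the `moderate` function is a hypothetical wrapper around whichever moderation service you use:

```javascript
// `moderate` is a hypothetical wrapper around a moderation API;
// it is assumed to resolve to an object like { flagged: boolean }
const filterFlaggedAssets = async (assets, moderate) => {
  const verdicts = await Promise.all(assets.map(a => moderate(a.text)));
  // Keep only the assets whose moderation verdict was not flagged
  return assets.filter((_, i) => !verdicts[i].flagged);
};
```

Log the dropped items alongside their prompts so you can see whether a particular prompt template is repeatedly producing flagged output.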
📝 Note: Set up a dashboard (Grafana, a simple web page, or even a Google Sheet) that tracks batch job metrics: success rate, average generation time, cost per item, and quality check pass rate. This visibility is essential for optimizing at scale.
Exercise:
What is the primary advantage of using OpenAI's Batch API over real-time API calls?
Exercise:
Which processing strategy starts the next pipeline stage for an item as soon as the current stage completes, even while other items are still processing?
Exercise:
What HTTP status code indicates you have exceeded an API's rate limit?