# Batch Processing for AI Video
## What Is Batch Processing?
Batch processing is the technique of submitting multiple generation requests at once instead of processing them one at a time. Rather than generating images, video clips, or voiceovers sequentially, you queue up all requests and let them run in parallel or in optimized batches.
In the context of AI video production, batch processing applies to every stage of the pipeline: generating 50 scripts at once, creating 200 images in a single run, producing voiceovers for an entire week of content, or rendering 30 video clips simultaneously.
## Batch Image Generation
Images are typically the easiest asset to batch-generate because image APIs are fast and relatively inexpensive. The key challenge is maintaining visual consistency across a large batch.
| Tool | Batch Method | Max Batch Size | Avg Time Per Image | Cost Per Image |
|---|---|---|---|---|
| DALL-E 3 API | Loop API calls with async/await | Unlimited (rate limited) | 10-15 seconds | $0.04 - $0.08 |
| Midjourney API | /imagine batch with --repeat flag | Up to 40 per batch | 30-60 seconds | $0.01 - $0.03 |
| Stability AI API | POST /v1/generation with batch param | 10 per request | 5-10 seconds | $0.002 - $0.006 |
| Leonardo AI API | Batch generation endpoint | 8 per request | 8-15 seconds | $0.01 - $0.02 |
| Flux (Replicate) | Prediction batch endpoint | Unlimited (queued) | 5-15 seconds | $0.003 - $0.01 |
A chunked-parallel sketch using the OpenAI Node SDK (assumes an initialized `openai` client; the `sleep` helper is defined inline):

```javascript
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

const generateBatchImages = async (prompts, concurrency = 5) => {
  const results = [];
  // Process in chunks to respect rate limits
  for (let i = 0; i < prompts.length; i += concurrency) {
    const chunk = prompts.slice(i, i + concurrency);
    const promises = chunk.map(prompt =>
      openai.images.generate({
        model: 'dall-e-3',
        prompt: prompt,
        size: '1792x1024',
        quality: 'hd',
        n: 1
      })
    );
    // allSettled keeps the batch going even if individual generations fail
    const chunkResults = await Promise.allSettled(promises);
    results.push(...chunkResults);
    // Respect rate limits: pause between chunks
    if (i + concurrency < prompts.length) {
      await sleep(2000);
    }
  }
  return results;
};

// Generate 50 images in batches of 5
const allPrompts = scenes.map(s => s.visual_prompt);
const images = await generateBatchImages(allPrompts, 5);
```

## Batch Video Generation
Video generation is the most time-consuming and expensive stage. Batch video generation requires careful orchestration because each clip can take 1-5 minutes to generate, and API rate limits are stricter than for images.
| Tool | Batch Method | Max Concurrent | Avg Time Per Clip | Cost Per 5s Clip |
|---|---|---|---|---|
| Runway Gen-3 | Async task submission + polling | 5 concurrent | 60-120 seconds | $0.05 - $0.10 |
| Kling AI API | Batch task queue | 3 concurrent | 90-180 seconds | $0.03 - $0.08 |
| Pika API | Sequential with webhook callbacks | 2 concurrent | 60-90 seconds | $0.04 - $0.07 |
| Luma Dream Machine | Async generation with status polling | 3 concurrent | 45-90 seconds | $0.03 - $0.06 |
| Haiper API | Batch submission endpoint | 5 concurrent | 30-60 seconds | $0.02 - $0.05 |
A submit-then-poll batch runner, with failed tasks resubmitted (`submitVideoTask`, `checkTaskStatus`, and `sleep` are placeholders for your provider's API and a delay helper; in production, cap resubmissions so a permanently failing task cannot loop forever):

```javascript
const batchGenerateVideos = async (imageUrls, prompts) => {
  // Step 1: Submit all generation tasks
  const tasks = [];
  for (let i = 0; i < imageUrls.length; i++) {
    const task = await submitVideoTask(imageUrls[i], prompts[i]);
    tasks.push({ id: task.id, scene: i, status: 'processing' });
    await sleep(1000); // Stagger submissions
  }
  // Step 2: Poll for completion
  const completed = [];
  while (completed.length < tasks.length) {
    for (const task of tasks) {
      if (task.status === 'processing') {
        const status = await checkTaskStatus(task.id);
        if (status.state === 'completed') {
          task.status = 'completed';
          task.videoUrl = status.output_url;
          completed.push(task);
          console.log(`Scene ${task.scene} complete (${completed.length}/${tasks.length})`);
        } else if (status.state === 'failed') {
          task.status = 'failed';
          console.error(`Scene ${task.scene} failed: ${status.error}`);
          // Resubmit failed task (consider a retry limit here)
          const retry = await submitVideoTask(imageUrls[task.scene], prompts[task.scene]);
          task.id = retry.id;
          task.status = 'processing';
        }
      }
    }
    await sleep(10000); // Poll every 10 seconds
  }
  return completed;
};
```

## Parallel Processing Strategies
There are four main strategies for processing batches, each with different tradeoffs:
1. Sequential: Process one item at a time. Slowest but simplest, and uses minimal API quota. Best when rate limits are very strict or the output of one item feeds into the next.
2. Parallel: Process all items simultaneously. Fastest, but can hit rate limits quickly and costs more due to burst pricing. Best for small batches with generous rate limits.
3. Pipelined: Start the next item as soon as the previous one moves to the next stage. For example, while Scene 3 images are generating, Scene 2 is already in video generation, and Scene 1 is already in voiceover. This is the most efficient approach for end-to-end pipelines.
4. Chunked parallel: Process items in fixed-size groups with a pause between groups. This balances throughput against rate limits, and is the approach the image-generation example above uses.
| Strategy | Speed | API Usage | Complexity | Best For |
|---|---|---|---|---|
| Sequential | Slow | Minimal | Low | Strict rate limits, dependent tasks |
| Parallel | Fast | Burst (high) | Medium | Small batches, generous quotas |
| Pipelined | Optimized | Steady | High | Full pipeline runs, production systems |
| Chunked Parallel | Balanced | Controlled | Medium | Large batches with moderate rate limits |
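The pipelined strategy can be sketched with a per-stage lock: each stage handles one item at a time, but different stages work on different items concurrently. The stage names and timings below are illustrative stand-ins for real API calls, not any particular provider's interface.

```javascript
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Each stage is a one-at-a-time worker; the promise chain `tail`
// acts as its queue, so stages overlap across different items.
const makeStage = (name, ms) => {
  let tail = Promise.resolve();
  return item => {
    const run = tail.then(async () => {
      await sleep(ms); // simulate the API call for this stage
      return { ...item, [name]: true };
    });
    tail = run.catch(() => {}); // keep the queue alive on errors
    return run;
  };
};

const stages = [makeStage('image', 30), makeStage('video', 50), makeStage('voice', 20)];

// Chain every item through all stages; Promise.all preserves order
const runPipeline = items =>
  Promise.all(items.map(item =>
    stages.reduce((p, stage) => p.then(stage), Promise.resolve(item))
  ));
```

While item 1 sits in the video stage, item 2's image stage is already running, which is exactly the overlap the strategy description above promises.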
## Managing API Rate Limits
Every AI API enforces rate limits — restrictions on how many requests you can make per minute, per hour, or per day. Batch processing must respect these limits or your requests will be rejected with 429 Too Many Requests errors.
| API | Requests/Min (RPM) | Tokens/Min (TPM) | Images/Min | Strategy |
|---|---|---|---|---|
| OpenAI GPT-4 | 500 RPM (Tier 3) | 80,000 TPM | N/A | Token-aware batching |
| OpenAI DALL-E 3 | 7 RPM (Tier 1) / 15 RPM (Tier 3) | N/A | 7/15 per min | Staggered with delays |
| Runway | ~10 concurrent tasks | N/A | N/A | Queue-based polling |
| ElevenLabs | 100 RPM (Starter) | N/A | N/A | Chunk by character count |
| Stability AI | 150 RPM | N/A | 150 per min | Generous — parallel safe |
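Even with careful scheduling, occasional 429 responses are inevitable, so batch code should retry with exponential backoff and jitter. A minimal sketch — the `err.status` check is an assumption about how your client library surfaces HTTP errors, so adjust it to match:

```javascript
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Wrap any async API call; retry only on 429, with doubling delays.
const withBackoff = async (fn, { retries = 5, baseMs = 1000 } = {}) => {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (err.status !== 429 || attempt >= retries) throw err;
      const delay = baseMs * 2 ** attempt + Math.random() * 250; // jitter
      await sleep(delay);
    }
  }
};
```

Usage: `await withBackoff(() => generateImage(prompt))` retries a rate-limited call up to five times before giving up.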
For proactive throttling, a sliding-window rate limiter tracks request timestamps and waits whenever the window is full:

```javascript
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

class RateLimiter {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.requests = [];
  }

  async waitForSlot() {
    const now = Date.now();
    // Remove expired timestamps
    this.requests = this.requests.filter(t => now - t < this.windowMs);
    if (this.requests.length >= this.maxRequests) {
      const oldestRequest = this.requests[0];
      const waitTime = this.windowMs - (now - oldestRequest) + 100;
      console.log(`Rate limit reached. Waiting ${waitTime}ms...`);
      await sleep(waitTime);
    }
    this.requests.push(Date.now());
  }
}

// Usage: Max 7 DALL-E requests per 60 seconds
const dalleRateLimiter = new RateLimiter(7, 60000);
const results = [];
for (const prompt of imagePrompts) {
  await dalleRateLimiter.waitForSlot();
  const image = await generateImage(prompt);
  results.push(image);
}
```

## Cost Optimization for Bulk Generation
When processing hundreds of items, small cost differences per item compound quickly. A $0.03 difference per image across 1,000 images is $30. Optimizing costs involves choosing the right model tier, using batch APIs, and caching results.
| Optimization | Savings | How It Works |
|---|---|---|
| OpenAI Batch API | 50% off standard pricing | Submit batch files, results within 24 hours |
| Lower quality tiers | 30-60% off | Use 'standard' instead of 'hd' for non-hero images |
| Caching duplicates | 100% for cached items | Store and reuse identical prompt results |
| Off-peak generation | Varies | Some APIs have lower costs during off-peak hours |
| Smaller models | 40-70% off | Use GPT-3.5 for simple tasks instead of GPT-4 |
| Batch voiceover | 20-30% off | Send longer text in fewer API calls |
Using the OpenAI Batch API for bulk script generation (the `createBatchFile` helper is assumed to serialize one JSON request per line, i.e. JSONL, and return something uploadable such as a file stream):

```javascript
// Step 1: Create a JSONL batch file
const batchRequests = topics.map((topic, i) => ({
  custom_id: `script-${i}`,
  method: 'POST',
  url: '/v1/chat/completions',
  body: {
    model: 'gpt-4',
    messages: [
      { role: 'system', content: 'Write a 60-second video script...' },
      { role: 'user', content: `Topic: ${topic}` }
    ]
  }
}));

// Step 2: Upload batch file
const file = await openai.files.create({
  file: createBatchFile(batchRequests),
  purpose: 'batch'
});

// Step 3: Create batch job (50% cheaper than real-time)
const batch = await openai.batches.create({
  input_file_id: file.id,
  endpoint: '/v1/chat/completions',
  completion_window: '24h'
});

// Step 4: Poll for results
// Results are available within 24 hours at half the cost
```

## Quality Control at Scale
Generating at scale means more opportunities for errors. Quality control must be automated and systematic — you cannot manually review every image when you are generating 500 per day.
Automated quality checks include:
- Image resolution validation: Confirm output dimensions match expected aspect ratio
- Content safety filtering: Run outputs through content moderation APIs to catch inappropriate content
- Similarity scoring: Compare generated images against a style reference using CLIP embeddings to ensure visual consistency
- Audio quality checks: Validate voiceover duration matches script timing, check for silence gaps or clipping
- Video integrity: Verify clip duration, file size, codec compatibility before assembly
These checks can be wired into a single dispatcher. This sketch assumes the `sharp` image library and a promisified `ffprobe` wrapper (e.g. built on `fluent-ffmpeg`), and that the first probed stream is the relevant one:

```javascript
const sharp = require('sharp');
// ffprobe: a promisified wrapper that returns parsed stream metadata

const qualityCheck = async (asset, type) => {
  const checks = {
    image: async (img) => {
      const metadata = await sharp(img).metadata();
      return {
        passed: metadata.width >= 1792 && metadata.height >= 1024,
        reason: metadata.width < 1792 ? 'Resolution too low' : 'OK',
        dimensions: `${metadata.width}x${metadata.height}`
      };
    },
    video: async (vid) => {
      const probe = await ffprobe(vid);
      const duration = parseFloat(probe.streams[0].duration);
      return {
        passed: duration >= 4.5 && duration <= 10.5,
        reason: duration < 4.5 ? 'Too short' : duration > 10.5 ? 'Too long' : 'OK',
        duration: `${duration}s`
      };
    },
    audio: async (aud) => {
      const probe = await ffprobe(aud);
      const duration = parseFloat(probe.streams[0].duration);
      const bitrate = parseInt(probe.streams[0].bit_rate);
      return {
        passed: bitrate >= 128000,
        reason: bitrate < 128000 ? 'Bitrate too low' : 'OK',
        duration: `${duration}s`
      };
    }
  };
  return checks[type](asset);
};
```
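At batch scale, individual checks feed a gate that splits passes from failures so failed assets can be queued for regeneration. A small sketch — the `check` callback is any per-asset checker with the shape used above, stubbed here so the example is self-contained:

```javascript
// Run a checker over every asset and partition the results.
const runQualityGate = async (assets, check) => {
  const results = await Promise.all(
    assets.map(async a => ({ asset: a, ...(await check(a.path, a.type)) }))
  );
  return {
    passed: results.filter(r => r.passed),
    failed: results.filter(r => !r.passed) // candidates for regeneration
  };
};
```

Feeding `failed` back into the original generation step closes the loop: generate, check, regenerate only what failed, and never manually review 500 assets a day.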