Batch AI Processing Cost — 1M Documents/Month with Batch API Discounts
Processing 1M documents per month costs roughly $620 to $2,900 at on-demand rates for the models compared below; batch APIs take about 50% off that where a provider offers them.
Scenario
An offline pipeline processes 1M documents per month: classifying, summarizing, or extracting structured data. The work is not latency-sensitive (overnight batch is fine), so it can use the batch APIs offered by OpenAI, Anthropic, and others for about 50% off. Each document is ~2,000 input tokens, and the model produces ~200 output tokens per document.
| Assumption | Value |
|---|---|
| Documents / month | 1,000,000 |
| Input / doc | ~2,000 tokens |
| Output / doc | ~200 tokens |
| Cache hit | 5% (each doc unique) |
| Pricing | Batch API (50% discount where available) |
Note: the cost figures below use ON-DEMAND pricing. Apply the 50% discount manually for the batch APIs on OpenAI, Anthropic, and Google; the calculator does not yet auto-apply batch pricing.
Monthly cost across recommended models
Calculated at 2,000M input tokens + 200M output tokens, with a 5% prompt cache hit rate.
| Model | Input cost | Output cost | Cache savings | Total / mo |
|---|---|---|---|---|
| DeepSeek Chat (cheapest) | $560 | $84 | −$25.20 | $619 |
| GPT-5 Mini | $500 | $400 | −$22.50 | $878 |
| Gemini 2.5 Flash | $600 | $500 | −$27.00 | $1,073 |
| Claude Haiku 4.5 | $2,000 | $1,000 | −$90.00 | $2,910 |
💡 Switching from Claude Haiku 4.5 to DeepSeek Chat saves $2,291/month (79% reduction) at on-demand rates.
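The table rows can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the per-million-token rates are back-computed from the table's own input and output columns (cost divided by token volume), and the 90% discount on cached input tokens is inferred from the "Cache savings" column. Neither is taken from a provider price list, so treat both as assumptions.

```python
# Scenario assumptions, taken from the assumptions table.
DOCS_PER_MONTH = 1_000_000
INPUT_TOKENS_PER_DOC = 2_000
OUTPUT_TOKENS_PER_DOC = 200
CACHE_HIT_RATE = 0.05

# Assumption: cached input tokens cost 10% of the normal input rate.
# Inferred from the "Cache savings" column, not from a price list.
CACHED_INPUT_DISCOUNT = 0.90

# USD per 1M tokens as (input, output), back-computed from the table:
# input cost / 2,000M tokens and output cost / 200M tokens.
RATES = {
    "DeepSeek Chat": (0.28, 0.42),
    "GPT-5 Mini": (0.25, 2.00),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Claude Haiku 4.5": (1.00, 5.00),
}


def monthly_cost(input_rate, output_rate, docs=DOCS_PER_MONTH):
    """Return the on-demand cost breakdown in USD for one month."""
    input_millions = docs * INPUT_TOKENS_PER_DOC / 1_000_000
    output_millions = docs * OUTPUT_TOKENS_PER_DOC / 1_000_000
    input_cost = input_millions * input_rate
    output_cost = output_millions * output_rate
    # Only the cached share of input tokens gets the cached-token discount.
    cache_savings = input_cost * CACHE_HIT_RATE * CACHED_INPUT_DISCOUNT
    return {
        "input": input_cost,
        "output": output_cost,
        "cache_savings": cache_savings,
        "total": input_cost + output_cost - cache_savings,
    }


for model, (input_rate, output_rate) in RATES.items():
    cost = monthly_cost(input_rate, output_rate)
    print(f"{model:<18} total ${cost['total']:,.2f}")
```

Swapping in your own token counts or cache hit rate shows how sensitive the total is to each input; with only 5% of input tokens cached, the cache term barely moves the result.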
Why these models
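Batch processing rewards the cheapest reliable model. Quality matters less per document because failures can be re-processed. GPT-5 Mini and Claude Haiku 4.5 both offer 50% off via their batch APIs, and Gemini 2.5 Flash with the batch API is competitive too. DeepSeek is the cheapest at the on-demand rates shown above but lacks a formal batch discount. For this input-heavy workload, halving the GPT-5 Mini total ($878 to about $439) or the Gemini 2.5 Flash total ($1,073 to about $537) puts both below DeepSeek's $619, so compare batch-adjusted totals before choosing.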
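Because the table lists on-demand totals, the batch discount has to be applied by hand. A minimal sketch of that adjustment follows. It assumes the 50% discount applies uniformly to the whole bill (input, output, and cached tokens alike), which is a simplification, and it uses the unrounded totals implied by the table columns. The `HAS_BATCH_DISCOUNT` flags follow the text above; the names are illustrative.

```python
BATCH_DISCOUNT = 0.50

# Unrounded on-demand monthly totals in USD (input + output - cache savings).
ON_DEMAND_TOTALS = {
    "DeepSeek Chat": 618.80,
    "GPT-5 Mini": 877.50,
    "Gemini 2.5 Flash": 1073.00,
    "Claude Haiku 4.5": 2910.00,
}

# Which models get a batch discount, per the text above.
HAS_BATCH_DISCOUNT = {
    "DeepSeek Chat": False,
    "GPT-5 Mini": True,
    "Gemini 2.5 Flash": True,
    "Claude Haiku 4.5": True,
}


def batch_adjusted(totals, has_discount, discount=BATCH_DISCOUNT):
    """Apply the batch discount only to models that offer one.

    Assumption: the discount applies to the whole bill, including
    cached tokens. Check each provider's batch pricing rules.
    """
    return {
        model: total * (1 - discount) if has_discount[model] else total
        for model, total in totals.items()
    }


adjusted = batch_adjusted(ON_DEMAND_TOTALS, HAS_BATCH_DISCOUNT)
ranking = sorted(adjusted.items(), key=lambda item: item[1])
for model, total in ranking:
    print(f"{model:<18} ${total:,.2f}")
```

Under these assumptions the ranking changes: GPT-5 Mini ($438.75) and Gemini 2.5 Flash ($536.50) both land below DeepSeek Chat's undiscounted $618.80, while Claude Haiku 4.5 drops to $1,455.00.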
Key insights
1. Use batch APIs whenever you can. The 50% discount is the easiest win in LLM cost optimization.
2. For 1M+ docs/month, even a 10% cut in prompt tokens saves roughly $50 to $200 per month at the on-demand input rates above, and ten times that at 10M docs. A/B test prompts on small batches first.
3. Failures matter: if 2% of docs fail and need a retry on a more expensive model, the effective cost rises. Track failure rates per model.
4. Self-hosting an open model on GPU rentals (Modal, Runpod) can beat batch APIs at 10M+ docs/month, so it is worth evaluating at high volume.
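The second and third insights can be made concrete with a short sketch. It assumes that a failed call on the primary model is still billed in full and that a single retry on the fallback model succeeds; it also reuses the 5% cache hit rate and the inferred 90% cached-token discount. The function names are illustrative.

```python
def effective_cost_with_retries(primary_total, fallback_total, failure_rate):
    """Monthly cost when failed docs are re-run once on a fallback model.

    Both totals are the cost of processing ALL docs on that model.
    Assumptions: failed primary calls are still billed in full, and
    every retry succeeds on the first attempt.
    """
    return primary_total + failure_rate * fallback_total


def prompt_trim_savings(input_cost, trim_fraction,
                        cache_hit_rate=0.05, cached_discount=0.90):
    """Monthly savings from cutting input tokens by trim_fraction.

    Assumption: trimmed tokens are cached at the same rate as the rest,
    so savings scale with the cache-adjusted input bill.
    """
    effective_input = input_cost * (1 - cache_hit_rate * cached_discount)
    return effective_input * trim_fraction


# DeepSeek Chat as primary ($618.80), Claude Haiku 4.5 as fallback ($2,910),
# with 2% of docs failing on the primary model.
with_retries = effective_cost_with_retries(618.80, 2910.00, 0.02)
print(f"Effective cost with retries: ${with_retries:,.2f}")  # $677.00

# A 10% shorter prompt at each end of the on-demand price range.
print(f"Haiku savings:    ${prompt_trim_savings(2000.00, 0.10):,.2f}")  # $191.00
print(f"DeepSeek savings: ${prompt_trim_savings(560.00, 0.10):,.2f}")   # $53.48
```

In this example a 2% failure rate adds about 9% to the DeepSeek bill, which is why per-model failure tracking matters more as the price gap between primary and fallback grows.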
Cost at different scales
| Scale | DeepSeek Chat | GPT-5 Mini | Gemini 2.5 Flash | Claude Haiku 4.5 |
|---|---|---|---|---|
| 100k docs (smoke test) | $61.88 | $87.75 | $107 | $291 |
| Baseline (1M docs) | $619 | $878 | $1,073 | $2,910 |
| 10M docs (production) | $6,188 | $8,775 | $10.7k | $29.1k |
| 100M docs (enterprise pipeline) | $61.9k | $87.8k | $107.3k | $291.0k |
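The scale table is a linear extrapolation of the 1M-document baseline: per-document cost stays constant and only volume changes. A small sketch, assuming no volume discounts or rate-limit effects at higher scale (the table makes the same assumption); `fmt_usd` is a hypothetical helper that roughly mirrors the table's rounding.

```python
BASELINE_DOCS = 1_000_000

# Unrounded on-demand monthly totals in USD at the 1M-doc baseline.
BASELINE_TOTALS = {
    "DeepSeek Chat": 618.80,
    "GPT-5 Mini": 877.50,
    "Gemini 2.5 Flash": 1073.00,
    "Claude Haiku 4.5": 2910.00,
}


def cost_at_scale(baseline_total, docs, baseline_docs=BASELINE_DOCS):
    """Scale a baseline monthly total linearly with document volume."""
    return baseline_total * docs / baseline_docs


def fmt_usd(amount):
    """Cents under $100, whole dollars under $10k, thousands above that."""
    if amount < 100:
        return f"${amount:.2f}"
    if amount < 10_000:
        return f"${amount:,.0f}"
    return f"${amount / 1000:.1f}k"


for docs in (100_000, 1_000_000, 10_000_000, 100_000_000):
    row = [fmt_usd(cost_at_scale(total, docs)) for total in BASELINE_TOTALS.values()]
    print(f"{docs:>11,} docs: " + " | ".join(row))

# Per-document cost is the number that stays fixed across rows.
per_doc = BASELINE_TOTALS["DeepSeek Chat"] / BASELINE_DOCS
print(f"DeepSeek Chat per document: ${per_doc:.6f}")
```

Because the relationship is linear, the per-document figure (about $0.0006 for DeepSeek Chat, about $0.0029 for Claude Haiku 4.5) is often the easier number to carry into your own volume estimates.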
Try your own scenario
The numbers above use our best-guess assumptions. For your actual workflow, use the interactive calculator to plug in your real token volumes and quality requirements.
All cost figures are estimates based on publicly listed pricing as of the data refresh date. Verify with the provider's official pricing page before making business decisions. Embedding costs, vector database costs, and infrastructure costs are not included unless explicitly noted.