Batch AI Processing Cost — 1M Documents/Month with Batch API Discounts

Processing 1M documents per month runs roughly $400–$3,200 depending on the model, and batch APIs cut the on-demand price in half where they are available.

Cost range: $400–$3,200/mo

Scenario

An offline pipeline processes 1M documents per month: classifying, summarizing, or extracting structured data. The work is not latency-sensitive (overnight batch is fine), so we can use the batch APIs offered by OpenAI, Anthropic, and others for 50% off. Each document is ~2,000 input tokens and the model produces ~200 tokens of output.

Assumption	Value
Documents / month	1,000,000
Input / doc	~2,000 tokens
Output / doc	~200 tokens
Cache hit	5% (each doc unique)
Pricing	Batch API (50% discount where available)

Note: the cost figures below show on-demand pricing. For the batch APIs on OpenAI, Anthropic, and Google, apply the 50% discount manually; the calculator does not yet auto-apply batch pricing.

Monthly cost across recommended models

Calculated at 2,000M input tokens + 200M output tokens, with a 5% prompt cache hit rate.

Model	Input cost	Output cost	Cache savings	Total / mo
DeepSeek Chat (cheapest)	$560	$84	−$25.20	$619
GPT-5 Mini	$500	$400	−$22.50	$878
Gemini 2.5 Flash	$600	$500	−$27.00	$1,073
Claude Haiku 4.5	$2,000	$1,000	−$90.00	$2,910

💡 Switching from Claude Haiku 4.5 to DeepSeek Chat saves $2,291/month (79% reduction) at on-demand rates.
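The table rows can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the per-million-token prices are back-derived from the table (cost divided by token volume) rather than quoted from any provider, and the 90% discount on cached input tokens is an assumption inferred from the "Cache savings" column. The names `PRICES_PER_M`, `CACHE_DISCOUNT`, and `monthly_cost` are hypothetical.

```python
# Per-million-token prices (input, output) back-derived from the table above.
# They are illustrative, not quoted from any provider's price list.
PRICES_PER_M = {
    "DeepSeek Chat": (0.28, 0.42),
    "GPT-5 Mini": (0.25, 2.00),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Claude Haiku 4.5": (1.00, 5.00),
}

# Assumption: cached input tokens are billed at a 90% discount.
CACHE_DISCOUNT = 0.90


def monthly_cost(docs, in_tokens, out_tokens, in_price, out_price, cache_hit=0.05):
    """Return (input_cost, output_cost, cache_savings, total) in dollars."""
    input_cost = docs * in_tokens / 1e6 * in_price
    output_cost = docs * out_tokens / 1e6 * out_price
    # Only the cached share of input tokens receives the discount.
    cache_savings = input_cost * cache_hit * CACHE_DISCOUNT
    total = input_cost + output_cost - cache_savings
    return input_cost, output_cost, cache_savings, total


if __name__ == "__main__":
    for model, (in_price, out_price) in PRICES_PER_M.items():
        _, _, _, total = monthly_cost(1_000_000, 2_000, 200, in_price, out_price)
        print(f"{model}: ${total:,.2f}")
```

Changing `docs` to 100,000 or 10,000,000 scales every figure linearly, since nothing in this model depends on volume tiers.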

Why these models

Batch processing rewards the cheapest reliable model. Quality matters less per doc since you can re-process failures. GPT-5 Mini and Claude Haiku 4.5 both offer 50% off via batch API, and Gemini Flash with batch API is competitive too. DeepSeek is the cheapest at on-demand rates but lacks a formal batch discount. Halving the on-demand totals above for the batch-eligible models puts GPT-5 Mini (roughly $440) and Gemini 2.5 Flash (roughly $540) below DeepSeek's $619, so compare discounted prices before choosing.
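Because the table lists on-demand totals, the batch discount has to be applied by hand. A minimal sketch of that step follows, assuming a flat 50% discount on the whole monthly bill for the three providers named in the note and no discount for DeepSeek. Real batch pricing may interact with prompt caching differently, so treat the output as an approximation. The names `ON_DEMAND_TOTALS`, `BATCH_DISCOUNT`, `batch_total`, and `rank_by_batch_total` are hypothetical.

```python
# On-demand monthly totals for 1M docs, taken from the table above.
ON_DEMAND_TOTALS = {
    "DeepSeek Chat": 618.80,
    "GPT-5 Mini": 877.50,
    "Gemini 2.5 Flash": 1073.00,
    "Claude Haiku 4.5": 2910.00,
}

# Assumption: flat 50% batch discount where offered, none for DeepSeek.
BATCH_DISCOUNT = {
    "DeepSeek Chat": 0.0,
    "GPT-5 Mini": 0.5,
    "Gemini 2.5 Flash": 0.5,
    "Claude Haiku 4.5": 0.5,
}


def batch_total(model: str) -> float:
    """Monthly total after applying the model's batch discount."""
    return ON_DEMAND_TOTALS[model] * (1.0 - BATCH_DISCOUNT[model])


def rank_by_batch_total() -> list[tuple[str, float]]:
    """Models sorted from cheapest to most expensive after discounts."""
    pairs = ((model, batch_total(model)) for model in ON_DEMAND_TOTALS)
    return sorted(pairs, key=lambda pair: pair[1])


if __name__ == "__main__":
    for model, total in rank_by_batch_total():
        print(f"{model}: ${total:,.2f}")
```

The ranking after discounts can differ from the on-demand ranking, so it is worth rerunning this with current list prices before committing to a model.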

Key insights

1. Use batch APIs whenever you can — 50% discount is the easiest win in LLM cost optimization.
2. For 1M+ docs/month, prompt length is a real lever: cutting input tokens by 10% saves about $50–$200 per month in input cost at the on-demand rates above, and ten times that at 10M docs. A/B test prompts on small batches first.
3. Failures matter: if 2% of docs fail and need retry on a more expensive model, the effective cost rises. Track failure rates per model.
4. Self-hosting an open model on GPU rentals (Modal, Runpod) can beat batch APIs at 10M+ docs/month — worth evaluating at high volume.
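The effect of retries in insight 3 can be estimated with a short calculation. This is a sketch under simplifying assumptions: every document is billed once on the primary model, each failed document is re-run exactly once on the fallback model at that model's per-document rate, and the retry always succeeds. The function name `effective_cost` is hypothetical, and the inputs are the on-demand totals from the table above.

```python
def effective_cost(primary_total: float, fallback_total: float, failure_rate: float) -> float:
    """Monthly cost when failed docs are retried once on a fallback model.

    primary_total  -- monthly cost of running all docs on the primary model
    fallback_total -- monthly cost of running all docs on the fallback model
    failure_rate   -- fraction of docs that fail on the primary model (0..1)
    """
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    # Failed docs are paid for twice: once on the primary, once on the fallback.
    return primary_total + failure_rate * fallback_total


if __name__ == "__main__":
    # DeepSeek Chat as primary, Claude Haiku 4.5 as fallback, 2% failures.
    cost = effective_cost(618.80, 2910.00, 0.02)
    overhead = cost / 618.80 - 1.0
    print(f"effective cost: ${cost:,.2f} ({overhead:.1%} over the primary alone)")
```

With these inputs a 2% failure rate adds about $58 per month, which is why per-model failure tracking matters more as the price gap between primary and fallback widens.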

Cost at different scales

Scale	DeepSeek Chat	GPT-5 Mini	Gemini 2.5 Flash	Claude Haiku 4.5
100k docs (smoke test)	$61.88	$87.75	$107	$291
Baseline (1M docs)	$619	$878	$1,073	$2,910
10M docs (production)	$6,188	$8,775	$10.7k	$29.1k
100M docs (enterprise pipeline)	$61.9k	$87.8k	$107.3k	$291.0k

Try your own scenario

The numbers above use our best-guess assumptions. For your actual workflow, use the interactive calculator to plug in your real token volumes and quality requirements.

All cost figures are estimates based on publicly listed pricing as of the data refresh date. Verify with the provider's official pricing page before making business decisions. Embedding costs, vector database costs, and infrastructure costs are not included unless explicitly noted.