
RAG App Cost Breakdown — Document Q&A at 5k Queries/Month

A RAG app serving 5k queries per month over a moderate document corpus runs $35–$240/month.


Scenario

A knowledge-base Q&A application. Users ask questions over a company document corpus (handbooks, product docs, past tickets). Each query retrieves the top 5–10 chunks (~10k tokens in total) and passes them, along with the user's question, to an LLM. The retrieval system prompt is 500 tokens. Because similar queries retrieve the same documents repeatedly, a 70% cache hit rate across the retrieval system prompt and frequently accessed chunks is realistic.

| Assumption | Value |
| --- | --- |
| Queries / month | 5,000 |
| Retrieved context | ~10,000 tokens / query |
| User question | ~500 tokens |
| Response length | ~800 tokens |
| Cache hit rate | 70% (retrieval system + hot chunks) |

Embedding costs (for indexing the corpus once) are a one-time $5-20 and not included here. Vector database costs (Pinecone, Qdrant) are also separate — budget $0-25/month for self-hosted, $70+ for managed.
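
As a rough check on the assumptions above, the monthly token volumes can be sketched in a few lines of Python. The constant and function names are illustrative only. The sketch counts retrieved context plus the user question as input (10,500 tokens per query), which appears to be how the 53M input figure used for the cost table was reached (52.5M before rounding); counting the 500-token system prompt separately would raise it slightly.

```python
# Scenario assumptions taken from the table above.
QUERIES_PER_MONTH = 5_000
CONTEXT_TOKENS = 10_000    # retrieved chunks per query
QUESTION_TOKENS = 500      # user question per query
RESPONSE_TOKENS = 800      # model response per query
CACHE_HIT_PERCENT = 70     # share of input tokens served from the prompt cache


def monthly_tokens(queries: int = QUERIES_PER_MONTH) -> dict[str, int]:
    """Return monthly input, output, and cached-input token counts."""
    input_tokens = queries * (CONTEXT_TOKENS + QUESTION_TOKENS)
    output_tokens = queries * RESPONSE_TOKENS
    # Integer arithmetic keeps the cached count exact.
    cached_input = input_tokens * CACHE_HIT_PERCENT // 100
    return {
        "input": input_tokens,
        "output": output_tokens,
        "cached_input": cached_input,
    }


if __name__ == "__main__":
    for name, count in monthly_tokens().items():
        print(f"{name}: {count / 1e6:.2f}M tokens")
```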

Monthly cost across recommended models

Calculated at 53M input tokens + 4.0M output tokens, with 70% prompt cache hit rate.

| Model | Input cost | Output cost | Cache savings | Total / mo |
| --- | --- | --- | --- | --- |
| DeepSeek Chat (cheapest) | $14.70 | $1.68 | −$9.26 | $7.12 |
| GPT-5 Mini | $13.13 | $8.00 | −$8.27 | $12.86 |
| Gemini 2.5 Flash | $15.75 | $10.00 | −$9.92 | $15.83 |
| Claude Haiku 4.5 | $52.50 | $20.00 | −$33.08 | $39.42 |

💡 Switching from Claude Haiku 4.5 to DeepSeek Chat saves $32.31/month (an 82% reduction).
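
The arithmetic behind the monthly cost table can be reproduced with a small sketch. The per-million-token prices and the 90% cached-input discount below are assumptions back-derived from the table's own figures (for example, $14.70 of input cost over 52.5M input tokens implies $0.28 per million); they are not quoted from any provider, and actual cache pricing may differ by model. `Pricing` and `monthly_cost` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_m: float            # USD per 1M input tokens (assumed, back-derived)
    output_per_m: float           # USD per 1M output tokens (assumed, back-derived)
    cache_discount: float = 0.90  # assumed discount on cached input tokens


# Assumed prices implied by the table; verify against current price lists.
PRICES = {
    "DeepSeek Chat": Pricing(0.28, 0.42),
    "GPT-5 Mini": Pricing(0.25, 2.00),
    "Gemini 2.5 Flash": Pricing(0.30, 2.50),
    "Claude Haiku 4.5": Pricing(1.00, 5.00),
}


def monthly_cost(
    p: Pricing,
    input_tokens: int = 52_500_000,
    output_tokens: int = 4_000_000,
    cache_hit_rate: float = 0.70,
) -> dict[str, float]:
    """Split a month's bill into input, output, cache savings, and total."""
    input_cost = input_tokens / 1e6 * p.input_per_m
    output_cost = output_tokens / 1e6 * p.output_per_m
    # Cached tokens are billed at (1 - discount) of the normal input price,
    # so the saving is the discounted share of the cached portion.
    cache_savings = input_cost * cache_hit_rate * p.cache_discount
    return {
        "input": input_cost,
        "output": output_cost,
        "cache_savings": cache_savings,
        "total": input_cost + output_cost - cache_savings,
    }


if __name__ == "__main__":
    for model, pricing in PRICES.items():
        print(f"{model}: ${monthly_cost(pricing)['total']:.2f}/month")
```

Setting `cache_hit_rate` to 0 shows the uncached bill, which makes it easy to see how much of each total depends on the 70% hit-rate assumption.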

Why these models

RAG needs long input context and benefits massively from caching. Gemini 2.5 Flash wins on context length (1M tokens) — useful when retrieved chunks are large. Claude Haiku 4.5 wins on caching efficiency (90% off cached input). GPT-5 Mini balances quality and price. DeepSeek wins on raw cost but has shorter context (~64k typical).

Key insights

Cost at different scales

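| Scale (queries / month) | DeepSeek Chat | GPT-5 Mini | Gemini 2.5 Flash | Claude Haiku 4.5 |
| --- | --- | --- | --- | --- |
| Small team (500 queries) | $0.71 | $1.29 | $1.58 | $3.94 |
| Baseline (5k queries) | $7.12 | $12.86 | $15.83 | $39.42 |
| Growing product (50k queries) | $71.19 | $129 | $158 | $394 |
| Enterprise (500k queries) | $712 | $1,286 | $1,583 | $3,943 |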
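
The scale figures follow from the baseline by simple proportion, which the sketch below illustrates. It assumes cost grows linearly with query volume, with no volume discounts, pricing tiers, or change in cache hit rate, and it starts from the rounded baseline totals, so results can differ from the table by a cent or a dollar. `cost_at_scale` is a hypothetical helper name.

```python
BASELINE_QUERIES = 5_000

# Baseline monthly totals (USD) from the 5k-query row.
BASELINE_TOTALS = {
    "DeepSeek Chat": 7.12,
    "GPT-5 Mini": 12.86,
    "Gemini 2.5 Flash": 15.83,
    "Claude Haiku 4.5": 39.42,
}


def cost_at_scale(queries: int) -> dict[str, float]:
    """Scale each baseline total linearly to a new monthly query volume."""
    factor = queries / BASELINE_QUERIES
    return {model: total * factor for model, total in BASELINE_TOTALS.items()}


if __name__ == "__main__":
    for queries in (500, 5_000, 50_000, 500_000):
        row = ", ".join(f"{m}: ${c:,.2f}" for m, c in cost_at_scale(queries).items())
        print(f"{queries:>7,} queries -> {row}")
```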

Try your own scenario

The numbers above use our best-guess assumptions. For your actual workflow, use the interactive calculator to plug in your real token volumes and quality requirements.

All cost figures are estimates based on publicly-listed pricing as of the data refresh date. Verify with the provider's official pricing page before making business decisions. Embedding costs, vector database costs, and infrastructure costs are not included unless explicitly noted.