How Much Does an AI Customer Support Chatbot Cost? (2026 Breakdown)
A typical SaaS chatbot serving 30k conversations per month costs $45โ$340 depending on the model.
Scenario
A B2B SaaS company runs a customer-facing chatbot on their support page. The bot answers product questions, walks users through workflows, and escalates to a human when stuck. Traffic averages 1,000 conversations per day with about 8 turns each. The system prompt is large (~500 tokens) because it includes product context, tone instructions, and escalation rules โ but it repeats every turn, so prompt caching saves real money.
| Assumption | Value |
|---|---|
| Conversations / day | 1,000 |
| Turns / conversation | 8 (average) |
| System prompt | 500 tokens (cached) |
| User input | ~200 tokens / turn |
| Model output | ~150 tokens / turn |
| Days / month | 30 |
The system prompt fires on every turn โ ~50% of input tokens hit the prompt cache on providers that support it (Claude, OpenAI, Gemini, DeepSeek).
Monthly cost across recommended models
Calculated at 168M input tokens + 36.0M output tokens, with 50% prompt cache hit rate.
| Model | Input cost | Output cost | Cache savings | Total / mo |
|---|---|---|---|---|
| Deepseek Chat Cheapest | $47.04 | $15.12 | โ$21.17 | $40.99 |
| Gpt 5 Mini | $42.00 | $72.00 | โ$18.90 | $95.10 |
| Gemini 2.5 Flash | $50.40 | $90.00 | โ$22.68 | $118 |
| Claude Haiku 4 5 | $168 | $180 | โ$75.60 | $272 |
๐ก Switching from Claude Haiku 4 5 to Deepseek Chat saves $231/month (85% reduction).
Why these models
Chatbots need low latency, decent reasoning, and high cache utilization. Mid-tier models hit the sweet spot. Avoid frontier models (GPT-5, Claude Opus) โ 6-20ร more expensive with marginal UX gain for casual support questions. Sample 2-5% of conversations through a frontier model for quality monitoring instead.
Key insights
- 1. Prompt caching cuts the bill by 30-50% on every provider that supports it. If your model choice does not support caching, you are paying 2ร what you should.
- 2. Claude Haiku 4.5 and GPT-5 Mini deliver comparable chatbot quality at similar price points โ pick whichever has lower latency in your customer region.
- 3. Gemini 2.5 Flash is the longest-context option here (1M tokens), useful if your system prompt grows beyond 50k tokens.
- 4. DeepSeek is the cheapest by far but routing to Chinese servers may be a compliance / latency concern for enterprise customers.
Cost at different scales
| Scale | Deepseek Chat | Gpt 5 Mini | Gemini 2.5 Flash | Claude Haiku 4 5 |
|---|---|---|---|---|
| 100 users (10ร smaller) | $4.10 | $9.51 | $11.77 | $27.24 |
| Baseline (1,000 conv/day) | $40.99 | $95.10 | $118 | $272 |
| 10,000 conv/day (10ร growth) | $410 | $951 | $1177 | $2724 |
| 100,000 conv/day (enterprise) | $4099 | $9510 | $11.8k | $27.2k |
Try your own scenario
The numbers above use our best-guess assumptions. For your actual workflow, use the interactive calculator to plug in your real token volumes and quality requirements.
All cost figures are estimates based on publicly-listed pricing as of the data refresh date. Verify with the provider's official pricing page before making business decisions. Embedding costs, vector database costs, and infrastructure costs are not included unless explicitly noted.