AI Agent Cost Planner

Model cost comparison tailored to your workload

Configure request volume, token usage, and success targets to understand the total cost of ownership for leading proprietary and open-source LLMs. Benchmark cost per successful task before you commit to scale.

Usage Assumptions

Enter the expected request volume and token usage. Adjust complexity to reflect prompt chaining or tool augmentation, and include a redundancy buffer for retries or fallbacks.

Monthly requests: All attempts including retries and fallbacks.
Avg input tokens / request: Include system prompts, knowledge base snippets, tool outputs.
Avg output tokens / request: Estimate final responses or tool instructions generated by the agent.
Task complexity multiplier: Accounts for prompt chaining, external tool calls, or multi-step reasoning bursts.
Redundancy buffer (%): Extra requests reserved for retries, A/B testing, or fallbacks.
Expected success rate (%): Adjust by monitoring production success metrics (non-null, correct output).
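The planner does not state how these inputs are combined, so the following is only a sketch of one plausible reading: the complexity multiplier and redundancy buffer scale the request count, and the success rate applies to the scaled total. The class name, function name, formula, and sample inputs are all assumptions chosen for illustration, not the tool's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class UsageAssumptions:
    monthly_requests: int
    avg_input_tokens: int
    avg_output_tokens: int
    complexity_multiplier: float
    redundancy_buffer_pct: float
    success_rate_pct: float


def project_usage(u: UsageAssumptions) -> dict:
    """Project billed requests, tokens, and successes (assumed formula)."""
    # Assumption: complexity and redundancy both scale the request count.
    billed_requests = (
        u.monthly_requests * u.complexity_multiplier
        * (100 + u.redundancy_buffer_pct) / 100
    )
    return {
        "billed_requests": round(billed_requests),
        "input_tokens": round(billed_requests * u.avg_input_tokens),
        "output_tokens": round(billed_requests * u.avg_output_tokens),
        # Assumption: the success rate applies to every billed request.
        "successes": round(billed_requests * u.success_rate_pct / 100),
    }


# Hypothetical inputs, not taken from the planner's form.
example = UsageAssumptions(
    monthly_requests=18_000,
    avg_input_tokens=450,
    avg_output_tokens=220,
    complexity_multiplier=1.0,
    redundancy_buffer_pct=10,
    success_rate_pct=92,
)
projection = project_usage(example)
print(projection)
```

With these hypothetical inputs the sketch yields 19,800 billed requests, 8,910,000 input tokens, 4,356,000 output tokens, and 18,216 successes. Those totals match the token and success counts in the comparison table, although other input combinations would produce the same totals.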

Key takeaways

Compare headline metrics to understand where each model shines. Notes include utilization assumptions for self-hosted options.

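Llama 3.2 1B Instruct offers the lowest cost per successful task: roughly $0.000005 ($0.09 per month across 18,216 successes), which the four-decimal display rounds to $0.0000.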
Llama 3.2 3B Instruct costs 201.1% more per successful task than the leader, and is suited to low-latency chat agents.
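Self-hosted estimates assume the GPU is billed only for the hours needed to process the workload at the stated throughput, plus a fixed warmup/idle allowance. Any additional idle time inflates cost, so verify the notes against your deployment plan.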
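To make the idle-time effect concrete, here is a minimal sketch of how a self-hosted estimate could be computed. The hourly rate, throughput, and idle hours come from the Llama 3.1 70B Instruct row in the table. The function name and the assumption that throughput applies to input and output tokens combined are illustrative guesses, not the planner's documented method.

```python
def self_hosted_monthly_cost(hourly_rate, tokens_per_second, total_tokens, idle_hours):
    """Estimate monthly GPU cost as billed compute hours plus idle hours."""
    # Assumption: throughput covers input and output tokens combined.
    compute_hours = total_tokens / tokens_per_second / 3600
    return hourly_rate * (compute_hours + idle_hours)


# Token volumes shown in the table notes.
total_tokens = 8_910_000 + 4_356_000

# Llama 3.1 70B Instruct row: $3.20/hr, ~5,500 tokens/sec, 1.0 warmup/idle hours.
cost = self_hosted_monthly_cost(3.20, 5_500, total_tokens, idle_hours=1.0)
print(f"Estimated monthly cost: ${cost:.2f}")

# Doubling the idle allowance shows how quickly unused GPU time adds up.
cost_more_idle = self_hosted_monthly_cost(3.20, 5_500, total_tokens, idle_hours=2.0)
print(f"With 2.0 idle hours: ${cost_more_idle:.2f}")
```

Under these assumptions the first figure comes out at about $5.34, in line with the table row, and each extra idle hour adds the full hourly rate to the bill.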

Model and provider	Monthly cost	Cost / request	Cost / successful task	Est. successes	Notes
Llama 3.2 1B Instruct Meta	$0.09	$0.0000	$0.0000	18,216	Pricing sourced from OpenRouter public rate card (Oct 2025); per-token rate converted to USD per 1K tokens. Token pricing: $0.0000 (input) & $0.0000 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Llama 3.2 3B Instruct Meta	$0.27	$0.0000	$0.0000	18,216	OpenRouter pricing converted to USD per 1K tokens (Oct 2025). Token pricing: $0.0000 (input) & $0.0000 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Hunyuan A13B Instruct Tencent	$0.40	$0.0000	$0.0000	18,216	Pricing derived from OpenRouter USD/token rates (Oct 2025); aligns with Tencent Cloud public tariff converted from RMB. Token pricing: $0.0000 (input) & $0.0000 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Llama 3.2 11B Vision Instruct Meta	$0.65	$0.0000	$0.0000	18,216	Image output incurs $0.00007948 per token equivalent; values converted from OpenRouter USD/token pricing (Oct 2025). Token pricing: $0.0000 (input) & $0.0000 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Gemini 2.0 Flash-Lite Google	$1.32	$0.0001	$0.0001	18,216	Paid tier $0.05 (input) / $0.20 (output) per 1M tokens (text/image/video). Audio inputs billed at $0.15 per 1M tokens. Token pricing: $0.0001 (input) & $0.0002 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
ERNIE 4.5 21B A3B Baidu	$1.84	$0.0001	$0.0001	18,216	Pricing from OpenRouter USD/token rates (Oct 2025); aligns with Baidu Qianfan list pricing (~¥0.0005 per 1K tokens). Token pricing: $0.0001 (input) & $0.0003 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
ERNIE 4.5 21B A3B Thinking Baidu	$1.84	$0.0001	$0.0001	18,216	Reasoning mode pricing sourced from OpenRouter (Oct 2025); matches Baidu published ¥0.0005/¥0.0024 per 1K tokens after FX conversion. Token pricing: $0.0001 (input) & $0.0003 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Llama 4 Scout Meta	$2.02	$0.0001	$0.0001	18,216	Image add-ons billed at $0.0003342 equivalent; pricing from OpenRouter (Oct 2025). Token pricing: $0.0001 (input) & $0.0003 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GPT-5 Nano OpenAI	$2.19	$0.0001	$0.0001	18,216	Standard tier $0.05 (input) / $0.40 (output) per 1M tokens; cached input $0.005 per 1M. Token pricing: $0.0001 (input) & $0.0004 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GPT-4.1 nano OpenAI	$2.63	$0.0001	$0.0001	18,216	Standard tier $0.10 (input) / $0.40 (output) per 1M tokens; cache hits billed at $0.025 per 1M input tokens. Token pricing: $0.0001 (input) & $0.0004 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Gemini 2.0 Flash Google	$2.63	$0.0001	$0.0001	18,216	Paid tier $0.10 (input) / $0.40 (output) per 1M tokens (text/image/video). Audio inputs billed at $0.30 per 1M tokens. Token pricing: $0.0001 (input) & $0.0004 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Llama 3.3 70B Instruct Meta	$2.81	$0.0001	$0.0002	18,216	Pricing converted from OpenRouter USD/token rates (Oct 2025). Token pricing: $0.0001 (input) & $0.0004 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Mixtral 8x7B Instruct Mistral AI	$3.03	$0.0002	$0.0002	18,216	Self-hosted pricing assumes on-demand L4 GPU at ~$2.50/hour with autoscaling. Self-hosted: $2.50/hr on RunPod NVIDIA L4 (8x7B MoE), ~8,000 tokens/sec throughput. Assumes 0.5 compute hours + 0.8 warmup/idle hours per month.
ERNIE 4.5 VL 28B A3B Baidu	$3.69	$0.0002	$0.0002	18,216	OpenRouter pricing (Oct 2025) converted to USD per 1K tokens. Token pricing: $0.0001 (input) & $0.0006 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GPT-4o mini OpenAI	$3.95	$0.0002	$0.0002	18,216	Standard tier pricing $0.15 (input) / $0.60 (output) per 1M tokens. Token pricing: $0.0001 (input) & $0.0006 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Llama 4 Maverick Meta	$3.95	$0.0002	$0.0002	18,216	Image outputs billed at $0.0006684 equivalent; pricing from OpenRouter (Oct 2025). Token pricing: $0.0001 (input) & $0.0006 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Grok 4 Fast xAI	$3.96	$0.0002	$0.0002	18,216	Pricing sourced from OpenRouter USD/token rates (Oct 2025). Token pricing: $0.0002 (input) & $0.0005 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
DeepSeek V3.2 Exp DeepSeek	$4.32	$0.0002	$0.0002	18,216	Official API pricing: $0.28 (input cache miss) / $0.42 (output) per 1M tokens. Cache hits billed at $0.07 per 1M input tokens. Token pricing: $0.0003 (input) & $0.0004 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Llama 3.2 90B Vision Instruct Meta	$4.86	$0.0002	$0.0003	18,216	Image outputs billed at $0.0005058 equivalent per token; pricing reflects OpenRouter public rates (Oct 2025). Token pricing: $0.0003 (input) & $0.0004 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Llama 3.1 70B Instruct Meta	$5.34	$0.0003	$0.0003	18,216	Cost assumes steady-state utilization with autoscaling to avoid idle GPU time. Self-hosted: $3.20/hr on AWS g6.8xlarge (1x L4 24GB), ~5,500 tokens/sec throughput. Assumes 0.7 compute hours + 1.0 warmup/idle hours per month.
Gemini 2.5 Flash-Lite Google	$6.78	$0.0003	$0.0004	18,216	Paid tier $0.15 (input) / $1.25 (output) per 1M tokens (text/image/video). Audio inputs bill at $0.50 per 1M tokens. Token pricing: $0.0001 (input) & $0.0013 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Gemini 1.5 Flash Google	$7.69	$0.0004	$0.0004	18,216	Legacy Flash pricing $0.35 (input) / $1.05 (output) per 1M tokens. Token pricing: $0.0003 (input) & $0.0010 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Grok Code Fast xAI	$8.32	$0.0004	$0.0005	18,216	Pricing sourced from OpenRouter USD/token rates (Oct 2025). Token pricing: $0.0002 (input) & $0.0015 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Qwen-Plus Alibaba Cloud	$8.79	$0.0004	$0.0005	18,216	Converted from DashScope postpaid plan pricing (approx. ¥0.0028 / ¥0.0085 per 1K tokens). Long-context upgrades and bundles available on enterprise plans. Token pricing: $0.0004 (input) & $0.0012 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GPT-4.1 mini OpenAI	$10.53	$0.0005	$0.0006	18,216	Standard tier $0.40 (input) / $1.60 (output) per 1M tokens; cache hits billed at $0.10 per 1M input tokens. Token pricing: $0.0004 (input) & $0.0016 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GPT-5 Mini OpenAI	$10.94	$0.0006	$0.0006	18,216	Standard tier $0.25 (input) / $2.00 (output) per 1M tokens; cached input $0.025 per 1M. Ideal for productized assistants needing GPT-5 alignment at lower cost. Token pricing: $0.0003 (input) & $0.0020 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GPT-3.5 Turbo (Jan 2025) OpenAI	$10.99	$0.0006	$0.0006	18,216	Standard tier $0.50 (input) / $1.50 (output) per 1M tokens. Token pricing: $0.0005 (input) & $0.0015 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Doubao Pro 32K ByteDance	$18.46	$0.0009	$0.0010	18,216	Derived from Doubao public list price ¥0.006 (input) / ¥0.018 (output) per 1K tokens at 1 USD ≈ ¥7.05. Regional packages may differ. Token pricing: $0.0008 (input) & $0.0025 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Claude 3.5 Haiku Anthropic	$24.55	$0.0012	$0.0013	18,216	Legacy Haiku pricing $0.80 (input) / $4.00 (output) per 1M tokens. Token pricing: $0.0008 (input) & $0.0040 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Gemini 2.5 Flash Google	$27.35	$0.0014	$0.0015	18,216	Paid tier $0.625 (input) / $5.00 (output) per 1M tokens for prompts ≤200K. Requests above 200K input tokens bill at twice the rate. Token pricing: $0.0006 (input) & $0.0050 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Claude Haiku 4.5 Anthropic	$30.69	$0.0016	$0.0017	18,216	Entry-tier Claude model: $1 (input) / $5 (output) per 1M tokens on the on-demand plan. Token pricing: $0.0010 (input) & $0.0050 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Qwen-Max Alibaba Cloud	$42.13	$0.0021	$0.0023	18,216	USD values converted from DashScope public rate card (¥0.011 / ¥0.044 per 1K tokens at 1 USD ≈ ¥6.9). Confirm tier-based discounts before launch. Token pricing: $0.0016 (input) & $0.0064 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GLM-4 Plus Zhipu AI	$46.15	$0.0023	$0.0025	18,216	USD values converted from ¥0.015 / ¥0.045 per 1K tokens (exchange rate 1 USD ≈ ¥7.1). Prompt caching available for enterprise tiers. Token pricing: $0.0021 (input) & $0.0063 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GPT-4.1 OpenAI	$52.67	$0.0027	$0.0029	18,216	Standard processing tier $2.00 (input) / $8.00 (output) per 1M tokens. Cache hits bill at $0.50 per 1M input tokens. Token pricing: $0.0020 (input) & $0.0080 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GPT-5 OpenAI	$54.70	$0.0028	$0.0030	18,216	Standard tier pricing $1.25 (input) / $10.00 (output) per 1M tokens; cached input billed at $0.125 per 1M. Context up to 400K tokens per official pricing (Jan 2025). Token pricing: $0.0013 (input) & $0.0100 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GPT-5 Codex OpenAI	$54.70	$0.0028	$0.0030	18,216	Shares GPT-5 base pricing; tuned for software engineering and agent coding workloads. Token pricing: $0.0013 (input) & $0.0100 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Gemini 2.5 Pro Google	$54.70	$0.0028	$0.0030	18,216	Paid tier $1.25 (input) / $10.00 (output) per 1M tokens for prompts ≤200K. Above 200K tokens doubles the rate; context caching priced separately. Token pricing: $0.0013 (input) & $0.0100 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GPT-4o OpenAI	$65.84	$0.0033	$0.0036	18,216	Standard tier pricing $2.50 (input) / $10.00 (output) per 1M tokens as of Jan 2025. Prompts above 200K input tokens bill at the higher long-context rate. Token pricing: $0.0025 (input) & $0.0100 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GPT-4o (Aug 2024) OpenAI	$65.84	$0.0033	$0.0036	18,216	Legacy GPT-4o release (Aug 2024) retained for backwards compatibility. Pricing mirrors the current GPT-4o public tier. Token pricing: $0.0025 (input) & $0.0100 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Claude Sonnet 4.5 Anthropic	$92.07	$0.0047	$0.0051	18,216	Input $3 / 1M tokens (≤200K) and output $15 / 1M tokens. Requests above 200K input tokens upgrade to long-context pricing. Token pricing: $0.0030 (input) & $0.0150 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Claude 3.5 Sonnet Anthropic	$92.07	$0.0047	$0.0051	18,216	Maintained for backwards compatibility; matches current Sonnet 4.5 pricing. Token pricing: $0.0030 (input) & $0.0150 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Cohere Command R+ Cohere	$92.07	$0.0047	$0.0051	18,216	Includes 1,000 free requests per month on the Cohere platform. Token pricing: $0.0030 (input) & $0.0150 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Grok 4 xAI	$92.07	$0.0047	$0.0051	18,216	Pricing derived from OpenRouter USD/token rates (Oct 2025); converted to USD per 1K tokens. Token pricing: $0.0030 (input) & $0.0150 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GPT-4o (May 2024) OpenAI	$109.89	$0.0056	$0.0060	18,216	May 2024 launch pricing $5 (input) / $15 (output) per 1M tokens. Useful for comparing pre-price-drop scenarios. Token pricing: $0.0050 (input) & $0.0150 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Gemini 1.5 Pro Google	$153.85	$0.0078	$0.0084	18,216	Legacy pricing $7 (input) / $21 (output) per 1M tokens. Still useful for workloads validated on the 1.5 series. Token pricing: $0.0070 (input) & $0.0210 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Mistral Large Mistral AI	$175.82	$0.0089	$0.0097	18,216	Latency-friendly alternative to GPT-4 tier. Token pricing: $0.0080 (input) & $0.0240 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GPT-4 Turbo (Apr 2024) OpenAI	$219.78	$0.0111	$0.0121	18,216	April 2024 Turbo release $10 (input) / $30 (output) per 1M tokens. Useful for historical comparisons and fallback flows. Token pricing: $0.0100 (input) & $0.0300 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Claude 3 Opus Anthropic	$460.35	$0.0232	$0.0253	18,216	Premium Claude tier retained for historical comparisons and customers still migrating to Claude 4.x. Token pricing: $0.0150 (input) & $0.0750 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GPT-5 Pro OpenAI	$656.37	$0.0331	$0.0360	18,216	Premium GPT-5 tier at $15 (input) / $120 (output) per 1M tokens; best reserved for mission-critical workloads. Token pricing: $0.0150 (input) & $0.1200 (output) per 1K tokens. Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
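The monthly cost and cost per successful task for API-priced models can be checked from the per-1M-token rates and token volumes in the notes. The sketch below does this for three rows; the function name is hypothetical and the calculation ignores caching discounts, long-context surcharges, and free tiers.

```python
def api_monthly_cost(input_tokens, output_tokens, input_per_1m, output_per_1m):
    """Monthly cost in USD from token volumes and per-1M-token rates."""
    return (input_tokens / 1_000_000 * input_per_1m
            + output_tokens / 1_000_000 * output_per_1m)


# Token volumes and success count shown in the table.
INPUT_TOKENS = 8_910_000
OUTPUT_TOKENS = 4_356_000
SUCCESSES = 18_216

# (input, output) USD per 1M tokens, copied from the Notes column.
rates = {
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-4.1": (2.00, 8.00),
    "GPT-5 Pro": (15.00, 120.00),
}

results = {}
for model, (input_rate, output_rate) in rates.items():
    monthly = api_monthly_cost(INPUT_TOKENS, OUTPUT_TOKENS, input_rate, output_rate)
    per_success = monthly / SUCCESSES
    results[model] = (monthly, per_success)
    print(f"{model}: ${monthly:.2f}/month, ${per_success:.4f} per successful task")
```

These three rows reproduce the table's monthly figures ($30.69, $52.67, and $656.37). For the cheapest models, printing more decimal places than four avoids the $0.0000 rounding seen in the table.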