What an AI Agent Actually Costs to Run (With Real Numbers)

Most pricing conversations about AI agents focus on the build cost — how much to design and deploy the thing. The ongoing run cost is usually a footnote. But for any business evaluating whether an agent project makes economic sense, the run cost is the number that matters month over month.

In our earlier post on what AI agents actually do, we cited $20–$200/month as the typical API cost range once a custom agent is live. What follows is the math behind that range.

How token billing works

Every Claude API call is billed on two dimensions: input tokens and output tokens.

Input tokens are everything the model reads — your system prompt, the conversation history, any data you pass in (a document, a form submission, a CRM record). Output tokens are everything it writes back in response. Prices are quoted per million tokens (MTok).

One token is roughly three-quarters of a word, so one thousand tokens is approximately 750 words, or about three double-spaced pages. Work through that conversion once for your specific workflow, and the numbers stop feeling abstract.
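As a rough sketch of that conversion, the snippet below turns word counts into token estimates and back. The 0.75 words-per-token ratio is the rule of thumb from the paragraph above, not an exact tokenizer count, and the function names are made up for illustration.

```python
# Rule of thumb from the text: 1 token is about 0.75 words.
# Real tokenizers vary with language and formatting, so treat results as estimates.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(word_count: int) -> int:
    """Estimate a token count from a word count."""
    return round(word_count / WORDS_PER_TOKEN)


def estimate_words(token_count: int) -> int:
    """Estimate a word count from a token count."""
    return round(token_count * WORDS_PER_TOKEN)


print(estimate_tokens(750))    # 1000
print(estimate_words(30_000))  # 22500
```

For a real budget, count tokens on a sample of your actual inputs instead of relying on the ratio.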

The three models and their current prices

As of June 2026, Anthropic offers three current Claude tiers:

Claude Haiku 4.5 — $1 input / $5 output per MTok. Fastest, designed for high-volume tasks that don't require deep reasoning.
Claude Sonnet 4.6 — $3 input / $15 output per MTok. Balanced speed and capability; handles most business reasoning and writing tasks well.
Claude Opus 4.8 — $5 input / $25 output per MTok. Maximum capability, 1M token context window, best for complex document analysis and nuanced judgment calls.
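To make the per-MTok prices concrete, here is a minimal sketch of a per-call cost calculation. The prices are copied from the list above; the dictionary keys are illustrative labels, not official API model identifiers, and the function name is hypothetical.

```python
# USD per million tokens (MTok), taken from the list above.
# Keys are shorthand labels for this sketch, not real API model IDs.
PRICES_PER_MTOK = {
    "haiku-4.5": {"input": 1.00, "output": 5.00},
    "sonnet-4.6": {"input": 3.00, "output": 15.00},
    "opus-4.8": {"input": 5.00, "output": 25.00},
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one API call, before any caching or batch discount."""
    price = PRICES_PER_MTOK[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# One call with 2,000 tokens in and 400 tokens out on the Sonnet tier.
print(f"${call_cost('sonnet-4.6', 2_000, 400):.4f}")  # $0.0120
```

Input and output are priced separately, so a workflow that reads a lot and writes little behaves very differently from one that does the reverse.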

One number worth keeping in mind: Claude Opus 4.1, the previous Opus generation and now deprecated, was priced at $15 input / $75 output per MTok. Opus 4.8 costs one-third as much per token and outperforms it on most benchmarks. If you've seen "Claude is expensive" anywhere recently, that claim is probably anchored to older pricing, so any estimate more than a few months old is worth revisiting.

Three real workflows, with the actual math

1. Customer-facing FAQ agent (high volume, simple tasks)

A bot answering product and policy questions for a service business. 400 interactions per day. Each interaction: roughly 1,000 tokens in (the user's question plus a moderately detailed system prompt), and 300 tokens out (a direct answer).

On Claude Haiku 4.5:
(400 × 1,000 / 1,000,000 × $1) + (400 × 300 / 1,000,000 × $5) = $0.40 + $0.60 = $1.00/day → ~$30/month

2. Contract review agent (low volume, high complexity)

An agent that reads vendor agreements, flags non-standard clauses, and produces a risk summary for review. Ten contracts per day, averaging 30,000 tokens in per document (roughly 22,500 words at the ratio above, so a long agreement) and 1,500 tokens out per summary.

On Claude Sonnet 4.6:
(10 × 30,000 / 1,000,000 × $3) + (10 × 1,500 / 1,000,000 × $15) = $0.90 + $0.23 = $1.13/day → ~$34/month

On Claude Opus 4.8 (for contracts where you need the best clause analysis):
(10 × 30,000 / 1,000,000 × $5) + (10 × 1,500 / 1,000,000 × $25) = $1.50 + $0.38 = $1.88/day → ~$56/month

3. Lead qualification agent (moderate volume, moderate complexity)

An agent that reads incoming form submissions, scores leads against your ideal customer profile, and routes qualified ones to your CRM with a brief summary. 80 leads per day. 2,000 tokens in (form data, ICP criteria, context), 400 tokens out (score plus routing decision).

On Claude Sonnet 4.6:
(80 × 2,000 / 1,000,000 × $3) + (80 × 400 / 1,000,000 × $15) = $0.48 + $0.48 = $0.96/day → ~$29/month
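The arithmetic for all three workflows follows one pattern, which can be sketched as a small script. The volumes and token counts are the ones quoted above; the 30-day month is the assumption behind the monthly figures, and the names are illustrative.

```python
# (input, output) prices in USD per million tokens, from the pricing section.
PRICES = {"haiku": (1.0, 5.0), "sonnet": (3.0, 15.0), "opus": (5.0, 25.0)}
DAYS_PER_MONTH = 30  # assumption used to turn a daily cost into a monthly one


def monthly_cost(model, calls_per_day, tokens_in, tokens_out, days=DAYS_PER_MONTH):
    """Monthly API cost in USD for a fixed daily volume and per-call token sizes."""
    price_in, price_out = PRICES[model]
    per_call = (tokens_in * price_in + tokens_out * price_out) / 1_000_000
    return calls_per_day * per_call * days


workflows = [
    ("FAQ agent", "haiku", 400, 1_000, 300),
    ("Contract review", "sonnet", 10, 30_000, 1_500),
    ("Contract review", "opus", 10, 30_000, 1_500),
    ("Lead qualification", "sonnet", 80, 2_000, 400),
]

for name, model, calls, t_in, t_out in workflows:
    cost = monthly_cost(model, calls, t_in, t_out)
    print(f"{name} on {model}: ${cost:.2f}/month")
```

Unrounded, the four figures come to $30.00, $33.75, $56.25, and $28.80 per month, which match the rounded numbers in the text. Swapping in your own volume and token counts takes one line per workflow.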

These three workflows land between $29 and $56 per month, toward the low end of the $20–$200 range, and each one saves several hours of human time per week. The math usually works in favor of building.

Three ways to lower the number without downgrading the model

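Prompt caching. If your system prompt repeats on every API call, caching it reduces the cost of those repeated tokens by up to 90% on subsequent requests. A 700-token system prompt running 400 times per day means you're paying for 280,000 tokens of system prompt overhead daily. With caching, the first call pays full price plus a cache-write surcharge, and every subsequent call reads the cached version at 10% of the input rate, which cuts that overhead to the equivalent of roughly 28,000 full-price tokens. Two caveats: caching only applies above a minimum prompt length that varies by model, so a prompt this short may need to be bundled with other stable context to qualify, and cached entries expire after a short idle period, so the savings assume steady traffic. Check the current documentation for both limits.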
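A back-of-envelope version of the caching math, using the 700-token prompt and 400 daily calls from the paragraph above. This is a simplified sketch: it assumes one cache write on the first call and a cache hit on every later call, and it ignores expiry and minimum-length rules. The function and parameter names are hypothetical.

```python
def cached_prompt_overhead(prompt_tokens, calls, read_multiplier=0.10, write_multiplier=1.0):
    """Return (uncached, cached) prompt overhead in full-price token equivalents.

    Assumptions: one cache write on the first call, every later call is a
    cache hit, and the cache never expires between calls.
    write_multiplier=1.0 models a first call billed at the normal input rate;
    raise it if cache writes are billed at a premium.
    """
    uncached = prompt_tokens * calls
    cached = prompt_tokens * write_multiplier + prompt_tokens * (calls - 1) * read_multiplier
    return uncached, cached


uncached, cached = cached_prompt_overhead(700, 400)
print(uncached, round(cached))  # 280000 28630
```

On Haiku input pricing of $1 per MTok, that is the difference between about $0.28 and about $0.03 per day for the system prompt alone. The saving grows with prompt length and call volume.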

Message Batches API. For workflows that don't require real-time responses — nightly lead scoring, weekly document processing runs, batch data extraction — Anthropic's Message Batches API charges 50% of the standard rate. The trade-off is asynchronous delivery: results come back when complete rather than immediately. Most scheduled automation tasks don't need synchronous responses anyway.
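The effect of the batch discount on a monthly bill can be sketched as follows. The 50% rate is the one stated above; the idea of moving only a share of traffic to batches, and the names used, are illustrative assumptions.

```python
BATCH_DISCOUNT = 0.50  # batch requests billed at 50% of the standard rate, per the text


def batch_monthly_cost(standard_monthly_cost, batch_share=1.0):
    """Monthly cost in USD when a share of traffic (0.0 to 1.0) runs as batches."""
    if not 0.0 <= batch_share <= 1.0:
        raise ValueError("batch_share must be between 0.0 and 1.0")
    realtime = standard_monthly_cost * (1 - batch_share)
    batched = standard_monthly_cost * batch_share * BATCH_DISCOUNT
    return realtime + batched


# Lead qualification at $28.80/month, scored entirely in nightly batches.
print(f"${batch_monthly_cost(28.80):.2f}")       # $14.40
# Same workflow with only half the leads deferred to batches.
print(f"${batch_monthly_cost(28.80, 0.5):.2f}")  # $21.60
```

The discount only helps for work that can wait, so the useful question is what share of your calls truly needs an immediate answer.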

Test Haiku first. Take 50 real inputs from your workflow, run them through Claude Haiku 4.5, and evaluate the outputs against your actual quality bar. Many developers default to Sonnet or Opus without testing whether Haiku is good enough for their specific use case. Haiku handles a wider range of business tasks than its price suggests, often including classification, extraction, and first-draft generation. Upgrade only for the portion of tasks where Haiku demonstrably falls short.
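One way to structure that test is a small harness that runs each sample through the model and reports the share that meets your bar. This is a sketch only: fake_haiku is a stand-in so the code runs offline, and in practice you would replace it with a real API call and replace the exact-match check with whatever your quality bar actually is.

```python
from typing import Callable, Iterable


def pass_rate(
    samples: Iterable[tuple[str, str]],
    run_model: Callable[[str], str],
    meets_bar: Callable[[str, str], bool],
) -> float:
    """Share of (input, expected) samples whose model output meets the quality bar."""
    results = [meets_bar(run_model(text), expected) for text, expected in samples]
    if not results:
        raise ValueError("no samples provided")
    return sum(results) / len(results)


def fake_haiku(text: str) -> str:
    """Hypothetical stand-in for a model call: a naive keyword classifier."""
    return "refund" if "refund" in text.lower() else "other"


samples = [
    ("I want a refund", "refund"),
    ("Where is my order?", "other"),
    ("Refund please", "refund"),
    ("Cancel and give my money back", "refund"),  # the stand-in misses this one
]

rate = pass_rate(samples, fake_haiku, lambda output, expected: output == expected)
print(f"{rate:.0%}")  # 75%
```

The failing samples are the interesting part: they show which slice of traffic might justify a larger model.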

Most agents I've seen in production are running at 20–40% of what they could cost, because the builder picked the right model for each task instead of defaulting to the most capable one.
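To see how a per-task model choice produces savings of that size, here is an illustrative comparison. The 80/20 split between Haiku and Sonnet and the 2,000-in / 400-out call size are hypothetical numbers chosen for the example, not measurements from a real deployment.

```python
# (input, output) prices in USD per million tokens, from the pricing section.
PRICES = {"haiku": (1.0, 5.0), "sonnet": (3.0, 15.0), "opus": (5.0, 25.0)}


def per_call_cost(model, tokens_in, tokens_out):
    """Cost in USD of one call at standard rates."""
    price_in, price_out = PRICES[model]
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000


def blended_cost(routing, tokens_in, tokens_out):
    """Average per-call cost when traffic is split across models.

    routing maps model name to its share of calls; shares should sum to 1.0.
    """
    return sum(share * per_call_cost(model, tokens_in, tokens_out)
               for model, share in routing.items())


# Hypothetical split: 80% of calls are simple enough for Haiku, 20% go to Sonnet.
routed = blended_cost({"haiku": 0.8, "sonnet": 0.2}, 2_000, 400)
all_opus = per_call_cost("opus", 2_000, 400)
print(f"{routed / all_opus:.0%} of the all-Opus cost")  # 28% of the all-Opus cost
```

The exact ratio depends entirely on how much of your traffic the smaller model can handle, which is what the Haiku test is for.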

Where the floor and ceiling actually come from

The $20/month lower bound is roughly one lightweight agent — a simple classifier or triage bot, a few hundred calls per day on Haiku, with a cached system prompt. The $200/month upper bound is a complex multi-step agent doing significant document processing at volume, or running Opus-tier reasoning at real scale.

Most business automations — lead qualification, support triage, proposal drafting, structured data extraction — fall between $30 and $80/month in API costs once they're tuned. That's a range where most agents pay back their monthly cost within the first few days of saved time.

If you want to scope the API cost for a specific workflow before committing to build it, that's exactly the kind of question we answer in a discovery call. Drop us a line and we'll run the numbers on your volume and quality requirements.

— Cole

Sources

Anthropic Claude Models & Pricing — all model prices, context windows, and capabilities (fetched June 8, 2026)
Anthropic Platform Release Notes — prompt caching discounts and Message Batches API pricing (fetched June 8, 2026)