AI TCO: The Math of GenAI
16 Apr
Most organizations are still trying to make sense of AI costs using familiar terms like licenses, VM-hours, and static TCO frameworks. GenAI, however, operates on a different model: its costs are variable and continuously evolving.
Every interaction consumes tokens, and that demand is nonlinear. A single feature can produce wildly different bills depending on reasoning depth, prompt length, conversation history, and your RAG (Retrieval-Augmented Generation) strategy. This unpredictability is exactly why companies are getting “sticker shock” surprises.
The answer lies in what Deloitte frames as precision economics: shifting from rough estimates to tracking costs at the token level.
What is an AI Token?
Before we talk about costs, we have to understand the currency. In the world of Large Language Models (LLMs), you are not billed by the word; you are billed by the token.
Think of a token as a “chunk” of text. While a short word might be one token, longer words are broken down into multiple pieces.
- 1 token ≈ ¾ of a word (English)
So on average, 1,000 tokens is roughly 750 words, about the length of a two-page document. Every piece of data you send (the input) and every word the AI generates (the output) is counted, processed, and billed.
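The ¾-word rule of thumb above can be turned into a quick back-of-the-envelope estimator. This is a sketch only (the function name is my own, and real tokenizers vary by model and language):

```python
def estimate_tokens(text: str) -> int:
    """Rough English token estimate: ~4/3 tokens per word (1 token ≈ ¾ word)."""
    word_count = len(text.split())
    return round(word_count * 4 / 3)

# A 750-word document comes out to roughly 1,000 tokens under this heuristic.
```

For exact counts, use the tokenizer that matches your model; this heuristic is only for ballpark budgeting.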
A Token Is the New Unit of Accountability
In this AI world, tokens are the new unit of accountability and the fundamental question for any AI project is: Are we converting tokens into outcomes, or just burning them?
To answer that, you have to make the costs legible. Luckily, LLM pricing is just math. The most effective metric is the Cost per Completion:

Cost per Completion = (T_in / 1,000,000) × P_in + (T_out / 1,000,000) × P_out

Where:
- T_in: Every token sent (system prompts, chat history, RAG context, JSON documents).
- T_out: Every token the model returns.
- P_in / P_out: Price per million tokens (input and output, respectively).
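The Cost per Completion metric translates directly into code. A minimal sketch (function and parameter names are my own):

```python
def cost_per_completion(t_in: int, t_out: int,
                        p_in: float, p_out: float) -> float:
    """Cost per Completion: each side is (tokens / 1,000,000) times its rate.

    t_in / t_out: input and output token counts for one completion.
    p_in / p_out: price in dollars per million tokens.
    """
    return (t_in / 1_000_000) * p_in + (t_out / 1_000_000) * p_out
```

Because every term is a simple ratio, this same function works for any model once you plug in its published per-million rates.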
A Practical Example
Let’s look at a use case: using an LLM to analyze a 50-page Request for Proposal (RFP) to extract key requirements.
- T_in (Input): 40,000 tokens (the full RFP text + detailed extraction instructions).
- T_out (Output): 2,000 tokens (a structured list of requirements and a risk assessment).
- Rates (P_in / P_out): $15 per million in / $75 per million out (high-end frontier model).
The Calculation:

(40,000 / 1,000,000) × $15 + (2,000 / 1,000,000) × $75 = $0.60 + $0.15 = $0.75

How the Math Breaks Down:
- Input Cost ($0.60): You are sending 40,000 tokens. Since the model is priced in units of 1,000,000, you are essentially buying 4% of a unit (40,000 / 1,000,000 = 0.04). Multiplying that 4% by the $15 rate gives you $0.60.
- Output Cost ($0.15): You are receiving 2,000 tokens. This is 0.2% of a unit (2,000 / 1,000,000 = 0.002). Multiplying that tiny slice by the premium $75 rate gives you $0.15.
- Total: $0.75 per report.
Why Is the Output 5x More Expensive?
You’ll notice that while the input was 20 times larger than the output, the output still contributed 20% of the total cost. This “pricing asymmetry” exists because of how LLMs physically process data:
- Parallel Processing (Input): The model can “read” all 40,000 tokens of your RFP simultaneously. This is compute-efficient and fast.
- Sequential Decoding (Output): The model generates text one token at a time. To write “The,” it has to calculate the probability of that word based on the entire prompt; then it repeats that entire heavy calculation to write the next word. It is a slow, sequential loop that keeps the GPU “locked” for longer.
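The asymmetry is easy to see by running the RFP numbers. This sketch assumes illustrative frontier rates of $15 per million input tokens and $75 per million output tokens:

```python
# Illustrative breakdown of the RFP example: 40k tokens in, 2k tokens out,
# at assumed rates of $15/M input and $75/M output.
input_cost = 40_000 / 1_000_000 * 15    # 4% of a unit at the input rate
output_cost = 2_000 / 1_000_000 * 75    # 0.2% of a unit at the 5x output rate
total = input_cost + output_cost

print(f"input ${input_cost:.2f}, output ${output_cost:.2f}, total ${total:.2f}")
print(f"output share of total: {output_cost / total:.0%}")
```

Even though the input is 20 times larger, the 5x-per-token output rate keeps the output at a fifth of the bill; flip the ratio (a short prompt generating a long report) and output dominates.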
Once you define this unit cost, budgeting becomes a multiplication problem.
Building the Gen AI TCO
A realistic Total Cost of Ownership framework for GenAI cannot be built from a fixed annual budget or estimated by equating a model call to a VM-hour. It must be built from the token itself.
Use the Cost per Completion calculation as your baseline unit cost. To construct a complete TCO, multiply that unit by your forecast operational volume: [Completions per user per day] × [Number of users].
Only by forecasting your token requirements can you accurately stack secondary overhead costs, such as developer time, vector database storage, and DevOps. This approach moves you away from high-level estimates and toward building a more precise TCO model that finally tells you the truth about your AI investments.
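That multiplication can be sketched as a simple forecast function. All names and the example volumes below are hypothetical placeholders, not benchmarks:

```python
def monthly_completion_spend(cost_per_completion: float,
                             completions_per_user_per_day: float,
                             users: int,
                             days: int = 30) -> float:
    """Baseline token spend: unit cost x daily completions per user x users x days."""
    return cost_per_completion * completions_per_user_per_day * users * days

# e.g. 200 users each running 10 of the $0.75 RFP reports a day:
# monthly_completion_spend(0.75, 10, 200)  -> 45000.0 (i.e. $45,000/month)
```

This figure is only the token line item; developer time, vector database storage, and DevOps overhead stack on top of it.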