What does AI cost? It's a simple question and an important one – the answer will determine the fate of companies and shape society. But it's also a question that can't be answered in a meaningful way without additional context.
One possible response is "too much." US private AI investment reached $285.9 billion in 2025, according to Stanford HAI's 2026 Artificial Intelligence Index Report. That money has economic benefits but also adds stress to environmental resources, utilities, and communities.
As the report states, "AI data center power capacity rose to 29.6 GW, comparable to New York state at peak demand, and annual GPT-4o inference water use alone may exceed the drinking water needs of 12 million people."
Then there's the cost to human competency, when skills atrophy or never develop due to overreliance on prompt slot machines.
But that's difficult to measure over a short period of time. And given the current US administration's lack of interest in regulation or public concern, it's perhaps easier to focus on the financial minutiae until government and industry can be forced to reckon with civic unease.
You could start with the token, the basic unit for selling the input and output of AI models at the moment. The price of tokens has been much on the mind of developers using AI subscription plans because plan providers like Anthropic and GitHub have been pushing customers away from token-subsidized subscriptions toward pay-as-you-go consumption.
Devansh, a machine learning researcher, head of AI at legal startup Iqidis, and founder of an AI community group called the Chocolate Milk Cult, did the math in a post published earlier this year. The answer is about $0.0038 per thousand tokens – in a very specific context.
That's the base cost for inference on an Nvidia H100 GPU, rented at a cost of $2.50/hour and generating 185 tokens/second at 100 percent utilization.
But as Devansh observes, no one runs at 100 percent utilization. At 30 percent utilization, the price would be ~$0.013/1K tokens; at 10 percent, it would be ~$0.038/1K tokens.
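The arithmetic behind those figures is straightforward: divide the hourly rental price by the number of tokens the GPU actually produces in an hour. A minimal sketch, using the $2.50/hour rate and 185 tokens/second throughput stated above (the function name and structure are illustrative, not from Devansh's post):

```python
def cost_per_1k_tokens(gpu_hourly_rate: float, tokens_per_sec: float, utilization: float) -> float:
    """Dollars per 1,000 generated tokens on a rented GPU.

    gpu_hourly_rate: rental price in $/hour (e.g. $2.50 for an H100)
    tokens_per_sec:  generation throughput at full load
    utilization:     fraction of time the GPU does useful work (0.0-1.0)
    """
    tokens_per_hour = tokens_per_sec * 3600 * utilization
    return gpu_hourly_rate / tokens_per_hour * 1000

for util in (1.0, 0.3, 0.1):
    cost = cost_per_1k_tokens(2.50, 185, util)
    print(f"{util:.0%} utilization: ${cost:.4f} per 1K tokens")
```

At 100 percent utilization this works out to roughly $0.0038 per thousand tokens; dropping utilization to 30 or 10 percent scales the cost up proportionally, since the hourly rent is due whether or not the GPU is busy.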
Anthropic currently charges $5/M tokens (input) and $25/M tokens (output) for its latest model, Opus 4.7. For Google's Gemma 4 26B A4B, the weighted average input price at the time of writing is $0.096/M tokens, per OpenRouter.
If you run the numbers on different hardware, priced at a different time, with different energy costs, on different models, with different utilization, you'll get different results.
"If you were to just look at what the labs provide as the cost per API, it's a very good signal for what the token costs them, for the Western labs," Devansh told The Register in a phone interview.
"Some people say that Anthropic's trying to get about a fifty percent gross margin. But in reality, what a token costs is actually many variables rolled into one. You have the model, you have the research behind the model, constant updates in the models that people don't see. So you have to factor all of those in. It's not just the cost of inference at one call, which is actually not a very good way to look at the system."
Devansh said organizations tend not to focus on the specific cost of tokens because they're focused on delivering a service that customers value.
"In a lot of legal work, you can actually pass costs along to your customer and the customers will not complain because they want to see transparency into what was done and how it was done," he said. "So from that perspective, there's less of a worry about how much this will cost as long as you can justify your costs. … As long as you're consistently able to deliver the value, I think forecasting costs are a little bit less worrisome."
Companies like Meta and Shopify have made headlines by treating token usage as a key performance indicator, and employees have answered the call by trying to signal their value through heavy use of AI tools. That can get expensive quickly and may not do much for more meaningful business metrics.
"Is token spend directly correlated with productivity?" said Devansh. "Absolutely not. I've done this research very extensively. … Before you used to have lines of code and other kinds of stupid productivity metrics, like how many words you typed. So this is just the latest in that era of stupidity. I think middle managers will always try to justify themselves and find a way they can rank people without having to apply their brains."
But one of the issues with LLMs, said Devansh, is that we don't know how best to apply them. So there's potential value in simply encouraging people to spend tokens, in case they come up with new kinds of workflows that provide signals about what works and what doesn't.
Bob Venero, CEO of IT consultancy Future Tech Enterprise, told The Register that his company tends to work with Fortune 100 clients, and that many of them have stood up AI projects that involve throwing around a lot of money without thinking through what they want to accomplish.
Venero said when his company engages with clients, the goal is to figure out the desired business outcome, which may or may not involve AI.
Future Tech's recent work with Northrop Grumman did involve AI – the IT biz helped implement an Nvidia Enterprise AI Factory to help the defense firm run AI workloads relevant to its projects.
Venero said that companies are struggling to assess the impact of AI in their environment, to measure ROI, and to discover how the technology may be useful.
"So there's a lot of pre-work that needs to be done to identify where they want to spend their money and what the outcome is going to be, especially when costs are 3x of what they were six months ago," he said, citing "Ramageddon" – the shortage of RAM due to the AI compute boom.
Venero points to OpenAI's commitment to purchase memory chips from Samsung and SK Hynix, and the shift of memory makers like Micron toward high-bandwidth memory, as catalysts for the current RAM crisis. That complicates ROI calculations for AI deployments, he said, because everything has become more expensive.
Cloud providers can help by offering consumption-based pricing, he said, but he has some reservations about that.
"I'm not a huge fan of off-prem AI," he said. "It's a little bit scary from our perspective."
Setting aside the security concerns, Venero said the productivity risk of cloud dependency is substantial for large organizations. He pointed to Microsoft Office 365. "Has Office 365 ever gone down?" he said. "Multiple times. And there are so many of those outages that happen."
If a cloud outage costs a company a thousand dollars per minute of downtime, he said, maybe that's acceptable. "If it's a million dollars a minute, you probably want to think about the controls that need to be in place, and that's probably an on-prem solution," he said.
AI may be making cloud stability worse, through the deployment of under-reviewed code and the infrastructure stress that has followed from heavy AI use. Customers, Venero said, "are absolutely seeing that. And when they're not, we're educating them."
In light of the capacity challenges created by the sudden popularity of OpenClaw, Venero said, "People threw this into their environment and it did crazy stuff. So there's definitely an ecosystem conversation that needs to happen about risk and the three different pillars of risk that are tied to it."
And, he said, the hyperscalers have contributed to the problem by focusing on speed at the expense of quality. "Right now it's a race. Who's going to win? Who's going to take the most? And everybody's throwing everything at it. And it's just causing this incredible turmoil."
"What we want our customers to do is step back," he said. "Take a look at what you want to accomplish and why. Look at the associated investments and the right timeline to do it and then measure those outcomes."
Approaching AI thoughtfully and deliberately makes it more likely that AI projects will reach production.
Venero said that among the companies he's seen, prior to being educated about AI, maybe 15 percent of their prototypes would actually be deployed. With guidance, he said, that figure is more like 45 or 50 percent.
"It's very use-case-specific," he said. "And when you have the outcomes that you're trying to drive to and then you measure those outcomes, you will be successful. If you're not, if you're doing AI for the sake of AI, it's gonna be five percent."
Maybe asking what AI costs should not be the first question. Citing the pressure some employees feel to show their value by expending tokens, Venero said, the question should be, "Why? And what are you using them for?" ®