Welcome to my newsletter, which I call Drop-In Class because each edition is a short, fun intro class for technology concepts. Except unlike many instructors, I'm not an expert yet: I'm learning everything at the same time you are. Thanks for following along with me as I "learn in public"!
Why compute costs matter right now
The first time I was really aware of compute costs was when I made a classic mistake and forgot to cancel a trial account. It was just a personal account for a certain cloud data warehouse, to mess around and do my own learning — so the bill was as tiny as my sample datasets. But still!
It’s easy for me to forget that every request you make to a computer costs someone something. No such thing as a free lunch, as they say.
Who doesn’t forget? IT people. Compute costs have always been an IT concern. They’re the ones who know what it takes to keep the lights on.
But suddenly, compute costs aren’t just an IT concern — they’re a business concern. The compute bill is getting more eyeballs thanks to (everyone say it with me) generative AI. Those chatbot responses don’t grow on trees.
For instance, it costs about $0.02 to generate an image with a model like DALL-E 2. Imagine how much a service like Canva is paying for its 180 million users’ AI-generated images. No wonder Canva cited AI-powered features as the reason for raising some subscription prices by as much as 300%.
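To make that concrete, here’s the back-of-the-envelope math. A heads-up: the per-image rate is roughly OpenAI’s published list price, but the usage rate is an assumption I made up for illustration, and a company Canva’s size would surely negotiate better rates.

```python
# Back-of-the-envelope: what AI image generation might cost at Canva's scale.
# The usage rate below is a made-up assumption, purely for illustration.

cost_per_image = 0.02            # ~ published per-image API price for DALL-E 2
users = 180_000_000              # Canva's reported user base
images_per_user_per_month = 2    # hypothetical usage rate

monthly_cost = cost_per_image * users * images_per_user_per_month
print(f"${monthly_cost:,.0f} per month")  # -> $7,200,000 per month
```

Even with toy numbers, you can see how a couple of cents per image turns into millions of dollars a month.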
So let’s talk about compute. What is it? How do we pay for it? And why is AI so expensive? All shall be revealed below.
What is compute, and what goes into the cost?
When we say “compute” we’re talking generally about the computational power you need to complete tasks like hosting a website or manipulating data. As in, “Holy moly! Alex’s AI model requires a ton of compute!”
So when we say “compute resources” we mean all of the stuff you need. There’s a lot of physical work that goes into computing: the hardware (CPUs and GPUs, the chips that process everything), the electricity needed to power that hardware, and the cooling needed to keep it all from overheating. Plus networking, which continues to be a mystery to me.
So you’ve got to pay either:
the cost of building and managing your own compute resources, i.e. servers in your own private data center. Or…
the cost of paying someone else to build and manage compute resources for you (public cloud — AWS, Azure, GCP).
More and more folks are opting for the latter, because it sucks to maintain your own hardware.
Public cloud pricing is pay-as-you-go, just like electricity or water. That’s one of the benefits. And depending on what you’re doing, cloud compute can actually be quite cost-effective.
But the more complicated the task and the faster you need it done, the more compute power is needed. Which brings us to the costs of AI.
GenAI takes a heck of a lot of compute
An IBM report says the average cost of computing is expected to climb 89% between 2023 and 2025, and most of the surveyed executives cited GenAI as the driver. Why is that?
Compared to regular machine learning models, the large language models (LLMs) used for GenAI have crazy compute demands, both for training the model and for inference (when the model is actually generating the response).
Training an LLM is a gargantuan operation. Meta’s largest Llama model cost more than $2.4 million to train. So most companies have no business doing this. Only the ones in the business of building foundation models, like Meta, OpenAI, Anthropic, Databricks, and Google.
But if a company isn’t doing the training, it’s still spending on inference. It costs money every time a model generates a text summary, image, or code snippet. Every response requires billions of calculations. And the response is calculated in real time, which means the processing happens lickety-split. So it’s some pricey computing.
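To put a rough number on “billions of calculations”: a common rule of thumb is that generating one token of output takes about 2 floating-point operations per model parameter. Here’s a quick sketch using that rule (the model size and response length are assumptions I picked for illustration):

```python
# Rough sketch: how much arithmetic one chatbot response takes.
# Rule of thumb: inference takes ~2 FLOPs per model parameter per generated token.

params = 70_000_000_000        # assume a 70-billion-parameter model
flops_per_token = 2 * params   # ~140 billion operations per token
tokens_in_response = 500       # assume a medium-length answer

total_flops = flops_per_token * tokens_in_response
print(f"{total_flops:.1e} FLOPs per response")  # -> 7.0e+13 (70 trillion operations)
```

That’s on the order of 70 trillion operations for a single answer.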
Many LLM vendors use a pay-per-token pricing model for inference — like cloud, it’s pay as you go, but it looks a little different. “Tokens” are the units of text that the LLM has to process every time it generates a response — both your input, and the output it creates. For example, Wayne Gretzky’s quote "You miss 100% of the shots you don't take" contains 11 tokens. The easiest way to think of it is that you’re paying by the word. Or pieces of words, anyways.
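If you want to see tokens for yourself, OpenAI publishes an open-source tokenizer library called tiktoken. Here’s a little sketch that counts tokens and estimates a cost. Note that the per-token prices here are made-up illustrative rates (real price sheets vary by vendor and model), and different models use different tokenizers, so the count can differ from the 11 above:

```python
# Count tokens and estimate the cost of one request.
# Requires: pip install tiktoken
# The prices below are hypothetical, for illustration only.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by several OpenAI models

prompt = "You miss 100% of the shots you don't take"
input_tokens = len(enc.encode(prompt))
print(input_tokens, "input tokens")  # counts vary by tokenizer/model

price_per_input_token = 3 / 1_000_000    # hypothetical: $3 per million input tokens
price_per_output_token = 15 / 1_000_000  # hypothetical: $15 per million output tokens
output_tokens = 200                      # assume a short generated reply

cost = input_tokens * price_per_input_token + output_tokens * price_per_output_token
print(f"${cost:.6f} for this one request")
```

Fractions of a cent per request. But multiply by millions of requests a day, and the bill adds up.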
So inference really racks up a bill for companies building AI into their products, where the model needs to generate responses millions of times a day. Canva is one example. LinkedIn is rolling out AI-powered job search and resume help. Another I just ran into personally was Strava — the fitness app pops out an AI-generated summary with analysis of every workout.
And then the idea is they can charge more for subscriptions. They’re paying for inference, which means you’re paying for inference. Everyone is ultimately paying for compute resources.
So what can be done about compute costs? Either use less or make more money
There are entire consultancies dedicated to helping with cloud billing, and whole categories of tools for cost optimization. It’s a really tough problem to solve. But there are general best practices — like resource right-sizing. Don’t buy more cloud than you need, and turn it off when you’re not using it.
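As a tiny example of “turn it off when you’re not using it”: teams often schedule a script to stop development servers overnight. Here’s a sketch of what that can look like on AWS, assuming your instances carry an env=dev tag (the tag convention is my assumption; adapt it to your own setup):

```python
# Sketch: stop running AWS EC2 instances tagged env=dev (run nightly on a schedule).
# Assumes AWS credentials are configured. Requires: pip install boto3
import boto3

ec2 = boto3.client("ec2")

# Find running instances tagged as dev environments.
resp = ec2.describe_instances(
    Filters=[
        {"Name": "tag:env", "Values": ["dev"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)
ids = [
    inst["InstanceId"]
    for reservation in resp["Reservations"]
    for inst in reservation["Instances"]
]

if ids:
    ec2.stop_instances(InstanceIds=ids)  # stopped instances don't bill for compute
    print(f"Stopped {len(ids)} dev instance(s)")
else:
    print("Nothing to stop.")
```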
Cost optimization and compute efficiency are also driving GenAI development strategies. There’s a reason there’s all this talk of retrieval-augmented generation (RAG) instead of fine-tuning an AI model: fine-tuning is more computationally expensive, because you’re further training the model, while RAG just looks up relevant information and adds it to the prompt.
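If RAG is a new term for you, here’s a toy sketch of the idea. Instead of retraining the model, you retrieve relevant text at request time and paste it into the prompt. (The keyword-matching “retrieval” below is deliberately naive; real systems use vector search. The documents and question are made up for illustration.)

```python
# Toy sketch of retrieval-augmented generation (RAG): no model training involved.
# Real systems embed documents and use vector search; naive keyword overlap is
# used here just to show the shape of the technique.

docs = [
    "Our refund policy: full refunds within 30 days of purchase.",
    "Support hours: weekdays 9am to 5pm Eastern.",
    "Shipping: orders ship within 2 business days.",
]

def retrieve(question: str, k: int = 1) -> list[str]:
    """Rank docs by words shared with the question (a stand-in for vector search)."""
    q_words = set(question.lower().split())
    ranked = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return ranked[:k]

question = "How do refunds work?"
context = "\n".join(retrieve(question))

# The retrieved text rides along in the prompt; the model itself stays unchanged.
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # this prompt would be sent to an off-the-shelf LLM
```

The whole “retrieval” step here is cheap string matching plus a slightly bigger prompt, which is why it costs so much less than another round of training.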
At the end of the day, though, there’s only so much cost reduction you can do. The point of technologies like cloud and AI is not to save money. They can help you save money, like avoiding hardware maintenance, but the goal is really to deliver better products. Companies like Canva and LinkedIn are betting that the costs of implementing AI will pay off, as long as users find value in AI-powered features. (So far, users haven’t quite found that value in copilots.)
So if you’re spending more money on compute costs to run your model, but you’re making more money as a result of that model, who really cares? There’s no such thing as a free lunch, but you also have to spend money to make money. Just don’t forget to turn off the subscriptions you’re not using.
The cooldown: Extra reading
This is a 101-level drop-in, so I keep things at a high level! For deeper reading, check out these articles.
Navigating the High Cost of AI Compute (Andreessen Horowitz)
The hidden costs of AI: How generative models are reshaping corporate budgets (IBM)
GenAI strategy dictates ROI challenges for IT leaders (TechTarget)
Understanding the cost of Large Language Models (TensorOps)
See you in the next drop-in!
Cheers,
Alex