Intelligence Per Joule
Margins, models, and melting machines.
AI has dragged software back into the physical world. We are hit with headlines, almost daily, on CapEx build-outs, where companies seem to one-up each other with extreme fervour. The game on the field is one where the fear of being left behind is greater than the fear of being wrong. Spectators sit at both ends of the spectrum; some – like me – lean into the generational opportunity in front of us, while others, perplexed by the dollars being spent, insist we reside in times of “irrational exuberance”.
Whichever side you are on, it is difficult to ignore the reality we are living in. A platform shift like AI rarely comes our way; the advent of the internet, the proliferation of mobile, and the move to the cloud are the only close contenders.
But innovation and progress do not arrive at our doorstep without making big asks of us. With the passage of time, it has become clear that one of the major bottlenecks to realizing AI’s full potential is access to sufficient power and electrification. As AI-led companies lean into energy literacy, physics will play a critical role in the margin equation. This will lead us into a world where access to power is weaponized, and cheap intelligence has the potential to reshape geopolitics.
The Stack Beneath the Stack
Before exploring why solving for energy efficiency is critical, it’s helpful to peel back the onion and dig into the various components that constitute the AI build-out:
1) At the foundation sit the processors that AI models run on. GPUs and accelerators define the raw computational throughput of the system, and this layer sets the baseline for everything that follows. The usual suspects like Nvidia and AMD would fall into this bucket.
2) Across racks in a data centre, it’s imperative to have a tight fabric that lets chips communicate with one another. High-speed networking allows thousands of chips to function as a single logical machine. This layer becomes critical as scale increases because inefficient communication turns compute into idle silicon. Arista Networks is the gold standard in this group.
3) We further move into cooling and power management. As compute density increases, thermal management becomes unavoidable. Power drawn by chips converts directly into heat, and cooling systems remove that heat fast enough to keep hardware operating within safe limits. The cooling throughput increasingly determines how many GPUs can be deployed in each facility. We’re essentially deploying billions to prevent machines from melting. Vertiv Holdings is an example of a company focused on this segment.
4) High-bandwidth memory and storage sit alongside compute as another binding constraint. This part of the chain has become a core bottleneck and has captured headlines (and stock prices) recently. SK Hynix, Samsung, Micron, and Sandisk are the largest companies in this segment. AI systems increasingly rely on high-bandwidth memory to preserve context and feed accelerators at the speed they require. When memory bandwidth lags compute capability, chips stall well short of their potential.
5) Electrification forms the outer boundary of the system. No amount of capital or silicon matters if sufficient power cannot be delivered to the site. Grid access, transmission capacity, and generation availability increasingly determine where data centres can be built and how quickly they can expand. In many regions, power availability has become the gating factor rather than land, capital, or hardware supply. Oklo is an interesting company in this segment, using modular fission reactors to generate clean power.
6) Cloud platforms turn volatile, physical infrastructure into something that looks stable and elastic to the user. They hide complexity, price compute in simple units, and take responsibility for utilization, power variability, and reliability beneath the surface. Household names like Google and Amazon’s AWS would fall into this bucket.
Although value accrues across the entire stack, the most critical interactions occur around cooling, power management, electrification, and the abstraction layer that sits above them. Demand for compute continues its upward trajectory, but the ability to serve that demand will increasingly be determined by how efficiently energy can be converted into usable intelligence.
The Inference Game
The rationale for why electrification remains critical is better explained through the lens of inference economics. When we roll back time, the era of the CPU had unique characteristics that, for better or worse, are gone today. As we think about our GPU-led future, the margin structure and scaling bottlenecks offer a stark contrast with the past. CPUs scaled with Moore’s Law, and energy costs were tiny relative to the marginal value created. Compare that with today’s GPUs, where power availability, grid access, and load balancing define margins and economic viability. One’s edge will be determined less by model quality than by intelligence delivered per unit of energy, or as the essay suggests, by ‘intelligence per joule’.
The other variable here is that the economics of training are different from those of inference. Training is predictable, while inference is spiky and occurs in spurts (for the musicians in the room, the difference between a sustained note and a staccato - same instrument, different energy profile). With this volatility, physics begins to limit scale: GPUs are a fixed cost, but inference demand is not, so the economics deteriorate quickly when utilization drops. Over-provisioning GPUs is a smart strategy when training models, but it can destroy margins in inference. This points to a future where electricity for compute, cooling, power-conversion losses, and grid demand charges all play a larger role in eventual outcomes. Even a 10–20% energy efficiency gain can matter more than a 10–20% model improvement.
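To make the utilization point concrete, here is a back-of-the-envelope sketch in Python. Every number in it - GPU power draw, PUE, electricity price, throughput - is an illustrative assumption, not a measured figure; the point is only that the energy cost per token scales inversely with utilization, since the machines draw power whether or not tokens are flowing.

```python
# Back-of-the-envelope inference economics (illustrative numbers, not vendor specs).
GPU_POWER_KW = 1.0      # assumed average draw per GPU, including overhead
PUE = 1.3               # assumed data-centre power usage effectiveness
PRICE_PER_KWH = 0.08    # assumed industrial electricity price (USD)
TOKENS_PER_SEC = 2_000  # assumed throughput at full utilization

def energy_cost_per_million_tokens(utilization: float) -> float:
    """Electricity cost (USD) to serve 1M tokens at a given utilization."""
    tokens_per_hour = TOKENS_PER_SEC * 3600 * utilization
    kwh_per_hour = GPU_POWER_KW * PUE  # drawn whether or not tokens flow
    return (kwh_per_hour * PRICE_PER_KWH) / (tokens_per_hour / 1_000_000)

for u in (1.0, 0.5, 0.2):  # spiky inference => low average utilization
    print(f"utilization {u:.0%}: ${energy_cost_per_million_tokens(u):.3f} per 1M tokens")
```

At 20% utilization, the energy cost per token is five times what it is at full load - the staccato energy profile, priced out.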
This is why energy-efficient compute will drive our future, and to be part of the cohort of the most impactful companies in the world, tech businesses will have to review their playbook and lean further into owning and operating more in the physical world.
Amazon and Exxon Have a Baby
The header of this section might leave you confused, but I believe the most value will accrue to companies that control energy, compute, and abstraction. Almost like a combination of Amazon and Exxon.
As discussed, AI margins will begin to force compute providers to internalize cooling, energy, and grid constraints. We have seen this story play out before. In the cloud era, hyperscalers like Amazon launched web services, where the premise was to abstract away complexity for customers and never have them think about data centre design, custom networking, or access to storage. ‘Cloud as a service’ became the norm, and in a similar vein, future winners will offer ‘inference as a service’ (Amazon is already ahead of the curve here - Bedrock, their managed inference platform, is now a multibillion-dollar business).
If we peer into the future, leading compute companies like Nvidia, AMD, and CoreWeave will move downstream and prioritize access to power. A few actions they will undertake - and in some cases already have:
- Long-dated power purchase agreements (PPAs) will become the norm, which will enable compute providers to have access to renewable power at fixed prices for periods lasting 15-20 years.
- Acquiring or investing directly in power generation assets. This goes beyond PPAs, as seen with Microsoft’s deal to restart Three Mile Island, Amazon’s nuclear investments, and Google’s agreement with Kairos Power.
- Beyond thermal and cooling innovation, load shaping and routing inference to nodes where power is cheapest and most abundant will be paramount. It’s like Google Maps for energy, routing workloads to where there’s the least congestion (a toy version is sketched after this list).
- In some cases, rather than drawing from the grid, building facilities adjacent to power plants, hydroelectric dams, or solar farms will eliminate transmission losses and secure dedicated capacity.
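As a rough illustration of the load-shaping idea, here is a toy energy-aware scheduler. The region names, prices, and headroom figures are invented for the example; a real system would pull live grid and spot-price data.

```python
from dataclasses import dataclass

# Toy energy-aware scheduler: the "Google Maps for energy" idea from above.
@dataclass
class Region:
    name: str
    price_per_kwh: float      # current spot electricity price (USD)
    grid_headroom_mw: float   # spare capacity before demand charges kick in

REGIONS = [
    Region("hydro-north", 0.04, 120.0),  # hypothetical regions
    Region("solar-west", 0.06, 40.0),
    Region("grid-east", 0.11, 5.0),
]

def route_workload(required_mw: float) -> Region:
    """Send a deferrable inference batch to the cheapest region with headroom."""
    candidates = [r for r in REGIONS if r.grid_headroom_mw >= required_mw]
    if not candidates:
        raise RuntimeError("no region has spare capacity; defer the job")
    return min(candidates, key=lambda r: r.price_per_kwh)

print(route_workload(required_mw=10.0).name)  # -> hydro-north
```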
A more radical and recently vocalized approach is the notion of building data centres in space. I was perplexed initially (envisioning server racks flying in space isn’t easy to digest), and I am certain execution risk is large, but conceptually the idea does hold value: space’s near-constant solar access and radiative cooling, with no need for water or mechanical systems, solves the power and thermal management challenge. Of course, this vision comes with its own new constraints: launch capacity, orbital slots, space debris (Kessler syndrome), and inter-satellite bandwidth, to name a few.

Not All Belongs to Goliath
The chatter of high capital expenditures, moats defined by the physical world, and comparisons to trillion-dollar giants like Nvidia might give the impression that the winners of the future are already too far along. I don’t agree. We are early, and as the innovator’s dilemma plays out - where incumbents are too consumed by their existing business - early-stage companies retain room to build something lasting. It’s a story we’ve seen with every cycle in the past, and I see no reason why this time is different.
We can start with inference orchestration: middleware companies that optimize for economics and delivery at scale. The big advantage for middleware is the durability that comes from operating across models, along with low capital intensity relative to the foundational model layer; we’ve seen this pattern work before with Stripe in payments, Twilio in communications, and more recently, Baseten in inference. Critically, each of these middleware functions directly reduces the energy required to deliver a unit of useful intelligence:
- Model routing: Routing that directs prompts based on complexity, cost, and latency can reduce the total compute expended across the system. It’s one of the most direct levers on intelligence per joule (a toy router, with caching, is sketched after this list).
- Prompt optimization: Tools built for prompt compression, testing, and iteration reduce unnecessary token consumption without degrading output quality.
- Retrieval and vector infrastructure: Retrieval-augmented generation (RAG) and vector databases allow models to access only the relevant slice of information, as opposed to processing an entire corpus at once. This is a sure-fire way to stretch effective context and reduce compute.
- Evaluations and guardrails: Every hallucination represents wasted energy. Tools that measure model outputs for accuracy, bias, and reliability reduce the number of cycles spent producing and correcting low-quality results.
- Caching: Inference workloads are repetitive. Caching layers that store and reuse prior outputs, compress inputs, and batch similar requests eliminate redundant computation entirely.
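To make the routing and caching levers tangible, here is a minimal sketch combining the two. The model names, the complexity heuristic, and the stubbed API call are all hypothetical; a production router would use a learned classifier and a real inference client.

```python
import hashlib

# Toy model router with a response cache, combining two of the levers above.
CACHE: dict[str, str] = {}

def complexity(prompt: str) -> float:
    # Crude proxy: longer prompts tend to be harder queries.
    return len(prompt.split()) / 100

def call_model(name: str, prompt: str) -> str:
    return f"[{name}] answer to: {prompt[:30]}..."  # stand-in for a real API call

def serve(prompt: str) -> str:
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in CACHE:  # caching: a repeated query costs near-zero joules
        return CACHE[key]
    # routing: cheap small model for easy prompts, large model only when needed
    model = "small-8b" if complexity(prompt) < 0.5 else "large-frontier"
    CACHE[key] = call_model(model, prompt)
    return CACHE[key]

print(serve("What is the capital of France?"))  # routed to the small model
```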
Outside of middleware, companies like Etched, Cerebras, and Groq have developed purpose-built, inference-specific silicon, as opposed to repurposing GPUs originally designed for training. The thesis here is that GPUs are built for the hardest workloads, not the most common ones, which leads to wasted compute.
There is also the path of working with smaller models that retain most of the capability of their larger parents at dramatically lower compute cost. Techniques like quantization (reducing numerical precision), pruning (removing redundant parameters), and knowledge distillation (training a smaller model to mimic a larger one) all trade marginal accuracy for energy savings.
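Here is a minimal sketch of the simplest of these, quantization: symmetric per-tensor int8, assuming nothing beyond NumPy. Real systems use per-channel scales, calibration data, and hardware int8 kernels; this only shows the memory (and hence energy) saving and the precision cost.

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Map fp32 weights to int8 with one shared scale (symmetric, per-tensor)."""
    scale = np.abs(w).max() / 127.0  # largest weight maps to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)  # one fp32 weight matrix
q, scale = quantize_int8(w)

print(f"memory: {w.nbytes / 2**20:.0f} MiB fp32 -> {q.nbytes / 2**20:.0f} MiB int8")
print(f"max abs error: {np.abs(w - dequantize(q, scale)).max():.4f}")
```

A 4x reduction in bytes moved per parameter means less memory bandwidth, less power drawn, and more tokens per joule, at the cost of a small, bounded rounding error.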
Application-layer companies, too, should ask themselves tough questions about how they can reduce inference costs. Leaning into brand and distribution while remaining at the behest of model companies is not the best long-term strategy. For example, Lovable, although reliant on language models, is constantly working on prompt quality to manage inference costs. As the AI build-out matures, the models will become a commodity, and processing speed, low latency, and energy optimization will become the moat that protects quality margins at scale.
AI dragged software back into the physical world, and the physical world has its own rules. Intelligence per joule is not a technical metric. Rather, it is an economic one, a strategic one, and increasingly, a geopolitical one. For founders, this means building with energy literacy from the get-go. For investors, it means looking beyond software elegance and asking a more fundamental question: how much useful intelligence does this system produce per unit of energy consumed? The answer will separate the enduring companies from the forgettable ones.
Thank you to Rahul Sanghi from Tigerfeathers for reading and editing my drafts. And to Nynika Jhaveri for the art inspiration.