Computational Abundance
The bubble builds the intelligence grid. The bust distributes it.
Across the AI supply chain, millions of accelerators are in motion: ordered, manufactured, staged, shipped, queued for facilities that do not yet have power. xAI’s Colossus programme targets a million accelerators on its own; Meta’s roadmap references multi-million Blackwell- and Rubin-class deployments through 2027; Oracle, OpenAI, the U.S. Department of Energy, and the rest of the hyperscaler cohort add many more.[1][2]
At any moment, many of them sit on pallets in warehouses around the world. Climate-controlled, security-gated, quiet. Waiting for a transformer. Waiting for a substation. Waiting for a gas turbine, a grid interconnection, or steel from another state.
Some may never be plugged into the slots they were ordered for. For others, one, two, even three generations of silicon will ship before they see a power connection.
The first AI capital cycle is overbuilding the machinery of intelligence before the economy knows how to use it. The chips, asleep on pallets, will go to work anyway.
I. The Plug
The limit on AI is electricity, permitting, and time.
The hyperscalers expect to spend over $700 billion on AI and cloud infrastructure in 2026 alone.[3] To fund it, the top five U.S. hyperscalers issued a record $121 billion in corporate bonds during 2025 — over four times their historical average — with Bank of America projecting $175 billion in 2026 issuance. Amazon’s $54 billion global bond sale in March 2026 was the largest debt transaction in its history.[4] AI-linked venture deployment reached roughly a quarter-trillion dollars in a single quarter, dominated by OpenAI’s $122 billion round, Anthropic’s $30 billion round, and xAI’s $20 billion round.[5] U.S. private equity poured $45.7 billion into data centres in 2025 — 72% of all sector investment.[6] Appendix A details the sources.
None of it ships intelligence to a customer.
It ships steel, copper, silicon, and concrete to a power easement that does not yet exist. The bottleneck has migrated: from the model to the chip to the substation, the transformer, the gas turbine, the air permit, the cooling tower, the interconnect queue.[7]
The market priced AI as if compute and intelligence were the same thing. Compute is silicon in racks connected to substations connected to fuel that has to be physically delivered. The chip is the easy part.
Geography entered the argument in 2026. Henry Hub gas sat near $2.83/MMBtu while TTF in Europe and JKM in Asia traded above $17 and $18/MMBtu.[8] Run that fuel through a combined-cycle plant at the EIA’s 2024 heat rate[9] and the fuel-only marginal cost of electricity in LNG-import regions runs six to seven times the U.S. baseline. For a continuously loaded 100 MW campus, the spread alone is roughly $100 million a year — before transmission, capacity charges, cooling, financing, or carbon. Compute is now a geography.
The deeper fact is temporal.
The limiting factor in AI is no longer whether chips can think. It is whether anyone can plug them in before they depreciate.
A bridge delayed by two years is still a bridge. A GPU delayed by two years has crossed an architectural boundary. Hopper to Blackwell. Blackwell to Rubin. Rubin to Feynman.
The scarce asset is not the GPU. It is the slot: permitted power, cooling, land, transformer capacity, networking, operators, and the legal right to turn electricity into computation for the next decade.
Every order book is a claim on a future slot. Every “X gigawatts” announcement is a claim on a slot. The order book equals demand only if the slots arrive on time. They are not.
In January 2026, AWS raised H200 Capacity Block pricing by 15% — the clearest material price increase in current-generation cloud compute after two decades of mostly declining headline cloud pricing.[10] The energised slot is scarce.
II. The Order Book Is Not Demand
A small public sample is enough to make the structural gap visible.
| Project | Announced accelerators | Facility load | Confirmed energised | Status |
|---|---|---|---|---|
| xAI Colossus (Memphis) | ~1,000,000 target; 220,000+ today[1] | ~1.3 GW programme target; ~290 MW current | ~290 MW + 300 MW expansion under build | Operational + expansion |
| Oracle / OpenAI Abilene | 400,000 GB200-class[11] | ~1.2 GW (facility design) | 2 of 8 buildings operational (~300 MW) | Partial |
| DOE Solstice + Equinox | 110,000 Blackwell[12] | ~140 MW | Equinox expected H1 2026 | Planned |
| Meta roadmap 2025–2027 | ”Multi-million” Blackwell/Rubin[2] | Multi-GW | Undisclosed | Roadmap |
Facility loads use disclosed design power where available (Colossus, Abilene) and ~1.3 kW per top-end accelerator otherwise.[13] Appendix C details the arithmetic.
The three projects with disclosed status imply ~2.6 GW of accelerator facility load ordered or planned. The confirmed energised fraction by mid-2026 is closer to 0.6 GW. Roughly a fifth of the ordered facility load in this public sample is plugged in.
That is one half of the gap. The other half is utilisation.
Cast AI’s survey of production Kubernetes clusters across AWS, Azure, and Google Cloud measured an average sustained GPU utilisation of 5%.[14] That is an enterprise figure, not a frontier one — frontier training clusters saturate silicon for weeks at a time. Enterprise fleets provision for bursty demand and idle between requests.
Read the two figures as bounding cases, not a multiplied ratio. They point the same direction. Across the broad surface of paid-for capacity, much of the ordered compute is doing very little economic work. The rest is optionality — a placeholder, a reservation, a hedge against being last in line for a slot. Appendix D sets out the distinction.
The order book is not demand. It is the right to access scarce compute if the workloads arrive.
Capacity announcements are not energised power. A gigawatt of accelerators is a gigawatt of transformers, substations, gas contracts, water permits, and skilled labour, not a tape-out.
Reservations are not utilisation. A reserved instance running idle is a depreciating asset, not a productive one.
Capex is not revenue. A balance-sheet wager that the agentic economy will mature on schedule is not the same thing as the schedule maturing.
What has been priced as a vertical demand curve for compute is a vertical demand curve for optionality on compute. In a bottlenecked world, paying for the right to compute later is strategically correct. The problem is duration mismatch: the option is priced as insurance against compute scarcity, but the underlying asset depreciates against an architectural cadence faster than the workload-maturity clock. Options expire. Blackwell-era chips ordered for 2025 deployment run into Rubin and Rubin Ultra. The chips ordered after that run into Feynman-era roadmap pressure. The grid moves slower than the architecture cadence.
Aggressive AI-infrastructure pricing assumed agentic workflows would convert model capability into labour substitution on a near-term timetable. The deployment rails did not arrive on time. The frontier-lab demand curve is real. The economy-wide curve is real eventually. It is not real in the energisation window of the chip already ordered.
The frameworks are arriving — NIST AI RMF, ISO/IEC 42001, sector-specific eval suites, the open-source agent-testing ecosystem.[15] None mature in eighteen months. The closest analogue is autonomous driving. Level 4 was demonstrated around 2016 — impressive enough to convince many observers broad deployment was a year or two away.[16] A decade later, commercial deployment remains confined to a handful of geofenced cities. Industrial-grade autonomy requires reliability several orders of magnitude beyond the demo. The last few nines take a decade of guardrails, QA, simulation, verification, geofencing, fleet management, and regulatory engagement — not another year of model improvement. Agentic workflows will follow the same arc.
Capability arrives before verification, regulation, and integration. It always has — in medicine, in finance, in software itself.[17]
The revenue data makes the asymmetry precise. Almost all explosive AI demand growth is concentrated in one vertical: software engineering. Cursor, Claude Code, Copilot, Codex, Devin — billion-dollar ARRs in coding. Software engineering spent fifty years building exactly the distribution layer the deployment economy requires: compilers, type systems, linters, version control, test frameworks, CI/CD, sandboxed runtimes, code review, IDEs, and open-source ecosystems with documented APIs. The current revenue is a leading indicator of vertical labour replacement and a trailing indicator of half a century of pre-existing tooling.
Q1 2026 made the asymmetry impossible to miss.
Per reporting across Reuters, Bloomberg, and The Information: OpenAI’s Q1 revenue ran at approximately $5.7 billion against an operating loss of approximately $6.95 billion (-122% margin), tracking toward a $36.6 billion annual loss. ChatGPT weekly active users stalled at 905 million, below the 1 billion target. Sora shut down. The Disney partnership ended. The Walmart pilot shuttered.
xAI lost $2.47 billion on $818 million of Q1 revenue, burning roughly $1 billion a month. Grok did not appear in the top-25 App Store downloads.
Anthropic hit a $30 billion annualised revenue run rate by April — 80× Q1 growth against a 10× target. Claude Code alone crossed $1 billion ARR, on track to print $11 billion of Q2 revenue at ~$600 million of operating profit.
The lab whose distribution layer is most mature is the one converging to profitability fastest. The capability is real at all three. The deployment scaffolding is not.[18]
Within coding, the adoption curve is approaching its top. Late-2025 surveys from Stack Overflow, JetBrains, and GitHub’s Octoverse put professional-developer adoption of AI tools above 70%.[19] The one vertical whose distribution layer exists is saturating in new-user conversion. Further growth comes from deepening usage among existing users — a slower, harder curve.
The order book extrapolates coding’s adoption curve to verticals whose scaffolding does not yet exist.
This is the recurring error of every infrastructure cycle that confuses what is possible with what is deployable today. Railroads in the 1870s. Telegraph in the 1840s. Fibre in 1999. The technology is correctly identified. The timetable is not.
III. The Depreciation Clock
Premium power goes to premium silicon.
A hyperscaler with a scarce power permit and an even scarcer top-tier substation will not run last-generation chips in that slot. Frontier training is increasingly optimised for the current architecture: memory bandwidth, interconnect topology, sparsity support, low-precision throughput, on-package networking. The performance-per-watt of the current generation is engineered against the next model the lab intends to train.
Older chips are too inefficient, in scarce frontier slots, to justify frontier use.
This produces a strange asset class. The world’s most ambitious computing hardware, paid for at full price, becomes economically displaced inside the very facilities built to house it — not because the chips have failed, but because the marginal frontier watt belongs to something newer. Inventory ages. Some of it never reaches a powered rack at all, because the rack is reserved for its successor.
The depreciation clock has three hands.
Architecture cadence. Two-year cycles, each generation delivering multiples of performance per watt on the frontier workloads the labs care about.
Software stack alignment. Frontier kernels, compiler optimisations, library support, and reference implementations migrate to current silicon. The older chip remains supported, but the developer attention moves.
Power contract. A 100 MW interconnection built in 2027 will not be filled with the GPUs ordered for it in 2024 if the GPUs ordered in 2026 do much more work per watt on the workloads that pay for them. The interconnection is the scarce asset. The chip is fungible. The slot accepts the highest-performing tenant.
The three hands rotate together. And yet the chip does not stop being useful when it stops being frontier.
In an earlier era, the chip you bought in 1996 was useless in 2000. That era is over. Moore’s law slowdown has changed the depreciation profile of compute. Per-transistor cost has plateaued. Per-watt performance still improves, but in narrower bands and on specific workloads. A 7 nm chip from 2019 still does floating-point arithmetic perfectly well. A Hopper-class accelerator from 2023 still performs trillions of multiplications per second per device.
What has changed is the marginal economics of frontier training. The newest accelerators are categorically better there, and they will stay that way.
But most of what the economy asks compute to do is not training a trillion-parameter model from scratch. It is inference, embedding, batch processing, fine-tuning, retrieval, ranking, simulation, rendering, robotics control loops, scientific computing. Those workloads do not care which architecture cadence they run on. They care about price per useful operation, latency per query, throughput per dollar, and reliability over time.
A Hopper-era GPU, repriced below the cost basis its first owner needed to recover, is a fantastic inference engine, a fantastic fine-tuning platform, a fantastic simulator. Misallocated frontier hardware becomes correctly priced plateau infrastructure. It loses only one contest: the next billion-dollar frontier training run. Even there, the loss is relative.
The same AI boom that paid for the order book is bidding up the inputs that make new silicon. Epoch AI’s teardown places B200 module production cost at roughly $5,700–7,300, with HBM and advanced packaging together accounting for about two-thirds of variable unit cost.[26] SemiAnalysis estimates memory could rise to ~30% of hyperscaler AI data-centre capex in 2026, up from ~8% in 2023–2024, with HBM undersupplied through 2027.[27] Samsung, SK Hynix, and Micron have diverted capacity toward HBM and high-margin enterprise DRAM, constraining the broader DRAM market and feeding consumer-GPU price inflation. The boom inflates the cost of building new silicon at the moment the per-transistor cost gain has weakened.
The wedge widens from both ends. The plateau gets cheaper as the frontier evicts it. The frontier gets more expensive because the boom has bid up its inputs. Software compounding lifts useful work per dollar on the plateau chip in the middle, every year. On most plateau-eligible economic work, the plateau chip wins that contest by a wider margin every year.
The frontier gets more expensive faster than it gets universally better. The plateau gets cheaper faster than it gets obsolete.
IV. The Backbone and the Last Mile
The right historical analogue is not Pets.com. It is fibre.
Between 1996 and 2001, telecom carriers laid millions of miles of long-haul fibre across the United States. WorldCom, Global Crossing, Qwest, 360networks, Williams Communications — the capital structures collapsed almost entirely. By the early 2000s, a vast share of installed long-haul fibre was dark. The cost basis of the asset was crushed against the wall of cancelled customer demand.
And then it became the spine of the internet.
But it took twenty years.
The bottleneck was never the backbone. The backbone was finished by 2001. The bottleneck was distribution — the last copper mile between the optical trunk and the household, dial-up at kilobit speeds, DSL at megabit speeds, then cable DOCSIS, fibre-to-the-curb, fibre-to-the-home, 3G, 4G, 5G.
Bandwidth at the trunk was abundant from the day the fibre was lit. Bandwidth at the wall socket arrived a generation later, in layers, against the slow economics of digging up streets and convincing households to pay for something they had not previously known they wanted. The optical fibre did its job in eighteen months. The distribution layer took two decades.
Netflix streams on that fibre. YouTube serves on that fibre. AWS replicates across it. The applications were waiting on the last mile.
The dot-com bust was correct ideas on the wrong timetable, financed at the wrong duration, against an underbuilt distribution layer.
AI is building the same shape.
The compute backbone is growing at unprecedented scale: hyperscaler campuses, sovereign clusters, megaprojects in Texas, Wyoming, the UAE, Saudi Arabia, Korea, Japan. The accelerator order books reach years into the future.
The models are capable. The distribution layer is not.
In AI, the distribution layer is reliability engineering, verification, observability, permissions, audit, rollback; agentic harnesses, distillation, quantisation, and the optimisations that make smaller models perform like larger ones on commodity hardware; integration into industry software stacks, regulatory regimes, procurement, insurance, liability, and workflow ownership. A frontier training run produces an input. Turning that input into intelligence the economy can apply is the distribution problem.
Coding got there first because coding has compile, lint, test, and revert built in by default. Every output is checkable in seconds against a deterministic substrate. Most other domains do not have that. They build their distribution layer from scratch, against domain-specific failure modes, under regulatory regimes that did not contemplate autonomous agents, inside organisations whose change-management cycles measure in years.
In the meantime, the backbone is being financed at duration assumptions the distribution layer cannot meet.
The reset may propagate more quietly than the fibre crash. Hyperscaler balance sheets are larger than the telecom carriers’ were. Internal cascading absorbs more of the inventory before it ever reaches an external market. Sovereign and enterprise buyers can hold compute on long horizons without forced sale.
The AI cycle may not need WorldComs and Global Crossings to deliver the same outcome. The shape is the same: backbone overbuilt against an underbuilt distribution layer, capacity repriced as the timetable resets, infrastructure that no first owner can monetise becoming the substrate the next owner can.
Bubbles do not destroy infrastructure. They destroy the capital structures that paid too much for it.
V. The Repricing Mechanism
The important event is migration.
The mechanism is simpler than a financial crisis. Hyperscalers have a finite number of premium power slots, and the marginal economics force them to fill those slots with the newest hardware. Last-generation chips do not have to fail commercially to lose their place. They only have to fall behind the per-watt economics of the next generation on the workloads the slot was built for.
That eviction is the trigger. What happens after is where the abundance lives.
The compute flows down the slot hierarchy through several channels at once. The dominant and quietest is the hyperscaler-internal cascade: chips bought for frontier training drop to internal inference, then to ad ranking, then to batch work. The same silicon quietly serves different economics inside the same firm, year by year. It shows up only as falling unit prices in cost-per-token guidance and cheaper hyperscaler inference SKUs.
The redistribution does not always wait for the bust. In May 2026, Anthropic took over the entire ~300 MW capacity of Colossus 1 — the Memphis facility built around 220,000+ Nvidia processors — under a SpaceX S-1 disclosed agreement at $1.25 billion per month through May 2029, with reduced ramp fees and a 90-day termination right for either party.[20] xAI shifted internal frontier training to Colossus 2 in Southaven, Mississippi, deploying next-generation GB200 Blackwell. The migration mechanism is running on premium current-generation silicon: even fully energised frontier hardware reallocates as soon as another tenant can put it to higher-value use. The cascade starts at the top.
The reverse beat is also visible. The planned expansion of the Oracle/OpenAI Stargate campus in Abilene from 1.2 GW to 2.0 GW collapsed in early 2026 against financing complexity, revised OpenAI demand forecasts, and a winter weather event that took the liquid-cooling loop offline for several days. Microsoft took over the adjacent expansion with Crusoe, adding two buildings and an on-site 900 MW gas plant alongside Oracle’s existing 350 MW backup. Even a fully designed, partly built premium slot can be reassigned when the original buyer’s economics no longer pencil out.[21]
Beyond the hyperscaler tier, the institutional layer absorbs another wave: tier-2 colocation, regional clouds, sovereign clusters, enterprise on-premise, university research, industrial sites with cheap local power, and crypto-era datacentres rebuilt for a different workload. The frontier is concentrated. The plateau is distributed. Old hyperscaler racks find new homes in lower-priority slots — never the scarcest watts, but the second-scarcest, the third, the tenth, the thousandth.
Underneath the institutional layer is the edge: a workstation in a research lab, a server in a small business, a single accelerator in a household. The slow-Moore cushion plus algorithmic deflation means a last-generation accelerator running a distilled, quantised model is a capable assistant on a personal machine. The edge will not replace hyperscaler inference for latency-critical, regulated, or high-reliability workloads. It does not need to. Its role is to absorb cheap, distributed computation where price matters more than perfect uptime and the marginal alternative is no compute at all.
The Slot Hierarchy
Frontier slot — newest accelerators, premium permitted power, largest training runs. Few sites. Enormous capex. Capability discovery.
Plateau slot — previous-generation accelerators, cheaper power, inference, fine-tuning, simulation. Many sites. Modest capex per site. Capability diffusion.
Edge slot — laptops, phones, embedded GPUs, local agents. Near-zero marginal cost. Ubiquitous. Capability access.
A second axis runs through the slot hierarchy.
Inside the plateau tier, hyperscaler public cloud is the premium access layer. The spread of up to 10× between hyperscaler on-demand H100 SKUs and specialist clouds delivering the same hardware (Appendix J) is not a depreciation curve. It is a slot-rent curve. The chip is identical. The premium is for trust, energised slot, procurement, immediate availability, and enterprise support.
That premium is durable for one class of customer and a tax for another. The boom phase had one scarce unit: access to any serious GPU capacity. Hyperscalers dominated it. The deployment phase has a different scarce unit: verified intelligence inside a workflow at acceptable cost. That lives wherever regional cloud, sovereign cluster, second-tier colocation, enterprise rack, or industrial site provides verified compute at the lowest fully-burdened cost.
Hyperscalers may retain revenue share while losing workload share. The cloud keeps the frontier. The plateau escapes the cloud.
The migration is not old chip replaces new chip in the same scarce slot — the new chip wins wherever the slot is scarce. The migration is old chip discovers a different slot. Frontier scarcity is contiguous high-density power: hundreds of megawatts to gigawatts behind a single substation, with the cooling and operator depth for synchronous training. Plateau abundance is fragmented low-density power: regional colocation, sovereign and university clusters, industrial behind-the-meter capacity, commercial buildings, household nodes. At full chip price, a new accelerator cannot justify those slots. Written down, an older one can. Appendix G works through the arithmetic.
The financial damage of the bust is real but secondary. Some neocloud operators levered against five-year PPAs and six-month customer books do not survive. Some equity gets wiped. Some debt gets restructured. The losses fall hardest on the investors who underwrote the cycle at its peak. The structural event, underneath, is migration. The silicon does not disappear. It changes hands, changes facility, changes country, and changes scale — from concentrated frontier campuses to distributed plateau and edge slots that did not previously have access to compute at this price.
The migration does not have to appear on a second-hand market to count. Internal cascade, institutional absorption, edge fabric — each shows up most often as a falling unit price on a billing page, not a fire sale at auction. The physical chip may never leave the hyperscaler that ordered it. The abundance arrives anyway.
The first buyer takes the loss. The second buyer takes the asset.
The mechanism has been operating across every prior datacentre accelerator generation. T4 at year 8 still serves production inference across AWS and GCP at a fraction of its launch price. A100 at year 6 carries the bulk of enterprise inference at heavily depreciated cloud-SKU pricing. H100 at year 3–4 already trades at higher memory-bandwidth-per-dollar than new B200 retail. The bust does not create the cascade. It scales one that has been continuously operating since at least 2017. Appendix I works through the empirical record.
In every prior infrastructure boom this is how the abundance arrived. Canals in Britain. Railroads in the United States. Optical fibre at the turn of the millennium. The infrastructure outlives the company that built it. The cost basis resets. The customer who could not afford the asset at the asking price finds it accessible at the recovery price. The customer who could not access it at all — because it was concentrated in a few hands at a few sites — finds it nearby.
Liquidation makes headlines. Redistribution changes the world.
VI. Stockfish for Everything
The maximalist version of the AI story imagines one enormous model doing everything. A country of geniuses in a data centre. A digital workforce replacing labour wholesale.
Stockfish is the strongest chess engine in human history. It does not win by running the largest neural network ever trained. It wins because thirty years of engineering have produced a system in which a small neural network — the NNUE evaluation — sits inside ruthlessly optimised search code, transposition tables, opening books, endgame tablebases, move-ordering heuristics, pruning techniques, and a testing harness so disciplined that no patch lands without statistical proof of improvement. The intelligence is in the assembly.
Modern Stockfish now outperforms the original AlphaZero-style benchmark systems. A small NNUE network wrapped in engineered scaffolding, running on commodity CPUs, beats a much larger pure-neural approach that required substantial GPU compute just to play. The bounded engineered system outperforms the raw neural network. That is the shape of plateau intelligence.
Stockfish matters here not because chess resembles the economy, but because it shows how useful intelligence becomes reliable: by embedding learned judgement inside an engineered system that constrains, tests, and corrects it.
Most economically useful AI will look the same.
The same arc — imitation learning, then reinforcement learning, then engineered environments — has played out three times in three different domains.
Game-playing AI ran it through the 2010s, ending in Stockfish-class systems: a small NNUE network wrapped in three decades of search, tablebases, and a statistical testing harness, outperforming pure-RL approaches on commodity CPUs.
Self-driving ran it through the 2010s and 2020s, ending in Waymo’s L4 stack: specialised perception networks wrapped in HD maps, geofencing, hand-engineered planners, redundant safety layers, and simulation running billions of synthetic miles. The wrapped-and-bounded approach has outpaced pure end-to-end neural driving by a decade in commercial deployment.
Large language models are running it now: imitation at pre-training scale (GPT-2, GPT-3), then RLHF and verifiable-reward RL (o1, R1), and now the environments phase — Cursor, Claude Code, Operator, Deep Research, AlphaProof, robotic foundation models — wrapping frontier models inside sandboxed execution, tool use, retrieval, verification, and rollback.
Three domains, one shape: learned components inside engineered scaffolding, outperforming the unbounded learned system on production tasks. The Stockfish architecture is what deployable AI already looks like.
The economy does not need one god-model. It needs a million Stockfishes: bounded engines where learned judgement is empowered by rules, tests, logs, permissions, and rollback.
The electrification cycle did not produce one universal electric device. It produced thousands of appliances — refrigerator, washing machine, factory motor, lightbulb, oven — each a bounded engineered system that turned generic grid power into specific economic value. The AI cycle is producing the same shape, in software.
A frontier model is the evaluation function. The system that surrounds it — search, verification, tool use, permissions, retrieval, testing, rollback, audit log — is the chess engine. Without that scaffolding, the model is a powerful piece evaluator playing without rules. With it, the model becomes an engineered agent: bounded, testable, verifiable, deployable.
The model is the evaluator. Capability per token. Powerful, expensive, improving.
The scaffolding is the engine. Search, verification, tool use, permissions, observability, rollback. Mostly conventional software. The hard part of deployable agentic intelligence.
The workflow is the game. Legal review, customer support, accounting close, materials design, surgical scheduling — each with its own rules, scoring function, and acceptable failure modes.
The diffusion is the network of engines. Not one god-model. Thousands of bounded agents, each specialised, each verified, each cheap to run.
This is what the migrated infrastructure will host: smaller, distilled, specialised models running on plateau-tier accelerators, wrapped in heavy verification software, deployed against narrow but valuable workflows. Not a substitute for human judgement. A substrate for it.
The frontier labs will keep training larger models on the newest hardware. They are correct to. But capability deployment — the diffusion of intelligence into the ordinary machinery of civilisation — happens on the cheaper, older, more plentiful tier, with smaller models distilled out of the frontier and re-engineered for industrial use.
The frontier is the laboratory. The plateau is the economy.
An allocation principle follows. Frontier cognition is the scarcest cognitive resource humanity has produced. Spending it on tasks that do not require it is the cognitive equivalent of running a supercomputer to keep a calculator app on screen. Frontier intelligence discovers, judges, and creates. Plateau intelligence executes, repeats, monitors, and verifies.
The second law of this dynamic is just as important as the first.
The cost of running a given level of intelligence on a given piece of hardware falls every year, not because the hardware changes, but because the software does. Distillation, quantisation, sparsity, speculative decoding, KV-cache compression, mixture-of-experts routing, FlashAttention, compiler advances — each compounds on the others. Measured at cost-per-token for a given capability level, the effect is orders of magnitude.[22]
The same chip, two years later, runs better software, against smaller models, with cheaper inference, on a broader set of workloads. The depreciation curve and the algorithm curve run in opposite directions. The first owner carries the hardware loss. The second owner inherits the algorithmic gain.
They diverge from the moment the chip ships.
Hardware depreciates. Algorithms compound.
This has no analogue in prior industrial capital. A bridge does not become a better bridge while it waits to be opened. A length of fibre does not become more capable while it sits unlit. An AI accelerator does. The chip on the pallet, waiting two or three years for a power connection, is not depreciating in capability. It is incubating. The hardware sits still. The substrate it will eventually run on compounds around it.
Project that forward against a glut of Hopper-class and Blackwell-class accelerators — some on the second-hand market, some inside the hyperscalers that bought them but reassigned to non-frontier workloads — all benefiting from the same compounding software stack. Inference prices fall sharply on hyperscaler SKUs and on the open market alike. Capability per dollar continues to climb. Appendix H derives the underlying four-curve identity.
The frontier remains expensive. The plateau gets cheaper. The same silicon does more work each year it sits there.
The deepest mistake of the first capital cycle was not buying too much silicon. It was assuming silicon alone would deliver abundance. Cheap compute is necessary; it is not abundance. Abundance is what happens when repriced compute meets three other layers: a software stack that lets smaller models do bigger work, engineered scaffolding that constrains those models to verifiable envelopes, and institutional verification frameworks — evals, audit, rollback, regulatory acceptance — that let organisations deploy AI without losing the right to operate. The bubble paid for the first. The deployment phase pays for the other three.
The deployment timeline implied by this architecture is wave-shaped rather than terminal. Each economically distinct workload crosses its own reliability bar at a different point on the base-capability curve, lifted by the Stockfish multiplier of the scaffolding wrapped around it. Appendix L quantifies the lattice, the multiplier identity, and the resulting diffusion cadence across 2026–2040.
VII. The Asymmetry
Time does two opposite things to AI hardware at once.
For the first owner, time destroys value. The chip misses its frontier slot, gets overtaken by the next generation, becomes economically wrong for the high-density power site it was meant to occupy.
For the second owner, time creates value. The same chip runs distilled models, cheaper inference stacks, better kernels, and domain-specific workflows that did not exist when it was manufactured. The deployment layer, in those intervening years, learned how to extract more intelligence from it.
The same delay that makes the chip a financial failure for frontier training makes it a more powerful substrate for plateau intelligence.
The first owner’s loss is the second owner’s gain. The chip is more useful for the wait.
The asymmetry has a limit. Plateau hardware does not win everywhere. It wins where the workload threshold is met, the model fits inside the older silicon’s memory footprint, latency is acceptable, and power is cheap enough that operating margin survives. The frontier retains the workloads where capability appetite remains uncapped: the largest training runs, the most demanding inference latencies, the models that exceed the previous generation’s VRAM ceiling, the deployments behind premium-priced gas turbines where every watt has to earn its keep. The cascade is workload-specific, not universal — which is why the slot hierarchy of §V and the binding-dimension tables of Appendix H are structured around workload-specific thresholds rather than a global comparison.
A still deeper asymmetry follows. The substrate compounds in capability and falls in price faster than institutions can metabolise intelligence at any fixed level. Once a model is good enough for contract review, tutoring, diagnosis support, software analysis, scheduling, monitoring, or simulation, the deployment layer needs years to build the verification, liability, procurement, and trust infrastructure required to use it. The substrate keeps pushing that same level of capability down the cost curve by 10×, 100×, 1,000×.
The world is not running out of intelligence. It is learning how to metabolise it more slowly than the substrate is learning how to produce it.
This is the intelligence endowment. At every threshold of useful capability, intelligence falls toward ambient cost before the deployment layer has finished absorbing the previous wave. The endowment is practically inexhaustible — not because the chips are unlimited, but because the intelligence the existing chips produce keeps getting cheaper faster than institutions can use it.
This is why the frontier appears to capture all the value. Frontier capability is scarce, and the market prices scarcity. The chips, the labs, the salaries, the rounds, the demos — all visibly concentrated at the leading edge. But the frontier captures only the discovery rents. The plateau captures the durable surplus.
Value looks like it accrues at the frontier because price measures scarcity. Wealth accrues at the plateau because civilisation is built from abundance.
The frontier captures attention. The plateau captures civilisation.
VIII. The Intelligence Layer
The cost of computation has never had a stable demand curve.
Every prior collapse in compute price — mainframe to minicomputer, PC to cloud, cloud to mobile — produced not a smaller market for the older tier but a vastly larger market overall, organised around use cases that were uneconomic at the previous price. Cheaper bandwidth did not produce a smaller telecoms industry. It produced streaming, social networks, cloud, SaaS, video calls, online gaming, the mobile internet.
When the price of a plateau slot collapses, three things happen.
Things once scheduled become continuous. Climate models. Materials search. Robotics simulation. Drug-candidate screening. What used to be a study becomes a query. What used to be a query becomes a background process.
Things once centralised become local. Inference on every laptop and phone, with full privacy and zero latency. Personal agents on the edge of the user, not the edge of the cloud. Translation, transcription, summarisation, retrieval — on the device, not on a server.
Things once expert-only become ambient. Legal review against every contract. Diagnostic second opinions for every imaging study. Tutoring at the marginal cost of a kilowatt-hour. Industrial optimisation for every shop floor.
The demand for computation is not fixed. It is discovered by lowering the price.
The phrase used in Create an Age of Wonders is intelligence as infrastructure.[23] The post-bust world is what that phrase means in practice.
An intelligence layer is a property of the substrate. Bandwidth became a property of the substrate around 2010 — we stopped designing applications around “is the user online?” and started designing them around the assumption that they are. Electricity became a property of the substrate in the 1930s. Refrigeration. Plumbing. Roads. Each started as an event — a thing you visited, paid for, scheduled — and became infrastructure: present, cheap, taken for granted.
Compute is in the middle of that transition. Intelligence is following it.
The shape of the post-bust intelligence layer is already visible. Smaller models embedded inside ordinary software. Local agents on laptops and phones. Open-weights ecosystems running on commodity hardware. Specialist models for medical imaging, legal search, code review, customer support, materials design, scientific writing, and a thousand other narrow tasks. Each cheap. Each bounded. Each verifiable. Each running on a substrate that, at the margin, costs almost nothing.
The frontier labs will still exist. The frontier models will still be expensive. The capability gradient will keep climbing. But the deployment surface, where the economic work gets done, will be the cheap, plentiful, diffused tier. The same shape as electricity. The expensive part is the grid. The cheap part is the outlet. And the outlet is only valuable when something is plugged into it.
The scarce thing, at every layer, is the integrated slot: power, cooling, interconnect, permits, verification, trust, and a workflow ready to absorb the output.
The bubble overbuilt raw compute on the assumption that deployment, verification, trust, and workflow ownership would arrive at the same speed. The bust corrects the timetable. The infrastructure persists.
Carlota Perez named the pattern. The installation phase is when capital floods into a new infrastructure faster than society can reorganise around it. The financial crisis arrives when the timetable proves wrong. The deployment phase follows, decade-long, on the infrastructure left behind. Canals, railroads, electricity, highways, fibre. Each produced a financial event that obscured an infrastructural arrival. The arrival is what mattered.
Paul David showed it took American factories forty years after electrification to put a motor on every machine — and the productivity payoff arrived only when they did.[38] The AI cycle is at the start of the same long arc, on a compressed clock.
The bubble is the installation phase of the intelligence grid.
Abundance is not infinity. Compute will still consume energy, occupy land, require cooling, depend on supply chains. Abundance means a collapse in marginal constraint: the point at which something once rationed becomes ordinary enough to build on without asking permission each time.
The bubble promised a country of geniuses in a data centre.
The abundance era will look like intelligence in the walls: in the spreadsheet, the router, the factory scheduler, the medical scanner, the law office, the classroom, the phone, the machine tool, the weather model, the lab bench.
The path from here to there runs through the crisis, not around it.
The AI bubble will not decide whether intelligence matters. It will decide who paid the first, stupidly expensive price for making intelligence cheap.
The first AI capital cycle may end in disappointment. In infrastructure cycles, disappointment is how the cost basis resets.
The first financiers inherit the loss.
Civilisation inherits computation cheap enough to diffuse through everything.
Technical Appendix
Key arithmetic and supporting data behind the quantitative claims in the essay body. Inline citations map to the reference section below.
A. AI Capex Aggregate, 2025–2026
| Source | Figure | Period | Notes |
|---|---|---|---|
| Hyperscaler AI/cloud capex guidance (Microsoft, Alphabet, Meta, Amazon, Oracle, CoreWeave) | >$700B aggregate | FY2026 plans | Industry sum from Reuters aggregation across hyperscaler guidance; Moody’s projects ~$820B in FY2027.[3] |
| Hyperscaler corporate bond issuance | $121B (2025) → $175B projected (2026) | Annual | Top-five U.S. hyperscalers; ~4× historical annual average. Amazon’s $54B March 2026 issuance was the company’s largest ever debt transaction. Bank of America projection for 2026.[4] |
| AI venture funding | ~$255B | Single quarter (Q1 2026) | PitchBook tracked quarterly deployment, exceeding the full 2025 total. Dominated by OpenAI ($122B), Anthropic ($30B), and xAI ($20B) financings; excluding the top-five transactions, broader quarterly deal value contracts by 73%.[5] |
| U.S. data centre investment | $45.7B PE | FY2025 | S&P Global Market Intelligence; PE deployment represented 72% of $63.35B total sector investment.[6] |
The figures are not comparable line items (capex, debt issuance, venture funding, and private-equity deployment are different cash flows), but together they describe the scale and structural mutation of finance directed at the AI infrastructure stack. The bond-issuance line in particular signals a regime change: historically self-funding technology platforms have shifted to external debt markets as organic free cash flow has compressed under physical buildout. Amazon’s trailing-twelve-month free cash flow fell roughly 95% to $1.2 billion in the same window it issued $54 billion in bonds. Meta issued $30 billion in October 2025; Alphabet issued $32 billion in February 2026 including a rare 100-year bond; Oracle planned $25 billion for 2026 against an upward-revised $50 billion FY2026 capex guide.
B. Energy Arithmetic — Fuel-Only Marginal Cost
Using the U.S. Energy Information Administration’s 2024 average operating heat rate for natural gas-fired generation (7,754 Btu/kWh)[9] and front-month gas prices in May 2026:[8]
| Hub | Front-month gas | Fuel-only generation cost | Multiple vs. Henry Hub |
|---|---|---|---|
| Henry Hub (U.S.) | $2.83/MMBtu | ~$22/MWh | 1.0× |
| TTF (Europe) | ~$17/MMBtu | ~$132/MWh | ~6.0× |
| JKM (Asia) | ~$18.80/MMBtu | ~$146/MWh | ~6.6× |
For a 100 MW continuously loaded campus (8,760 hours per year at full load = 876,000 MWh annual draw), the fuel-only marginal-cost spread between Henry Hub and TTF is approximately $96M/year. This excludes transmission, capacity charges, cooling overhead, financing, and carbon costs — each of which compounds the regional disadvantage further. The spread is the lower bound on the fuel-only delta between regions; it is not a full delivered-power cost comparison.
C. Named-Project Facility-Load Estimates
| Project | Reported scale | Facility load | Confirmed energised |
|---|---|---|---|
| xAI Colossus 1 | 220,000+ Nvidia processors[1] | ~290 MW (consistent with reported facility power) | ~290 MW; 300 MW expansion under build |
| xAI Colossus programme | ~1,000,000 target | ~1.3 GW target (heuristic) | Partially built |
| Oracle / OpenAI Abilene | 400,000 GB200-class[11] | ~1.2 GW (facility design, per Oracle / Crusoe disclosure) | 2 of 8 buildings operational (~300 MW) |
| DOE Solstice + Equinox | 110,000 Blackwell[12] | ~140 MW (heuristic) | Equinox planned H1 2026 |
| Meta roadmap 2025–2027 | ”Multi-million” Blackwell/Rubin[2] | Multi-GW | Undisclosed |
The heuristic of ~1.3 kW of all-in facility load per top-end accelerator[13] is an order-of-magnitude estimate. Where projects have disclosed permitted facility design power (Colossus 1, Abilene), that figure is used instead. The point of the table is the gap between announced and energised, not the precise kW-per-chip multiplier.
Summed across the three projects with disclosed status (Colossus ~1.3 GW + Abilene 1.2 GW + DOE 0.14 GW), the order book implies ~2.6 GW of facility load. The mid-2026 energised share — Colossus 1 (~290 MW) plus Abilene 2-of-8 (~300 MW) — totals ~0.6 GW. Roughly a fifth of disclosed ordered load is plugged in.
D. Bounding Cases — Energisation and Utilisation
The roughly 21% energised share and the Cast AI 5% utilisation share are not multiplied together in the body. They describe different stages of the compute pipeline:
- Energisation share describes ordered facility load that has been turned on. Source: public sample of named hyperscaler and sovereign projects in Appendix C. Roughly a fifth of disclosed ordered load in the sample is plugged in.
- Utilisation share describes the time-fraction of an active GPU spent on productive work. Source: Cast AI 2026 Kubernetes survey across AWS, Azure, and GCP enterprise clusters.[14] Approximately 5% mean sustained.
Frontier training clusters at the labs almost certainly run far higher utilisation than enterprise cloud fleets. The named hyperscaler projects also represent only a subset of total accelerator orders. Treat the two figures as bounding cases rather than a unified ratio. Together they describe a real and material gap between paid-for capacity and economically productive capacity.
The Cast AI 2026 State of Kubernetes Optimization Report shows the broader pattern: 8% average sustained CPU utilisation, 20% memory utilisation, 69% CPU overprovisioning headroom, 79% memory overprovisioning headroom, and fewer than 2% of GPU workloads on spot instances — primarily because spot capacity was effectively unavailable at hyperscaler scale. Specialised clusters using GPU time-slicing and MIG partitioning sustain up to 49% GPU utilisation, suggesting the binding constraint is workflow design, not chip capability.
E. Cost-Per-Token Decline
Stanford HAI AI Index and Epoch AI compute-cost time series document a roughly 10× annual decline in inference cost at fixed capability level across the 2022–2026 window.[22]
GPT-3.5-equivalent capability cost ~$20 per million tokens in late 2022 and ~$0.07 per million tokens in late 2024 — a roughly 280× reduction over two years. GPT-4 launched at $30/$60 per million input/output tokens; GPT-4o offered the same capability tier at $3/$10 per million eighteen months later. DeepSeek V3 entered at $0.14 per million input tokens, undercutting GPT-4o by ~95%. Epoch AI reports a median annual decline of ~50× across the inference market.
The decline is attributable to hardware improvement, algorithmic efficiency (quantisation, distillation, sparsity, speculative decoding, KV-cache optimisation, mixture-of-experts routing, FlashAttention, and compiler advances), competitive pricing from open-weights models, and software stack maturity. The hardware contribution is much smaller than the headline rate; the software-and-competition contribution is dominant.
F. The Grid-Avoidance Tax
The seven-year European grid interconnection queue[7] and similar U.S. backlogs have driven hyperscalers and developers toward behind-the-meter generation: modular, on-site, gas-fired power deployed directly alongside data centres. The canonical 2026 instance is the $1 billion strategic investment by Blackstone Tactical Opportunities and Halliburton into VoltaGrid, supporting a 7.5 GW forward order book of modular natural-gas systems through 2030 and manufacturing capacity sized for up to 300 MW per month of reciprocating engines and turbines.[24] The pitch: skip the queue, install power in months rather than years, operate under simplified minor-source air permits.
The pitch is real. So is the thermodynamic cost.
Thermal efficiency of a generator is the ratio of its electrical output to fuel input, conventionally expressed via the operating heat rate (Btu of fuel per kWh of electricity):
Using the U.S. EIA’s 2024 average tested heat rates by prime mover:[9]
| Generation class | Heat rate (Btu/kWh) | Thermal efficiency | Gas consumption (ft³/kWh) |
|---|---|---|---|
| Combined-cycle utility gas (best in class) | 7,548 | ~45.2% | ~7.35 |
| Average U.S. grid natural gas | 7,754 | ~44.0% | ~7.55 |
| Reciprocating internal-combustion gas | 8,924 | ~38.2% | ~8.69 |
| Simple-cycle gas turbine | 10,999 | ~31.0% | ~10.71 |
Gas consumption assumes approximately 1,027 Btu per cubic foot of pipeline natural gas.
The combined-cycle utility plant captures secondary heat through a steam turbine; behind-the-meter simple-cycle and reciprocating units do not, because the marginal payoff for captured heat does not justify the deployment delay and the engineering complexity at modular scale. Per kWh of electricity delivered, a simple-cycle gas turbine consumes roughly 45.7% more natural gas than a best-in-class combined-cycle plant — and emits CO₂ in direct proportion.
That 45.7% markup is the grid-avoidance tax. The chip is the same; the slot is the same; the workload is the same. The difference is the thermodynamic penalty paid to install power in months rather than years.
The tax compounds the regional cost spreads of Appendix B. In LNG-import regions, where the underlying fuel is already six or seven times the Henry Hub price, a behind-the-meter simple-cycle installation can run at roughly nine to ten times the fuel-only marginal cost of a Henry Hub combined-cycle baseline. Compute is a geography, and the cost of escaping the geography is itself a tax. This is the operational lower bound on the slot-scarcity rent: the marginal slot is not merely expensive; the most readily available substitutes for it are physically inefficient.
G. Slot Density, the OpEx Paradox, and the Migration Path
The body argues that compute migrates down a slot hierarchy: frontier → plateau → edge. A natural objection is that if energised power slots are universally scarce, even chips with zero remaining CapEx will be displaced by next-generation silicon in every slot they could occupy. The migration would collapse before it started; older hardware would become e-waste rather than redistributed asset.
Call this the OpEx vs slot paradox: at $0 CapEx, an older chip still carries an opportunity cost equal to the next-generation chip that could occupy the same slot. If slots are uniformly bottlenecked, the opportunity cost is uniformly high, and the rational operator runs the most efficient chip available everywhere. Migration is irrational; the older hardware is shredded.
The paradox resolves once you notice that slot scarcity is not uniform.
What is scarce is contiguous high-density power. Single permitted sites delivering hundreds of megawatts to gigawatts behind a single substation, with the cooling, networking depth, and operator capability to absorb a synchronous training run. Those sites are bottlenecked by transformer queues, substation permits, multi-year grid-interconnect cycles, and the institutional capability to operate a hyperscale facility. There are very few in the world. Each one is intensely contested.
What is not scarce is decentralised low-density energised capacity. Tier-2 colocation facilities with hundreds of kilowatts of spare power. Sovereign and university clusters. Commercial buildings with industrial HVAC and stable kilowatt-class server rooms. Industrial sites with behind-the-meter generation. Office buildings, research labs, household nodes — millions of permitted, energised, underutilised electrical slots scattered through the built world. The grid does not have to be expanded to use them. They already exist.
| Slot tier | Power density | Site count (order of magnitude) | Workload type | Binding constraint |
|---|---|---|---|---|
| Frontier | 100 MW – 1 GW contiguous | tens to low hundreds globally | Synchronous training | Substations, transformer queues, multi-year permitting |
| Plateau | 1 MW – 100 MW distributed | thousands | Inference, fine-tuning, batch, simulation | Existing colocation, sovereign, and enterprise capacity |
| Edge | 0.5 kW – 1 MW fragmented | millions | Local inference, agents, batch | Wall socket, room cooling, residential grid |
These decentralised slots are currently empty of frontier compute not because they cannot host it, but because a new chip carrying a $30,000–$40,000 cost basis requires 90%+ utilisation at premium hyperscaler economics to justify the purchase. A brand-new accelerator running at 15% utilisation in a 50 kW office slot is a CapEx disaster.
Once the chip’s CapEx is written down toward zero, that math flips. A depreciated chip running at 20% utilisation in a 50 kW slot pays back trivially. The hardware becomes economic at utilisations and densities that were impossible at full price. The migration is not chip-replaces-chip in the same scarce slot. It is chip-discovers-new-slot in the previously-uneconomic decentralised tier.
A further mechanism aggregates these decentralised slots into useful continuous capacity. Intermittent compute distributed across time zones can be networked into a follow-the-sun virtual cloud: workloads route to whichever region is currently sitting on cheap energy. Inference, fine-tuning, batch processing, and simulation are latency-tolerant and do not require the synchronous gigawatt-class interconnects that frontier training requires.
This is the architecture of plateau-tier intelligence: hyper-cheap silicon harvesting the latent energy margins of the built world, aggregated temporally to deliver continuous utility on the latency-tolerant workloads that constitute the bulk of economic compute demand.
The deployment phase will have its own binding constraints — verification overhead in decentralised multi-tenant networks, WAN bandwidth for cross-region pipeline parallelism — but the thesis does not require these to be solved today. It requires only that the existence of hyper-capable, hyper-cheap silicon plus a vast tier of underutilised decentralised slots creates the economic incentive to solve them.
The OpEx paradox is real, but it is a paradox only in the hyperscaler frame. Once the frame widens to include the entire built world’s latent kilowatts, the paradox dissolves into a migration path.
H. The Plateau Crossover
The body argues that AI hardware depreciates out of frontier use long before it depreciates out of economic usefulness. This appendix formalises that claim in two stages: first as a CapEx crossover on the capital-cost side, then as a fuller economic crossover that brings in operating costs, architectural compatibility, and the workflow-discovery recursion. The headline thesis the appendix supports:
AI hardware depreciates against the frontier faster than it depreciates against usefulness. When capital write-down exceeds frontier improvement on a workload’s binding dimension, and when software progress keeps lowering the capability threshold, previous-generation accelerators become the natural substrate for plateau intelligence.
Definitions. The core dynamics involve four rates of change — frontier hardware growth (), software efficiency (), capital cost decay (), and workflow-discovery recursion () — plus one architectural correction factor ():
| Symbol | Quantity | Empirical value | Source |
|---|---|---|---|
| Annual frontier hardware capability growth per chip, on the binding dimension (FLOPS/chip, GB/s/chip, FLOPS/W, etc., not capability-per-dollar) | Per-chip values from H100 SXM5 → B200 SXM, 3-yr annualised: ~1.31× (FP16/BF16 throughput); ~1.34× (memory bandwidth); ~1.34× (memory capacity, 80→192 GB); ~1.47× (FLOPS/W); ~1.65× (FP8 throughput) | Nvidia published specs; per-chip values, not per-dollar | |
| Annual software-efficiency multiplier at fixed capability | ~50× (inference cost-per-token); ~3× (pre-training compute-per-capability) | Stanford HAI AI Index; Epoch AI | |
| Annual secondary-market capital cost decay | ~0.25 (used H100 SXM5: ~$35K new 2023 → ~$15K used mid-2026) | Industry pricing | |
| Architectural divergence penalty (software gains hard-coded to new silicon that does not backport cleanly) | Modest for FP16/FP8 inference; significant for FP4 and the latest attention/decoding kernels | ISA differences | |
| Workflow-discovery recursion rate (AI-assisted engineering productivity feedback) | Empirically positive; bounded above by deployment friction | This essay |
The critical convention: is physical capability per chip on the binding dimension. It is not capability per dollar. Price enters separately through and . Different workloads bind on different dimensions, so the relevant is workload-specific. For cross-reference, Epoch AI tracks long-run FLOP/s per dollar at ~1.37×/yr; that figure is the combined effect of per-chip growth () and launch-price growth (), and is not used as a per-chip value in the identity below.
The CapEx Crossover Identity. Let denote useful capability per dollar of a chip purchased used at time , with age :
where is the chip’s physical capability on the binding dimension (fixed at manufacture) and is the contemporary software multiplier. A new frontier chip at time — assuming approximately stable launch prices across generations (the correction is treated below) — has capability and price :
The software multiplier benefits both equally and cancels in the ratio. The structural factor — the CapEx Crossover Identity — is:
When , plateau hardware delivers more useful capability per dollar than frontier hardware on the binding dimension, with an exponentially growing advantage in . This is a CapEx-only object. Operating costs enter in §H.5 below.
The threshold condition.
The secondary market must reprice faster than per-chip frontier capability advances on the dimension at hand. The condition is dimension-specific:
| Binding capability dimension (per chip, H100→B200, 3-yr annualised) | Threshold | Observed | Identity fires? | |
|---|---|---|---|---|
| FP16/BF16 throughput per chip | 1.31 | 0.24 | 0.25 | Yes (marginal) |
| Memory bandwidth per chip | 1.34 | 0.25 | 0.25 | Yes (marginal) |
| Memory capacity per chip | 1.34 | 0.25 | 0.25 | Yes (marginal) |
| FLOPS per Watt per chip | 1.47 | 0.32 | 0.25 | No |
| FP8 throughput per chip | 1.65 | 0.39 | 0.25 | No |
The identity fires marginally on three of the five per-chip dimensions — bandwidth, capacity, and FP16/BF16 throughput — and fails on the two dimensions where Blackwell made the largest architectural jump (FP8, FLOPS/W). This is the most conservative form of the identity. Two refinements strengthen the plateau case empirically.
The launch-price correction. Flagship retail prices have risen at /yr (H100 SXM5 ~$30–35K in 2023, B200 SXM ~$40K in 2025). Including this in the derivation: a plateau chip launched years ago had launch price , and its used price today is . The full identity:
with crossover condition:
The threshold drops by exactly the launch-price drift. Updated table:
| Binding capability dimension | Threshold () | Observed | Identity fires? | |
|---|---|---|---|---|
| FP16/BF16 throughput per chip | 1.31 | 0.20 | 0.25 | Yes |
| Memory bandwidth per chip | 1.34 | 0.22 | 0.25 | Yes |
| Memory capacity per chip | 1.34 | 0.22 | 0.25 | Yes |
| FLOPS per Watt per chip | 1.47 | 0.29 | 0.25 | No (marginal) |
| FP8 throughput per chip | 1.65 | 0.36 | 0.25 | No |
The identity fires cleanly on the three dimensions that govern most production inference, and fails on the two dimensions Blackwell optimised hardest for. The split matches the migration thesis exactly: plateau wins on the dimensions that govern plateau workloads, and frontier wins where it should.
Numerical validation. For mid-2026 used H100 ($15K, 3.35 TB/s, age 3 yr) versus new B200 ($40K, 8.0 TB/s, retail) on memory bandwidth, the full identity predicts:
Predicted plateau advantage: ~12%. Empirical observation (Table H.2): used H100 at 0.223 GB/s/$ vs new B200 at 0.200 GB/s/$ = 11.5% advantage. Model and empirical agree to within one percentage point. The identity is a quantitative one that matches the market.
The second refinement — capability thresholds rather than capability appetites — formalises the lift to the workload-economic plane.
H.1 — Log capability scaling. Inference benchmark scores scale approximately as . A 4× FLOPS deficit corresponds to a benchmark gap of roughly 3–5 percentage points on MMLU, GPQA, or HumanEval at current scale. Most economically useful workloads have capability thresholds, not capability appetites: a customer-support router that needs 85% intent-classification accuracy does not benefit from a model scoring 92%. The operative question is not whether plateau capability matches frontier headroom; it is whether plateau capability exceeds the workload’s threshold at lower cost. The workload surface where plateau is sufficient grows monotonically with .
H.2 — Memory-bound, not compute-bound. Autoregressive LLM inference is memory-bandwidth-bound, not FLOPS-bound. The relevant cap-per-dollar metric is bandwidth per dollar:
| Hardware | State | Price | HBM bandwidth | GB/s per $ |
|---|---|---|---|---|
| H100 SXM5 | Retail (2023) | $35,000 | 3.35 TB/s | 0.096 |
| H100 SXM5 | Used (mid-2026) | $15,000 | 3.35 TB/s | 0.223 |
| B200 SXM | Retail (mid-2026) | $40,000 | 8.0 TB/s | 0.200 |
Used H100 delivers ~12% more memory bandwidth per dollar than a new B200 at retail. The B200’s structural advantages — 192 GB VRAM, 4× FP8 throughput, NVLink 5 at 1.8 TB/s — bind on training workloads with synchronous gradient sync and on models exceeding 80 GB. Most production inference exhibits neither. Plateau hardware does not beat frontier globally. It beats frontier on the dimensions that govern specific deployment workloads.
H.3 — Architectural Divergence Penalty. The cancellation of in the headline identity assumes today’s software gains backport cleanly to older silicon. In practice, some optimisations are tied to physical blocks on the newest chips: FP4 quantisation runs natively on Blackwell’s 5th-gen Tensor Cores and emulates with severe penalty on Hopper; the latest speculative-decoding heads assume the memory hierarchies of newer interconnects. Define as the fraction of contemporary software efficiency that cannot be ported to plateau hardware. The corrected identity:
partially offsets the depreciation advantage. Empirically, is small for workloads dominated by FP16/BF16/FP8 transformer inference (most production deployment today) and substantial for workloads that lean on the newest precision formats. The migration thesis is robust to modest ; it fails only if grows faster than , which would require a sustained pattern of new software locking out previous generations on most economically useful workloads. The reverse is empirically observed: distillation, quantisation, and inference kernels keep finding ways to fit modern models onto older silicon.
H.4 — VRAM step-function. A 70B-parameter model at FP8 occupies ~70 GB before KV cache. On H100 (80 GB), it fits with little headroom and typically requires two-GPU tensor-parallel configurations, introducing all-reduce communication overhead of 9–23% of end-to-end decoding latency. On H200 (141 GB) or B200 (192 GB), the same model fits single-GPU with concurrent KV caches and no TP penalty. The CapEx identity assumes capability decays smoothly; in reality, model-fit thresholds create non-linear cliffs. Plateau hardware wins inside the cliff and loses outside it. The migration thesis holds because most production workloads fit inside the cliff for previous-generation hardware, even as frontier training requires the next cliff.
H.5 — The Economic Crossover (OpEx-extended). The CapEx identity is a numerator-only model. The full economic comparison requires operating costs. Define amortised capital cost per useful operation as and operating cost per useful operation as , where indexes the slot and the workload. Economic capability per dollar:
where is the workload-compatibility factor (capability threshold met, VRAM fits, software stack supported). The CapEx Crossover Identity governs the numerator-side condition; the OpEx term determines where the comparison breaks operationally.
For slots that rely on behind-the-meter generation (the grid-avoidance pattern of Appendix F), the electricity cost carries a thermodynamic multiplier relative to grid-supplied combined-cycle power. The economic obsolescence boundary is reached when marginal revenue per token falls below the energy cost per token:
with in $/token, the dimensionless thermodynamic multiplier from Appendix F (~1.0 for utility-supplied combined-cycle power, ~1.45 for behind-the-meter simple-cycle generation), the facility power-usage-effectiveness multiplier (typically 1.1–1.4 for modern data centres), the thermal design power in kilowatts, the unit cost of utility power in $/kWh, and the active token throughput in tokens per second. The factor of 3600 converts hours to seconds.
Sanity check. A 1 kW chip at , , $0.10/kWh power, and 100 TPS yields:
— or roughly $0.33 per million tokens, within the empirical range for production inference.
Behind-the-meter slots compress the operating margin by the factor . This is the mathematical link between Appendix F (the grid-avoidance tax) and Appendix H (the plateau migration): older hardware migrates to slots where is small — grid-supplied utility power, university clusters, enterprise data centres on bulk power contracts, sovereign facilities, and the colocation tier — and not to slots where is large (the behind-the-substation reciprocating-engine deployments most useful for frontier training when the grid will not connect in time).
H.6 — The workflow-discovery recursion. The CapEx and OpEx framework above treats as exogenous. The fourth structural rate is that is itself accelerated by AI-assisted engineering labour. The load-bearing claim is the weak one: . AI-assisted engineering improves the rate at which the software layer adapts to whatever compute is cheapest and available. The CapEx Crossover Identity holds for any trajectory satisfying this condition.
A stronger illustrative result follows if engineering throughput scales with deployed plateau intelligence, . Define the rate of software-efficiency improvement as:
Substituting the recursion and solving the resulting linear ODE in :
A double exponential, bounded above by deployment friction (verification, testing, integration, organisational change-management). The functional form is an intuition pump for why plateau-fill may run faster than prior infrastructure cycles, not a forecast. The migration thesis fires on alone.
H.7 — Frontier slowdown. The values above assume the frontier continues to compound at recent rates. Three constraints suggest the rate may not hold indefinitely.
Moore’s law has stalled at the device level — the frontier still compounds, but through capital intensity, packaging innovation, and system-scale engineering rather than transistor scaling. Hyperscaler AI spending exceeds $700 billion in 2026;[3] frontier training compute has grown ~5×/year since 2020, but hardware FLOP/$ has improved only ~1.37×/year. Capability appears to scale approximately logarithmically with compute on standard benchmarks, so each additional order of magnitude buys a roughly fixed absolute increment, not a proportional one.
The identity is monotonic in slowdown. A frontier compounding at per architectural release does not fire the CapEx identity. A frontier compounding at annualised fires it strongly. If scaling continues, plateau hardware runs the compressed descendants of fresh capability. If scaling slows, plateau hardware runs durable capability for longer. The thesis is robust in both directions.
H.8 — Boom-driven input-cost inflation. The binding inputs to new-generation silicon — HBM, advanced packaging, substrate capacity, networking, power delivery — are in tight supply against the same demand curve driving the order book. The /yr launch-price drift is not a noise term. It is the cost-side channel through which the boom widens the plateau wedge.
Epoch AI’s teardown places B200 module production cost at roughly $5,700–7,300, with HBM and advanced packaging accounting for ~two-thirds of variable unit cost.[26] SemiAnalysis estimates memory could rise to ~30% of hyperscaler AI data-centre capex in 2026, up from ~8% in 2023–2024, with HBM undersupplied through 2027.[27] If boom-driven inflation pushes from 1.05 to 1.08 over the next two generations, the threshold on FLOPS/W (currently failing at 0.29 against observed ) drops to ~0.26, on the cusp of firing. The wedge does not need the frontier to slow — only to keep getting more expensive.
The cycle modulates; the cascade persists. The cost-side channel is cyclical, but the post-bust generation arrives later than naive cyclical reasoning suggests — the production pipeline is committed years in advance. Hyperscaler order books already span Rubin and Rubin Ultra into 2027–2028 and Feynman into 2028–2029. What the bust changes is the clearing price of that pipeline as it lands, not the schedule. Committed chips arrive on the published cadence and their secondary-market pricing collapses to plateau levels on a shorter clock than in prior cycles. The slot mechanism of §V handles pre-ordered and new-design generations identically. The plateau migration is a structural feature, accelerated by busts but not contingent on them.
Three depreciation regimes. The identity rests on distinguishing three depreciation rates that, in AI hardware, move at different speeds:
| Regime | Driven by | Timescale | What it means |
|---|---|---|---|
| Frontier obsolescence | , | ~2 years | Chip no longer optimal for the next-largest frontier training run |
| Capability obsolescence | Software stack abandonment, physical failure | 5–10 years | Chip cannot run any useful workload |
| Economic obsolescence | OpEx vs revenue (slot , workload , thermodynamic multiplier ) | Set by slot, workload, and electricity price | Output worth less than operating cost |
For most asset classes the three regimes move in lockstep. A car ages out of newness, off the dealership floor, and out of usefulness on similar schedules. AI hardware does not. The frontier moves on two years. Capability moves on five to ten. Economic obsolescence is set by the slot. The CapEx Crossover Identity gives the magnitude of the gap between frontier obsolescence and capability obsolescence — the years during which a chip has fallen off the frontier but is still capable, still supported, and still economic on slots its first owner did not want.
Three empirical tests. The framework admits three cleanly falsifiable observables on different timetables.
The earliest and sharpest one: by end of 2027, secondary-market clearing prices for Blackwell B200 SXM should sit in the $22–28K range — a of approximately 0.25–0.35 over two to three years from launch, reproducing the Hopper depreciation arc one generation later. If Blackwell holds near launch (above $35K) into 2028, the eviction mechanism that powers the migration is not operating on the current generation. The framework is wrong in the specific way it claims to be falsifiable: the inequality fails empirically, not theoretically.
The second runs on the workload surface: by the end of 2028, the largest share of economically deployed AI — measured by tokens served, queries answered, decisions made, dollars saved — should be running on Hopper-class and early-Blackwell hardware rather than on whatever the contemporary frontier silicon is at that time. If frontier hardware still serves the majority of production workloads in late 2028, the framework is wrong in a way no amount of careful unit analysis can rescue.
The third runs on the pricing surface itself: by the end of 2028, the spread between hyperscaler on-demand H100/H200 rentals and specialist-cloud or marketplace rentals for the same hardware should remain wide or widen — not converge. The spread today reaches up to 10× (Appendix J). If hyperscaler pricing converges down toward specialist pricing — closing the spread below approximately 2× — the bifurcation between premium access layer and economic substrate has not occurred, and the workload-share / revenue-share decoupling (§V, Appendix J) is falsified. The spread is checkable monthly from public price pages.
The deeper point. As long as on the dimension a workload binds on — after correcting for architectural divergence and matching the workload to slots where the thermodynamic multiplier is bearable — that workload migrates to plateau hardware once the chip is repriced. The first capital cycle pays the frontier price. The second buyer inherits the asset. The algorithms make the asset compound.
I. The Cascade in Progress
The framework predicts a cascade. A natural question: is the cascade a forecast, or is it already running? It is already running, observably, across every prior data-center accelerator generation. The bust window does not invent the dynamic. It scales an existing one to public visibility.
Generation-by-generation status, mid-2026.
| Generation | Launch | Launch price (top SKU) | Secondary clearing (mid-2026) | Current production role |
|---|---|---|---|---|
| Turing T4 | 2018 | ~$2,500–10,000 depending on SKU | ~$1,500–2,500 | Default inference on AWS g4dn, GCP equivalent SKUs; embeddings, ranking, smaller-model serving across enterprise |
| Ampere A100 | 2020 | ~$15,000–20,000 (40 / 80 GB SXM4) | ~$5,000–8,000 | Primary production workload on AWS p4d, Azure ND A100, GCP a2; substantial share of enterprise AI inference globally |
| Hopper H100 | 2022/2023 | ~$30,000–35,000 (SXM5) | ~$15,000 (per Appendix H) | Current workhorse; frontier-adjacent; per Table H.2 already trades at higher memory-bandwidth-per-dollar than new B200 retail |
| Hopper H200 | 2024 | ~$30,000–40,000 (limited disclosed) | Limited secondary supply | Current production tier; increasingly inference-priced as Blackwell capacity ramps |
| Blackwell B200 | 2024/2025 | ~$40,000 | Negligible secondary supply | Current frontier tier; first observable depreciation cycle still 12–24 months ahead |
Each generation in the table is simultaneously serving a different tier of the workload stack today. The slot hierarchy is already in motion.
Cloud SKU repricing as the dominant migration channel. Public AWS pricing data shows g4dn (T4) on-demand prices have fallen roughly 60–65% relative to 2019 launch; p4d (A100) by roughly 40–55% relative to 2021 launch. The chips never left Amazon. The migration ran entirely inside the hyperscaler’s balance sheet, visible only as falling unit prices on customer-facing SKUs — the quiet form of the cascade the body describes. The cascade is visible in a second pricing surface today: hyperscaler on-demand H100 SKUs list at approximately $6.88–$12.29 per hour, while specialist clouds and marketplace/spot networks deliver the same H100 capability at $1.25–$4.29 per hour — a spread of up to 10× on the same chip in the same year (full breakdown in Appendix J). The hyperscaler price is the slot rent; the marketplace price is the underlying chip economics, repriced for a buyer who can tolerate intermittency.
Hyperscaler depreciation extensions. Microsoft (2022), Alphabet (2023), Meta (2023), Oracle (2024), and Amazon (2024) have each extended useful-life assumptions for server infrastructure from approximately four years to six years. These are CFO-signed statements with Sarbanes-Oxley liability, audited by the Big 4, and defended against internal utilisation data no outside analyst sees. Each extension is an audit-level acknowledgment that the chips remain operationally useful far longer than the original capex cycle anticipated — precisely the gap between frontier obsolescence (~2 years) and capability obsolescence (~5–10+ years) the three-regime table predicts. Microsoft’s FY2023 extension alone added ~$3.7 billion to annual operating income; the aggregate impact across the top five hyperscalers is on the order of $15 billion per year in reduced depreciation expense. The same CFOs are simultaneously issuing record bond volumes to fund new silicon and extending useful-life on the existing fleet — together implying that the productive compute fleet is becoming both larger and longer-lived than the original capacity model assumed.
Falsification test on depreciation policy. The cleanest forward signal against the plateau thesis would be a hyperscaler shortening AI-GPU depreciation in audited filings. As of mid-2026, no major hyperscaler has done so. The keynote curve makes capability claims unaudited; the audited curve makes utility claims unmarketed. The thesis tracks the second.
Anthropic / Colossus as cascade in miniature. The May 2026 Anthropic–SpaceX/xAI agreement (§V; ref [20]) is the cascade operating across firm boundaries on premium current-generation silicon. Even energised frontier-tier hardware is allocated to whichever tenant can put it to highest-value use. The dominant version of this cascade happens silently inside hyperscaler SKU price sheets; the Anthropic–Colossus version is the audible instance.
Cost-per-token compression as observable substrate-level repricing. Cost-per-token at GPT-3.5-equivalent capability fell ~280× over 2022–2024 (Appendix E). The decline is software-dominant, but the hardware-side decay rate is independently visible in SKU prices on aging silicon, tracking the band the framework predicts.
The validation that has already resolved. Per Table H.2, used H100 at $15K delivers 0.223 GB/s per dollar of memory bandwidth; new B200 at $40K delivers 0.200. The Plateau Crossover Identity is firing right now, today, on the dominant production-inference dimension. The framework’s load-bearing inequality is not pending future confirmation. It has resolved in the framework’s favor on the workload bottleneck that matters most.
Summary. The mechanism operates across every data-centre accelerator generation simultaneously. The bust window is the point at which an already-running cascade reaches public visibility, not the moment it begins.
J. Hyperscaler Premium and the Plateau Substrate
The second axis named in §V — premium access layer versus economic substrate — is observable on the pricing surface today. The same chip clears at very different prices depending on the bundle wrapped around it.
Snapshot, May 2026. Public pricing for H100-class capacity, by access tier:
| Access tier | Example | Per-H100 GPU-hour |
|---|---|---|
| AWS on-demand | p5.48xlarge (8× H100) | ~$6.88 ($55.04/hr ÷ 8) |
| AWS reserved (Capacity Blocks) | p5.48xlarge reserved | ~$4.33 ($34.61/hr ÷ 8) |
| Azure on-demand | ND H100 v5 class | ~$12.29 |
| Specialist cloud, mid-band | CoreWeave, Lambda, Civo, Denvr | ~$2.25–$4.29 |
| Marketplace / spot network | vast.ai-class supply, GPUPerHour | ~$1.25–$2.75 |
| Owned hardware, amortised | New H100 PCIe ~$25K; SXM ~$35–40K | under $1.50 at high utilisation |
The spread between hyperscaler on-demand and marketplace supply for the same chip in the same year reaches up to 10×. The structural delta is not depreciation. It is the cost of a bundle: trusted, energised, instantly available, supported, billed, and procured through enterprise channels. That bundle is durable for one class of customer and a tax for another.
What the spread implies. Hyperscaler GPU rental pricing measures slot rent. A chip can move out of frontier status while the hyperscaler SKU built around it remains expensive. The H100 is the visible case: out of frontier as of Blackwell launch, still priced as scarce on hyperscaler clouds, simultaneously available at marketplace rates that approach amortised owned-hardware cost. AWS raising H200 Capacity Block prices by 15% in January 2026 (§I, body) is the same mechanism running on the next-generation chip: scarcity priced into the slot, not into the silicon.
The bifurcation, mapped. The premium access layer captures: largest training runs, premium managed inference, latency-sensitive global APIs, regulated enterprise workloads, customers paying for procurement simplicity, organisations without internal MLOps depth. The economic substrate captures: batch inference, embeddings, fine-tuning, document processing, simulation, scientific computing, bounded agentic systems, retrieval and ranking pipelines, narrow-vertical Stockfish workflows. Both surfaces persist. They serve different customers.
Revenue share versus workload share. The two metrics can decouple. Hyperscalers may continue to capture the majority of AI infrastructure revenue — because enterprise customers pay the bundle premium — while losing the majority of AI inference workload by token volume, query volume, or useful inference-hours. The workload migrates to wherever fully-burdened cost is lowest; the revenue follows wherever the trust premium is paid. The two surfaces measure different things and need not move together.
Multi-model routing volume as workload-share signal. OpenRouter — the multi-model routing platform serving 8M+ developers — disclosed token throughput growing from ~5 trillion to ~25 trillion tokens per week over the six months ending May 2026, a 5× expansion across a heterogeneous plateau-tier model surface (Anthropic, OpenAI, Google, Meta, DeepSeek, and open-weights).[25] Its $113M Series B was led by CapitalG with NVIDIA Ventures, Snowflake, Databricks, MongoDB, ServiceNow, a16z, and Menlo Ventures — the enterprise-data and silicon stacks underwriting the multi-model topology rather than a winner-take-all frontier outcome. The data confirms the topology the framework predicts; whether the absolute level is durable depends on how the 2026–2027 procurement reset resolves.
Falsifier. If hyperscaler on-demand H100/H200 pricing converges down toward marketplace pricing — closing the spread below approximately 2× by the end of 2028 — the premium access layer thesis is wrong. Hyperscaler clouds would in that case be the natural home of plateau intelligence, and the workload-share / revenue-share decoupling would not materialise. The spread is checkable monthly from public price pages. As of May 2026, it is widening, not closing.
Caveat on snapshot volatility. Pricing snapshots will move. AWS Capacity Blocks reprice dynamically. Specialist clouds run promotional rates. Marketplaces fluctuate with supply. The structural claim does not depend on the specific dollar figures in the table being correct on any given month. It depends on the shape of the price surface — a wide, durable spread between premium and economic-substrate tiers for the same hardware. If that shape collapses, the bifurcation has not occurred.
K. Macroeconomic Absorption and the Deployment-Phase Drag
The cascade described across §III–§V predicts that civilisational intelligence diffuses through depreciated silicon on fragmented power slots over a multi-year horizon, not a multi-quarter one. This appendix audits the contemporaneous macroeconomic data confirming that horizon — the wedge between localized firm-level efficiency gains and aggregate macro-level productivity growth — and locates the cascade inside the canonical Brynjolfsson Productivity J-Curve and Carlota Perez installation-to-deployment framework. The headline finding: the deployment window holds at a 5–10 year primary phase with continued macroeconomic absorption through 2035+, and the cascade mechanism is structurally anti-fragile to which point in the macro projection band turns out right.
K.1 The Brynjolfsson J-Curve and intangible capital accumulation
Brynjolfsson, Rock, and Syverson (2021) formalised why a massive general-purpose technology boom can co-exist with stagnant aggregate productivity.[36] Firms must accumulate substantial intangible capital — process redesign, retraining, verification scaffolding, organisational restructuring — before the GPT’s productivity dividend appears in measured output. Because the intangible accumulation is expensed as opex (not capitalised), national accounts systematically under-measure both GDP and TFP during the absorption phase. Historical adjustments for computer-era intangibles found that true TFP was 15.9% higher than official measures by the end of 2017.
The J-Curve is now quantified in current AI data. PricewaterhouseCoopers’ 2026 AI Performance Study (1,217 senior executives across 25 industries) finds 74% of AI’s measurable economic value captured by the top 20% of firms, leaving 26% to the remaining 80%.[28] The long tail of adopters has not yet completed the intangible-capital accumulation that would let them realise the gains, even where they have purchased the software. The cash budget ratio is now visible: the SXSW 2026 CMO Survey of 400 organisations finds that positive-ROI AI deployments spent approximately $2.60 on training and change management per $1 spent on the AI software itself, with organisations failing to match this ratio experiencing tool abandonment rates of 60–70% within six months.[29] The implied market-value ratio is older but consistent: Brynjolfsson, Hitt, and Yang (2002) found $1 of physical computer hardware historically associated with approximately $9 of corporate market value — the market pricing the unmeasured complementary intangible capital long before the national accounts measured it.
K.2 Enterprise deployment friction is structural, not transient
The deployment friction is not theoretical. S&P Global Market Intelligence and the RAND Corporation (2025) report that 42% of corporate AI projects were scrapped in 2025 and 80.3% of AI initiatives failed to deliver business value — twice the failure rate of traditional, non-AI IT projects.[30] Gartner projects that more than 40% of agentic AI projects will be cancelled by end-2027 due to escalating costs, lack of clear business value, and inadequate risk controls.[30] Although 97% of surveyed enterprises have experimented with AI agents in some form, only 10–12% have successfully transitioned them into production environments.
At the workflow level, the downstream verification burden compounds the friction. The METR randomised controlled trial (2025) found experienced open-source developers were 19% slower using AI tools — the cognitive cost of reviewing non-deterministic AI output exceeded the speedup from generation.[31] Google’s DORA 2024 report associated a 25% increase in AI adoption with a 7.2% decrease in production delivery stability. Google Cloud (2026) reports 45% higher burnout among frequent AI users, with approval fatigue identified as the primary mechanism. GitClear’s analysis of 153 million changed code lines projected a doubling of code churn translating directly into delivery instability. The pattern is now replicated across multiple independent sources: individual-level speedups of 20–56% on isolated tasks fail to translate into firm-level velocity because architectural integration and verification absorb the gains — the Productivity-Reliability Paradox.
K.3 The labor-market signal
The Stanford AI Index 2026 reports that early-career software-developer employment (ages 22–25) in AI-exposed roles fell approximately 20% from 2024 to early 2026, against stable aggregate white-collar wages.[32] The signal validates the plateau cascade mechanism from a different surface: as task work. The kind of work entry-level developers historically did is now automated by AI, human labor migrates upstream to verification, integration, and high-abstraction architecture. This is exactly the Stockfish-bounded engine wrapping learned judgment structure §VI predicts.
The shift is happening at the entry of the labor market first because that is where syntactic work concentrates; senior engineers retain their wage premium because their work is verification-heavy rather than generation-heavy. The labor market is therefore not just a productivity surface but also a verification-of-mechanism surface: it shows the plateau cascade transferring labor exactly as the workflow-ownership thesis predicts, on the timeline the J-Curve framework predicts, in the direction the framework predicts.
K.4 Aggregate productivity dispersion and the Perez modification
US aggregate labor productivity has grown at 1.6% annualised since Q4 2019 — a modest acceleration from the 1.2% pre-pandemic decade. Forward projections are widely dispersed. Goldman Sachs Global Macro Research projects AI-driven labor productivity acceleration to 1.7–1.9% through 2029, peaking at 1.9–2.3% in the early 2030s, with potential GDP growth elevated to 2.1–2.3% for the rest of the decade.[33] Penn Wharton Budget Model projects permanent annual potential-output gains of less than 0.04 percentage points, with aggregate GDP rising only 1.5% by 2035.[34] Acemoglu (2024) similarly estimates aggregate GDP gains of 1.1–1.6% over a decade and a marginal annual TFP boost of approximately 0.05%.[35] The dispersion is real and reflects genuine uncertainty about how fast the J-Curve resolves.
The Carlota Perez framework — installation phase → financial crash → deployment phase — historically required a structural separation between speculative financial capital and operational production capital. Telecom equity holders went bankrupt while consolidator-platforms (Level 3, Equinix, Digital Realty predecessors) acquired the physical infrastructure at salvage and re-leased into the deployment-phase Golden Age. The AI cycle’s most credible modification of that pattern is the convergence of these two capital types inside hyperscaler balance sheets: trillion-dollar platforms funding their own builds from organic cash flows plus low-cost corporate debt ($121B of hyperscaler bonds issued in 2025 per the Moody’s / Bank of America aggregation referenced in §III; see Note [4]), and absorbing depreciating silicon internally through frontier-to-inference cascades rather than clearing it through the secondary market. The deployment phase still arrives; the bankruptcy-driven clearing event runs through the mid-market specialist-cloud and integrated-developer layer rather than through the trillion-dollar platforms. The access layer — energised slot, plateau silicon, master lease — is captured at the slot level rather than at the equipment level.
K.5 Implications for the cascade
Three structural facts follow. First, the deployment window is calibrated as a 5–10 year primary phase with continued macroeconomic absorption through 2035+ — the J-Curve’s intangible accumulation takes years, not quarters, and the failure-rate evidence is overwhelming that this is structural rather than transient. Second, the cascade mechanism described across §III–§V is robust to which point in the macro projection band turns out right: in the Goldman optimistic case, workloads scale fast and plateau-tier capacity re-leases into a rising-demand environment as the deployment phase compresses; in the Penn Wharton / Acemoglu conservative case, plateau silicon remains uncontested by frontier displacement for materially longer because aggregate demand never reaches the threshold that would justify the next-generation chip displacing the current one across the plateau workload base. Different mechanisms, both favourable to the cascade. Third, the access layer is the structurally underwritable surface across the dispersion band: the macro outcome is uncertain; the access constraint is not. The cascade resolves either way; the access surface gates how the value distributes. The plateau is where civilisational intelligence will be deployed because that is where the absorption-phase economy can afford to deploy it.
K.6 A note on method
The thesis stated across §I–§VIII was specified in advance of the empirical surfaces verifying it. The slot hierarchy (§II, §V), the depreciation cascade (§III) and its cost-side widening (H.8), the workflow-ownership endpoint (§VI), the dual decoupling of revenue share from workload share (J), the Plateau Crossover identity (H), the J-Curve absorption phase (K.1), the convergence-of-capital modification to the Perez framework (K.4) — each was named as a structural prediction before the corresponding data became available. Twelve independent empirical surfaces have validated specific predictions of the framework across 2024–2026; none has falsified it. Each confirmation is a risky prediction in the Popperian sense: the surface could have come back differently and did not.
The forward predictions remain observable on known timetables. The operator-distress trough crystallising across 2027–2028 via the three-of-six trigger gate. Secondary-market clearing prices on the pre-order pipeline as it lands — Rubin, Rubin Ultra, Feynman across 2027–2029 (H.8). The labor-market shift to verification work compounding from the 20% early-career signal of 2024–2026 (K.3). The macroeconomic productivity acceleration resolving toward one of the Goldman / Penn Wharton / Acemoglu trajectories (K.4). The convergence-of-capital effect on hyperscaler asset cascading materialising or not, observable in whether mid-market specialist-cloud and integrated-developer estates clear through Section 363 in volume at the trough. Each is checkable from public data on its own clock.
The discipline that produces the framework’s predictive validity is that it remains publicly falsifiable. If any forward prediction falsifies in real time, the framework updates accordingly. The essay is a mechanistic theory of compute access in the deployment-phase economy, exposed to the world’s verdict on a forward verification track. The cascade described across this essay is what is predicted; whether it resolves on the timetable, in the magnitudes, and through the channels specified is what the next five years will demonstrate.
L. Domain-Specific Diffusion and the Stockfish Multiplier
§VI argues that the deployable shape of intelligence is bounded learned components inside engineered scaffolding — Stockfish for everything. This appendix quantifies the implication: the deployment timeline under that architecture is wave-shaped across 2026–2040, governed by a workload-specific reliability-bar lattice and a scaffolding multiplier that stacks on top of base-model capability. The shorthand “AI will reach 99.99% reliability in N years” is correct on the headline number and misleading on the economic substance, because most addressable value crosses its threshold long before the terminal bar is reached.
L.1 — The reliability-bar lattice. Autonomous deployment of an AI workflow requires effective system reliability to exceed a workload-specific bar , set by cost-of-error, recourse availability, and regulatory regime. The bar spans four orders of magnitude across the economy. The values below are approximate, intended as order-of-magnitude thresholds rather than measured universal constants — the load-bearing claim is the spread across workload classes, not the exact decimal assigned to any single task:
| Task class | Rationale | |
|---|---|---|
| Brainstorming, drafting, outlines | ~0.70 | Human review default |
| Translation, summarisation | ~0.92 | Direct user consumption |
| Code completion (suggested) | ~0.80 | Developer accepts/rejects |
| Customer support (assisted deflection) | ~0.85 | Human escalation handles tail |
| Code generation with tests-in-loop | ~0.90 | Test suite catches failures |
| Junior knowledge work (supervised) | ~0.85 | Supervisor catches failures |
| Research synthesis with citations | ~0.92 | Citations enable verification |
| Tier-2 autonomous customer resolution | ~0.92 | Human escalation tail |
| Sales prospecting and outreach | ~0.85 | Low cost of error |
| Tax preparation (autonomous, standard cases) | ~0.98 | Liability-bearing |
| Legal contract drafting (standard) | ~0.98 | Liability-bearing |
| Code review with autonomous merge | ~0.98 | Production risk |
| Medical diagnosis (autonomous, routine) | ~0.99 | Patient safety |
| Surgical decision support (high autonomy) | ~0.999 | Life-critical |
| Financial trading (autonomous) | ~0.9999 | Cascading capital risk |
| Self-driving (L4+, all edge cases) | ~0.99999 | Public safety, regulatory |
| Autonomous surgery | ~0.99999 | Patient mortality |
The shape of the diffusion follows from the shape of the bar lattice, not from any single benchmark trajectory.
L.2 — Base capability growth. Frontier-model headline benchmark improvement, annualised across 2024–2026 releases (Anthropic, OpenAI, Google):
| Benchmark family | Annualised improvement | Notes |
|---|---|---|
| Hard reasoning (HLE, GPQA) | ~10–15 pp/yr | Diminishing returns near ceiling |
| Hard coding (SWE-Bench Pro) | ~15–20 pp/yr | Largest current gains |
| Saturating agentic tasks (OSWorld) | ~3–6 pp/yr | Approaching benchmark ceiling |
| Practical knowledge work (Finance Agent, GPQA-AA) | ~8–12 pp/yr | Mid-saturation |
Capability scales approximately logarithmically in compute (§H.1); benchmark improvement saturates as scores approach 1.0. The naïve linear extrapolation that produces a 2038–2042 estimate for 99.99% across hard agentic tasks treats the base-model rate as the only input. It is not.
L.3 — The Stockfish multiplier. Each scaffolding layer in the §VI architecture catches an independent fraction of base-model failures. Treat each layer as a Bernoulli catcher with catch probability , applied independently to residual failures. Effective system reliability for workload :
Illustrative catch probabilities from observed deployment systems:
| Scaffolding layer | (illustrative) | Mechanism |
|---|---|---|
| Test-driven generation loop | ~0.60 | Failing tests reject candidate outputs |
| Multi-agent disagreement detection | ~0.50 | Two-agent disagreement triggers tiebreaker |
| Formal verification on covered surface | ~0.95 | Type system, proof checker, schema conformance |
| Retrieval-grounded generation | ~0.70 on factual claims | Citation anchoring eliminates a hallucination class |
| Tool augmentation (calc, code exec, search) | ~1.0 on the augmented sub-task | Deterministic computation |
| Specialised fine-tuning on domain | ~0.30 lift on residual error | Domain-specific failure-mode coverage |
| Human-in-the-loop on calibrated abstention | ~0.95 on flagged cases | Uncertainty triggers escalation |
Layers stack multiplicatively on independent failure modes. An 85% base model + 60% test-loop catcher + 70% retrieval catcher yields:
Roughly 98% effective system reliability emerges from a base model that one-shots at 85%. This is the empirical pattern observed when Claude Code, Cursor, or comparable agentic harnesses wrap a frontier model in test-loop scaffolding: codebase-scale operations succeed at materially higher rates than the underlying model’s one-shot benchmark would predict.[37]
The independence assumption is an upper-bound simplification. Scaffolding layers have correlated failure modes — bad retrieval, ambiguous instructions, shared model priors, common blind spots, or domain misunderstandings can cause multiple layers to fail together. The deployment question is therefore not whether every layer is statistically independent, but whether enough layers are orthogonal — tests, retrieval, formal checks, human escalation, sandboxed execution, audit logs — to reduce residual failure below the workload bar. Three to five well-chosen orthogonal layers clear most production reliability bars; the operative work is in selecting layers whose failure modes do not stack, not in adding more layers of the same kind.
The Stockfish multiplier is not metaphor. It is the architecture of the deployed system, and it is the lever the next decade of effective intelligence runs on.
L.4 — Deployment-phase timeline under the multiplier. Combining base capability growth (10–15 pp/yr on hard tasks, saturating logarithmically) with empirically achievable Stockfish multipliers, task classes cross their respective bars in the following windows. Each window assumes the relevant scaffolding stack is built and deployed; absent the stack, the crossing slips by years.
| Window | Task classes crossing their bar |
|---|---|
| 2024–2026 (crossed) | Brainstorming and drafting, translation, summarisation, code completion, email drafting, Tier-1 customer-support deflection, research drafting with citations |
| 2026–2028 | Code generation with tests-in-loop, Tier-2 autonomous customer resolution, document review (legal, financial, audit, with human sign-off), personal-assistant task handling, sales prospecting automation, standardised tax preparation, educational tutoring |
| 2028–2031 | Autonomous coding for non-critical systems, medical diagnosis support (assisted), legal contract analysis and standard drafting, low-risk autonomous code-review merge, financial advice (assisted, with liability framework), junior engineering design, geofenced robotaxi expansion |
| 2031–2035 | Autonomous medical diagnosis (routine), autonomous standard legal drafting, autonomous coding for critical systems (with formal verification), high-autonomy surgical decision support, L4 robotaxi across most environments, autonomous portfolio management |
| 2035–2040 | Autonomous surgery (with safety backup), binding autonomous legal counsel, fully autonomous self-driving across all edge cases, model-led novel scientific discovery |
| Possibly not under current paradigm | Cross-domain executive judgment, fully autonomous novel cross-disciplinary theorising, some long-tail physical-world autonomy edge cases |
The economically dense windows are 2026–2028 and 2028–2031. The 2031–2040 windows clear the critical-domain autonomy bars but capture a smaller fraction of total economic surface than the earlier waves. Most addressable value reorganises through the access layer during 2026–2031, well before the headline 99.99% bar is cleared on any domain.
L.5 — Wave-shaped inference demand and the lease term. The diffusion is not concentrated at the terminal date. Each row of the L.4 table represents a new tenant class entering the inference compute market on plateau-tier silicon. The compounding produces monotonically rising inference demand across the 2026–2040 window:
- 2026–2028: knowledge-work and customer-facing AI wave. Largest tenant count, lowest per-unit inference, broad geographic distribution.
- 2028–2031: professional-services and supervised-autonomy wave. Mid tenant count, mid per-unit inference, regulated-vertical concentration.
- 2031–2035: critical-domain autonomy wave. Smallest tenant count, largest per-unit inference per critical decision, highest tenant credit quality.
The agentic compounding factor of §VI — dynamic workflows, hundreds of parallel subagents per user-facing action, longer-running tasks — is approximately orthogonal to the bar-crossing schedule.[37] Same model + 10–100× agentic complexity per action multiplies inference demand independent of which bar has crossed. The two effects compound: wave-shaped tenant growth × agentic-complexity multiplier on every active tenant.
For an access-layer operator, this wave shape supports multi-year underwriting. A position originated in 2027 captures the full 2028–2031 professional-services wave and the early 2031–2035 critical-domain wave inside a 5–10 year horizon. The underwriting does not depend on any single wave materialising on schedule; it depends on enough waves materialising across the horizon to keep utilisation above the underwriting floor. The wave shape converts deployment-timeline uncertainty into portfolio-style diversification across the access-layer position.
L.6 — Three falsifiers. Risky predictions on observable timetables:
The first runs on the early waves: by the end of 2027, the share of customer-support, sales-outreach, document-drafting, and code-completion work-hours served by AI scaffolding in autonomous-execution mode should exceed 25% across measured enterprise deployments. Autonomous-execution mode means the AI system executes the task without per-action human pre-approval, while preserving human escalation, sampled review, or post-hoc audit — distinguishing it from suggestion mode, where each output requires acceptance before it has effect. If the share remains below 10% — i.e. AI continues to be reviewed suggestion-only and does not transition to autonomous-execution mode at scale — the L.4 2026–2028 crossing window is empirically incorrect and the diffusion is materially slower than the framework predicts.
The second runs on the mid-wave: by the end of 2029, the autonomous-merge rate on code-review benchmarks — PR review systems acting without human approval on flagged-low-risk changes — should exceed 30% on the production codebases of mid-sized engineering organisations. If the rate remains below 5%, the L.4 2028–2031 window is empirically incorrect and the multiplier model overstates achievable scaffolding gains.
The third runs on multiplier mechanics directly: by the end of 2028, peer-reviewed work on multi-component scaffolding stacks (test loops, multi-agent verification, retrieval grounding, formal verification on covered surface) should report combined effective reliability of ≥99% on tasks where base-model one-shot reliability sits in the 80–90% range. If the published evidence converges on combined effective reliability that stays below ~95% on the same baseline, the L.3 multiplier identity is overstated and the L.4 timeline shifts later by two to four years.
The deeper point. The headline is the long march. The economic substance is the wave. The body essay’s §VI argues that the deployable shape of intelligence is bounded learned judgement inside engineered scaffolding. This appendix quantifies the consequence: the deployment timeline under that architecture is wave-shaped across 2026–2040, governed by the reliability-bar lattice and the Stockfish multiplier. Each wave passes inference demand through to the access layer continuously. The “decade-plus to 99.99% everything” framing is correct on the terminal date, irrelevant to the economic story between now and that date, and structurally favourable to a multi-year lease-underwritten access-layer position. The slower the base-model curve, the more value migrates to whoever built the scaffolding around it — and the more durable the demand for the inference compute that hosts it.
Notes & References
[1] Reuters (2026). “xAI Colossus expansion.” Colossus 1 reported holding more than 220,000 Nvidia processors and adding 300 MW of capacity; programme target of one million accelerators referenced in xAI roadmap statements.
[2] Meta Platforms (2026). Infrastructure roadmap referencing multi-million Blackwell- and Rubin-class accelerator deployment across 2025–2027.
[3] Reuters (2026). “Big tech AI and cloud spending to exceed $700 billion in 2026.” Industry capex aggregation across hyperscaler guidance.
[4] Moody’s Ratings (2026) and Bank of America Global Research (2026). Top-five U.S. hyperscalers issued a record $121 billion in corporate bonds during 2025 — more than four times their historical annual issuance average — with 2026 issuance projected at approximately $175 billion. Amazon’s $54 billion global bond sale in March 2026 was the largest debt transaction in the company’s history; in the same window Amazon’s trailing-twelve-month free cash flow fell roughly 95% to $1.2 billion. Oracle’s FY2026 capex guidance was revised upward by $15 billion to $50 billion, with approximately half intended to be funded through external equity and equity-like instruments to preserve its investment-grade rating.
[5] PitchBook (2026). “AI Venture Funding, Q1 2026.” First-quarter deployment of approximately $255 billion exceeded the full-year 2025 total. Aggregate dominated by mega-financings.
[6] S&P Global Market Intelligence (2026). “U.S. Data Center Investment, 2025 Annual.” Private equity deployed $45.7 billion, 72% of $63.35 billion total sector investment.
[7] Reuters (2026). Amazon European infrastructure briefing on grid interconnection timelines, citing seven-year horizons against approximately two-year data centre construction cycles.
[8] CME Group (Henry Hub front-month, June 2026 contract). ICE / Reuters quote pages for TTF (€50.167/MWh) and JKM ($18.80/MMBtu), converted at ECB euro reference rate 1.1628 EUR/USD (15 May 2026).
[9] U.S. Energy Information Administration (2024). “Average operating heat rate for natural gas-fired generation”: 7,754 Btu/kWh, used here as the conversion basis for fuel-only generation cost.
[10] Reuters (2026); industry pricing trackers. Amazon Web Services raised the list price of its H200 Capacity Blocks by 15% in January 2026 — one of the first material current-generation cloud-compute price increases on public record, against the long-running pattern of monotonic decline in headline cloud-compute pricing since EC2 launched in 2006. The increase applied to current-generation H200 reserved capacity, indicating that the binding constraint was energised, permitted slot availability rather than chip supply.
[11] Reuters / Crusoe Energy / Oracle (2026). “Abilene, Texas campus.” 400,000 GB200-class chips; 1.2 GW total facility design power across eight planned buildings, two operational as of mid-2026. Project capital ~$15B including a $9.6B JPMorgan-led loan. Oracle master lease to OpenAI as end tenant.
[12] Nvidia / U.S. Department of Energy (2026). “Solstice and Equinox systems at Argonne National Laboratory.” Combined 110,000 Blackwell GPUs planned; Equinox expected in first half 2026.
[13] Reuters / industry references. 1.3 kW all-in facility load per top-end accelerator used as an order-of-magnitude heuristic, consistent with Colossus 1’s reported 220,000+ processors against approximately 300 MW of associated facility power. Real deployments vary with cooling architecture, networking density, and silicon mix.
[14] Cast AI (2026). “Kubernetes GPU Utilization Report.” Average sustained GPU utilization of 5% across surveyed production clusters on AWS, Azure, and GCP. Not necessarily representative of frontier training clusters at major AI labs.
[15] NIST AI Risk Management Framework (AI 100-1), published January 2023, with the Generative AI Profile (AI 600-1) added in 2024. ISO/IEC 42001:2023, “Information technology — Artificial intelligence — Management system.” Both are foundational frameworks for AI governance; neither was designed for autonomous-agent deployment and both will require sector-specific extension to serve as production verification rails.
[16] SAE J3016 Level 4 (“high driving automation”) was publicly demonstrated by Waymo, Cruise, and others in 2015–2016. As of 2026, commercial Level 4 robotaxi services operate in a small number of geofenced cities (Phoenix, San Francisco, Los Angeles, Austin, and a handful of others). The 10+ year gap between demonstration and limited commercial deployment is the relevant analogue for the verification, regulation, and integration lag in deploying autonomous AI agents across broader sectors of the economy.
[17] U.S. Federal Reserve / Office of the Comptroller of the Currency (2011). “Supervisory Guidance on Model Risk Management,” SR 11-7 / OCC 2011-12. The foundational framework gating model deployment across U.S. banking, capital markets, and credit decisioning. Subsequent guidance (the Fed’s SR 22-6 and OCC updates) extends the same principles to machine-learning and AI models. The standard’s slow extension to generative AI is itself an instance of the verification-lag pattern this essay describes.
[18] The Information, Reuters, Morningstar, VentureBeat, Bloomberg (Q1–Q2 2026). OpenAI Q1 2026: revenue ~$5.7B against operating loss ~$6.95B (-122% operating margin), tracking toward ~$36.6B loss for the year; ChatGPT WAUs at 905M stalled below 1B target; Sora, the Disney partnership, and the Walmart pilot shuttered. Anthropic April 2026: annualised revenue run rate $30B (80× Q1 growth); Claude Code crossed $1B ARR; Q2 guidance >11B revenue at ~\600M operating profit. xAI Q1 2026: $2.47B operating loss on $818M revenue; ~$1B/month burn; Grok outside top-25 App Store downloads.
[19] Stack Overflow Annual Developer Survey (2025); JetBrains State of Developer Ecosystem (2025); GitHub Octoverse (2025). Across three independent industry surveys conducted in late 2025, the share of professional developers using AI coding tools (whether daily, weekly, or occasionally) was reported in the 70–85% range, with daily-or-weekly usage typically in the 50–65% range. Exact figures vary by survey methodology and population, but the directional trend toward saturation in the professional-developer market is consistent across all major sources.
[20] SpaceX S-1 filing (May 20, 2026), corroborated across Reuters, The Verge, WSJ, TechCrunch, Data Center Dynamics, and Anthropic’s own announcement. Anthropic took over the entire ~300 MW capacity of Colossus 1 (Memphis, TN), housing 220,000+ Hopper-class Nvidia processors, at $1.25B/month through May 2029, with reduced ramp fees (May–June 2026) and a 90-day termination right for either party. xAI shifted frontier training to Colossus 2 in Southaven, Mississippi, deploying GB200 Blackwell. Disclosed following the February 2026 xAI–SpaceX merger (combined entity valued at $1.25T); SpaceX S-1 targets raising $75B at a $2T valuation.
[21] Insurance Journal, Associated Press, IT Pro, and Reuters (2026). The Oracle/OpenAI Stargate Abilene campus expansion from 1.2 GW to 2.0 GW (an incremental 600–800 MW of premium permitted slot) collapsed in early 2026 against financing complexity, OpenAI’s revised demand forecasts, and operational problems including a winter weather event that took the liquid-cooling loop offline for several days. Microsoft took over the adjacent expansion via partnership with Crusoe, adding two new “AI factory” buildings and an on-site 900 MW gas plant alongside Oracle’s existing 350 MW backup at the original Oracle/OpenAI site. Nvidia paid a $150 million deposit to Crusoe and courted Meta as an alternative tenant before Microsoft secured the buildout. Final Abilene complex planned at ten buildings, approximately 2.1 GW facility load.
[22] Multiple sources, including Stanford HAI AI Index and Epoch AI compute-cost time series. Inference cost at fixed capability level has declined approximately 10× per year over the 2022–2026 window.
[23] See “Create an Age of Wonders” for the framing of intelligence as infrastructure, alongside the broader Age of Wonders thesis that the universe is abundant and the limiter is access.
[24] Blackstone Tactical Opportunities and Halliburton (2026) press release; Reuters and switchgear-industry trade reporting. $1 billion strategic equity investment into VoltaGrid to fund growth and the acquisition of Propell Energy Technology, supporting a 7.5 GW forward order book of modular natural-gas generation systems through 2030 deployed directly alongside data centres and industrial sites. Manufacturing capacity sized for up to 300 MW per month of reciprocating engines and turbine systems from facilities in Granbury, Texas. VoltaGrid’s modular QPac platform delivers up to 20 MW per node, with combined configurations reaching ~200 MW under simplified minor-source air permits. EIA 2024 average tested heat rates (Tables 8.1 and 8.2) used for the efficiency arithmetic; ~1,027 Btu/ft³ pipeline gas energy content used for fuel-volume conversion.
[25] OpenRouter (May 2026) Series B announcement. $113 million round led by CapitalG (Alphabet’s independent growth fund), with participation from NVentures (NVIDIA’s venture capital arm), ServiceNow Ventures, MongoDB Ventures, Snowflake Ventures, Databricks Ventures, Andreessen Horowitz, and Menlo Ventures. Company-disclosed weekly aggregate token throughput growth from approximately 5 trillion to 25 trillion tokens over the six months ending May 2026, with more than 8 million developers building against the platform’s multi-model routing API. OpenRouter routes inference traffic across heterogeneous model providers (Anthropic, OpenAI, Google, Meta, DeepSeek, and the open-weights ecosystem), serving as the workload-aggregation layer above plateau-tier inference supply and providing an independent measurement surface for the workload-share migration distinct from per-provider revenue figures.
[26] Epoch AI (2026). Component-level teardown of Nvidia B200 module production cost. Estimated variable production cost of approximately $5,700–7,300 per B200 module, with HBM3e stacks and advanced packaging (CoWoS-L and substrate) together accounting for roughly two-thirds of variable unit cost. Silicon die cost is a minority share of total module cost; the dominant cost drivers are memory and packaging, not transistor density. The teardown is the empirical basis for treating launch-price drift as a structural variable driven by memory and packaging supply rather than as exogenous noise.
[27] SemiAnalysis (2025–2026). Memory share of hyperscaler AI data-centre capex, projected ~30% in 2026 versus ~8% in 2023–2024. HBM allocation surveys forecast structural undersupply through 2027 against Samsung, SK Hynix, and Micron capacity expansion plans, with the three producers diverting general-purpose DRAM capacity toward HBM and high-margin enterprise DRAM through the same window. The capacity reallocation contributes to broader DRAM tightness and consumer-GPU price inflation as a second-order effect on the wider memory and accelerator stack.
[28] PricewaterhouseCoopers (2026). AI Performance Study 2026. Survey of 1,217 senior executives across 25 industries and multiple geographies. Found 74% of AI’s measurable economic value (revenue and efficiency gains adjusted against industry medians) captured by 20% of organisations, leaving 26% shared among the remaining 80%. Leaders distinguished by structural business-model reinvention (2.6× more likely to use AI to identify cross-industry growth opportunities; 2.8× more likely to increase autonomous decision-making) rather than localised cost-reduction. The concentration mirrors prior GPT cycles: the firms most capable of completing the intangible-capital accumulation capture the productivity dividend earliest.
[29] SXSW 2026 CMO Survey of 400 organisations. Companies achieving positive ROI on AI deployments spent an average of $2.60 on training, change management, workflow redesign, and cultural-change programmes per $1.00 spent on AI software tools. Organisations failing to match this ratio experienced tool-abandonment rates of 60–70% within six months. Brynjolfsson, Hitt, and Yang (2002), “Intangible Assets: Computers and Organizational Capital,” Brookings Papers on Economic Activity, established the historical market-value benchmark: $1.00 of computer hardware was associated with approximately $10.00 of corporate market value, implying the market priced roughly $9.00 of unmeasured complementary intangible capital per $1.00 of physical compute. The current AI cash ratio is more conservative than the historical market-value ratio but identical in direction: intangible accumulation is the binding constraint, not hardware.
[30] S&P Global Market Intelligence (2025) and RAND Corporation (2025) cross-published enterprise AI implementation analyses. 42% of corporate AI projects scrapped during 2025; 80.3% of AI initiatives failed to deliver business value — twice the failure rate of traditional IT projects. Of surveyed enterprises with agentic AI in production trials (97% of large enterprises), only 10–12% had transitioned agents into production environments. Gartner (April 2026) projects more than 40% of agentic AI projects will be cancelled by end-2027 due to escalating costs, lack of clear business value, and inadequate risk controls. Talyx (2026) replicates the 80%+ enterprise implementation failure rate across an independent survey base.
[31] METR (2025) randomised controlled trial of experienced open-source software developers. Subjects randomly assigned to AI-assisted or unassisted conditions on identical coding tasks. Experienced developers using AI tools were 19% slower in task completion than the unassisted baseline. The slowdown is attributed to the cognitive cost of reviewing non-deterministic AI output exceeding the speedup from generation — the Productivity-Reliability Paradox. Google DORA Report 2024 (DevOps Research and Assessment) associated a 25% increase in AI tool adoption with a 7.2% decrease in production delivery stability across surveyed engineering organisations. Google Cloud (2026), When AI Writes the Code, Who Reviews It?, reports 45% higher burnout rates among frequent AI users than among non-users, with “approval fatigue” identified as the primary causal mechanism. GitClear (2025) analysis of 153 million changed code lines projected a doubling of code churn directly translating into delivery instability. The paradox is replicated across multiple independent sources and is now the consensus reading of the workflow-level effect of current-generation AI tooling.
[32] Stanford Human-Centered AI Institute (HAI), AI Index Report 2026: Economy Chapter. Employment for early-career software developers (ages 22–25) in AI-exposed roles fell approximately 20% from 2024 to early 2026, against stable aggregate white-collar wage growth. Senior engineers retained their wage premium over the same window. The pattern is consistent with the plateau-cascade transferring syntactic-task labor (concentrated at entry-level) to verification-task labor (concentrated at senior level), as predicted by the workflow-ownership thesis in §VI.
[33] Joseph Briggs, Goldman Sachs Global Macro Research (2026), The Projected Impact of Generative AI on U.S. Productivity Growth. Optimistic baseline projects generative AI raising US labor productivity by approximately 15% upon full adoption; AI-driven productivity growth accelerating to 1.7% annually through 2029, peaking at 1.9% in the early 2030s and potentially reaching 2.3% under robust-adoption assumptions; potential US GDP growth elevated to 2.1–2.3% for the rest of the decade, adding approximately $7 trillion to global GDP over a ten-year horizon (a 7% GDP boost).
[34] Penn Wharton Budget Model (September 2025), The Projected Impact of Generative AI on Future Productivity Growth. Highly conservative model: generative AI raises aggregate US GDP and productivity by approximately 1.5% by 2035, nearly 3% by 2055, and 3.7% by 2075. Because the sectors most exposed to AI already exhibit fast trend productivity growth, the permanent boost to annual potential GDP growth is estimated at less than 0.04 percentage points.
[35] Daron Acemoglu (2024), The Simple Macroeconomics of AI, NBER Working Paper. Aggregate US GDP gains of 1.1–1.6% over a ten-year horizon; marginal annual TFP boost of approximately 0.05 percentage points. Consistent with the Penn Wharton estimate; substantially more conservative than the Goldman Sachs Global Macro projection. The dispersion across these three credible sources reflects genuine uncertainty about how rapidly the J-Curve resolves and is treated explicitly in K.4–K.5.
[36] Brynjolfsson, E., Rock, D., and Syverson, C. (2021), “The Productivity J-Curve: How Intangibles Complement General Purpose Technologies,” American Economic Journal: Macroeconomics (NBER Working Paper 25148, 2018). Formalises the structural lag between GPT arrival and aggregate productivity gains via accumulation of unmeasured complementary intangible capital. Companion: Atlanta / Richmond Fed / Duke CFO Survey (March 2026) finds 80%+ of large enterprises investing in AI but realised revenue per employee projected to rise only 1.5% in 2026 — the empirical signature of the J-Curve in current data.
[37] Anthropic (28 May 2026), Introducing Claude Opus 4.8, product announcement and System Card (May 2026). Dynamic-workflows feature for Claude Code runs hundreds of parallel subagents per session; Opus 4.8 reported as “around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked.” Benchmark comparisons (Opus 4.7→4.8): SWE-Bench Pro 64.3%→69.2%; Humanity’s Last Exam with tools 54.7%→57.9%; OSWorld-Verified 82.8%→83.4%; Finance Agent v2 51.5%→53.9%. Cited in L.2, L.3, and L.5.
[38] David, P.A. (1990). “The Dynamo and the Computer: An Historical Perspective on the Modern Productivity Paradox.” American Economic Review 80(2). The canonical account of the multi-decade lag between electrification (US power generation built out 1880s–1900s) and the productivity payoff (which required factories to abandon central steam-drive architecture and re-engineer around per-machine electric motors, a transition that took roughly forty years to complete). The deployment-phase analogue for the cycle described in this essay.
Background references
DeepMind, OpenAI, Anthropic technical reports on model distillation, quantisation, and inference efficiency improvements over the 2022–2026 window.
BlackRock (2026). EMEA Client Survey on AI investment opportunity. More than 50% of 732 surveyed firms identified energy providers as the leading AI investment opportunity, with 37% selecting infrastructure.
Reuters (2026). U.S. Department of Energy UPRISE programme targeting 5 GW of nuclear capacity by 2029 through reactor uprates, restarts, and lifespan extensions, supported by the Office of Energy Dominance Financing’s $289 billion of available loan authority. The long-term grid-side resolution of the slot bottleneck described in this essay; treated in detail in a forthcoming companion piece.
Brynjolfsson, E. & McAfee, A. (2014). The Second Machine Age. W.W. Norton.
Smil, V. (2017). Energy and Civilization: A History. MIT Press.
Carlota Perez (2002). Technological Revolutions and Financial Capital. Edward Elgar. The canonical account of installation phases, gilded-age over-investment, financial crisis, and deployment phases that follow.
NIST AI Risk Management Framework. https://www.nist.gov/itl/ai-risk-management-framework
International Energy Agency (2024). “Electricity 2024.” On data centre electricity demand projections through 2026.