Age of Wonders

Leviathan

Where frontier intelligence goes when the land runs out.

June 17, 2026

Stand at the edge of a gigawatt AI campus and you cannot see the machine. You see buildings, substations, cooling plant, generator yards, fibre and transmission spread across square kilometres of ground. Somewhere inside, a single model is learning. But the thing teaching it has been stretched so far across the land that it has stopped resembling a computer and begun to resemble a city.

It’s a machine that outgrew its environment.

The first phase of the AI buildout was limited by the chip. The second was limited by the power to run it. The third will be limited by something the first two never tested: whether a machine large enough to train the frontier can be held together at all. A single training run is approaching a scale that no convenient patch of land can power, cool, or contain.[1] If the climb continues, the frontier will need somewhere else to live.

Leviathan is an argument about where.

I. The Limit Moves

Every phase of the buildout has been governed by a single scarce input. The scarce input keeps changing. First there were not enough accelerators, and the ones that existed were not fast enough. Foundries scaled, architectures turned over every two years, and the constraint eased. Then the bottleneck moved to power, to the transformers, substations, gas contracts, and interconnection queues that take years to assemble behind a rack. The frontier, until now, has been spared the worst of it. The largest training runs still draw only a few hundred megawatts: a large substation rather than a national problem. They have been able to buy their way onto ordinary ground.

That exemption is ending. The exponential says when.

The power a training run needs is its compute divided by the efficiency of the hardware that runs it. Frontier compute has grown four to five times a year for a decade.[1][2] Efficiency, measured in operations per watt, improves about half as fast. The gap between the two rates is paid in power, and the power a frontier run draws roughly triples each year. The efficiency gains that once absorbed part of the climb are running out. The physics that produced them, shrinking transistors and falling voltages, has largely stopped.[3][4][5]

A frontier run drew tens of megawatts in 2023 and hundreds in 2025.
Compute climbs about fourfold a year with efficiency only half as fast.

Follow that curve and a single run outgrows the grid that would power it.[1] Thirty gigawatts is more than two per cent of everything the United States can generate, drawn into one machine behind one fence.[6] No ordinary site carries that load. The transmission is not built, the permitting runs to a decade, and the cooling water is not there.[7] The migration that pushed the rest of the industry from the model to the chip to the plug is now reaching the frontier. The frontier has nowhere on land to absorb it.

The frontier needs more power, more cooling, and more space than land can give.

So it has to move. It will go where power is abundant and heat has somewhere to go: deserts under strong sun, coastlines beside deep water, the few places on Earth that can host a machine this size. Coherence follows it there: a frontier run is synchronous. After every step the accelerators stop, exchange what they have learned, agree on a single update, and continue. The run advances at the speed of its slowest link. The larger the machine, the further apart its parts, and the more of each step is lost to waiting. Power forces the frontier off easy ground. Whoever can rebuild it densely, wherever it lands, wins back the time that distance costs.

II. The Unit

A machine this large needs a part small enough to build, finance, and replace. For Leviathan, that part is ten megawatts of sealed compute. Ten megawatts is about seventy of today’s frontier racks, roughly five thousand of the most capable processors ever made, with their memory, their networking, and the dielectric fluid that carries off their heat, closed inside a subsea capsule and lowered to the seabed.[8][9] In compute terms, one pod is a modern AI supercomputer: the kind a government announces as a national asset. Leviathan lowers it into the dark.[10]

The pod takes a frontier-class training cluster and renders it as a single brick. The value is in what a brick allows. You can set another beside it, and another, without the machine spreading across the ground. A hundred pods make a gigawatt. A thousand make ten. Ten thousand make a hundred, the same sealed organ repeated until a single learning system reaches a scale no land campus could hold in one piece.

This is the move the whole concept turns on. Today a frontier cluster is among the largest coherent machines built. Leviathan makes it the unit, not the limit.

III. The Geometry

Land builds in two dimensions. The sea offers a third. A datacentre is a floor, racks in rows, the machine growing by setting buildings beside one another until it sprawls like a suburb and its far edges drift apart. Submerged, the floor disappears. Pods hang in a lattice, stacked through a volume and joined by short runs of fibre that travel up and across as well as out. The machine fills space rather than covering ground. The difference shows up directly in distance.

On a plane, the width of a machine grows with the square root of its size. In a volume, it grows with the cube root. At ten thousand modules, the lattice is roughly five times more compact than the flat campus before a single metre of cooling overhead is counted. The gap widens once the aisles, electrical rooms, and service corridors that land demands are added back.

Distance is latency. Light in fibre covers about one metre every five nanoseconds. A signal crosses a single pod in eighty nanoseconds and a two-kilometre land campus in ten thousand: a hundredfold difference paid on every synchronisation step of a run that lasts for months. A gigawatt of computing, in the subsea lattice, occupies the diameter of a large building. The ocean is what lets a machine that large stay close to itself.

IV. The Heat

Every thinking machine is also a furnace. The heat is the hard part. A processor returns almost all the power it draws as waste heat. On land, removing that heat is most of what a datacentre does: chillers, cooling towers, banks of fans, water drawn through and evaporated away. The heat sink is the true bottleneck. The largest heat sink on Earth is the ocean.

Submerged in a fluid that conducts heat but not electricity, a pod hands its warmth to the water moving past its hull. The water scarcely registers it.[9] No cooling towers, no evaporation, no consumption of fresh water in a region that has little to spare.[11][12] The ocean is a generous heat sink — but not always a cold one. Off the coast of Oman, the shelf drops to two hundred metres within a few kilometres of shore.[13] At that depth the water sits near twenty degrees, warmed by the saline outflow of the Arabian Gulf.[14][15] The thermal advantage is tangible, but it is not free: a pod cannot reject ten megawatts through a bare hull into water that warm. It needs a heat exchanger, metal folded to increase the hull’s area.[16]

And the sea has a ceiling of its own. Heat discharged into a current trails downstream as a warm plume. The plume cannot be allowed to cook what lives in it. To a first approximation, a gigawatt warms its patch of water by a tenth of a degree. Ten gigawatts by under one. A hundred gigawatts, concentrated in a single place, by three or more, past the threshold that marine ecosystems and environmental guidance will bear.[17][18] So the largest machines cannot gather at one point. They string along the coast as a chain of campuses, bound into one system by light rather than by water.

A gigawatt warms its patch of sea by a tenth of a degree. A hundred gigawatts in one place would warm it past what life can bear.
So the largest machine would string along the coast and join through fibre.

The ocean sets how large a single organ can be. The network sets how large the mind can grow.

V. The Power

The power comes from the desert. Oman sits under one of the strongest solar resources on the planet, receiving far more energy on its inland sand each year than its grid will ever draw.[19] A field of panels can send that power to the coast through buried cable, with little lost over a few kilometres.[20] The desert generates. The sea cools. The two scarce favours of the site sit a short distance apart.

Solar has one obvious defect: it stops at night. A power grid cannot, but a training run can. A run can checkpoint its state, pause, and resume exactly where it left off.[21][22] Leviathan does not need to buy a night’s worth of storage for every workload. The first machines can prove themselves by following the sun. But when the station carries frontier silicon for commercial workloads, night becomes expensive.

That changes the measure of the machine. It does not sell megawatts. It sells completed attempts: the runs finished, the architectures tried, the model generations turned over in a year. The frontier does not advance through one decisive run but through thousands of inconclusive ones. The machine that finishes the most of them learns the fastest. Intermittent power is a smaller penalty when the asset is iteration rate, not uptime. Spend firmness where it counts.

VI. What It Unlocks

The honest version is the stronger claim. Leviathan does not make a single processor faster. The chip is the chip. Its cooling returns roughly a tenth of what land would spend on fans and heat.

The point is what becomes possible once the machine can do tens of times more work within the same latency profile. A hundred pods approach a hundred times the peak compute of a frontier cluster without scaling beyond its span. In the solar-following case, after sunlight hours and operating losses, that gigawatt does on the order of thirty times the daily training of the best land machine running flat out. If the newest pods are firmed for round-the-clock operation, the same physical gigawatt moves toward the full hundred-cluster-day regime. Ten gigawatts does hundreds of times more. A hundred gigawatts enters a range where precedent runs out.

One pod is comparable to a modern AI supercomputer. A gigawatt holds a hundred of them in the span of a building.
Leviathan assembles simple modules for multiples more daily training than a land machine.

That compute resolves into two different promises, and keeping them apart is what makes the claim honest. Most of it is throughput: the runs finished, the rollouts explored, the data generated, the students distilled, the architectures tried and discarded. Not one machine, but thousands of loosely-joined ones. The lattice delivers, asking nothing the sun and the sea cannot give. The frontier advances by finishing attempts. A field of pods finishes more than any campus on land.

VII. The Shape of the Work

Two forces are converging on this architecture, and only one of them is about power.

The first is the limit already described. The frontier is outgrowing the land that can power and cool it. That alone pushes the largest machines toward the desert and the sea. The second force is quieter and, over time, more decisive: the work itself is changing shape.

The early frontier was one dense run, a single model trained end to end, every processor holding a slice of the same weights and trading them on every step. It is the most tightly synchronised computation the field has produced. It is the hardest case for any distributed machine. What binds it is the width of the network, not the length of its wires.[23][24][25] A subsea lattice has least advantage here. Shortening the wires does nothing for a machine whose limit is bandwidth.

That run is becoming a smaller part of what the frontier does. The newest models are sparse: a mixture of experts in which only a fraction of the network fires for any token. The total brain compounds while the active computation stays small.[26][27] Around that core a second economy has grown. Distillation from a master model is now standard practice. Reinforcement learning runs across millions of parallel rollouts. Synthetic data is produced in bulk. Simulation and world models run as vast independent ensembles. Reasoning loops a small block through depth instead of stacking ever-larger layers.[28][29]

These workloads share a property. They are parallel where the old run was synchronous, and tolerant of delay where it was not. Some scarcely communicate across the machine at all. The rest are bound by the length of their wires, not the width of their network. That is the one constraint a dense, short-path lattice was built to relieve.

One item on that list runs against the grain of the others. It is rising fastest. As the returns from ever-larger pretraining runs flatten, the frontier is learning to scale a different axis: the depth of reasoning a model performs before it answers.[30] A run that loops through long chains of sequential steps is neither parallel nor tolerant of delay. Each step waits on the one before it. Each pays the machine’s latency in full. Stretch the chain far enough and latency, not bandwidth, becomes the wall. The model doing the reasoning is not small. It is a sparse master that spreads its experts across the machine, so every step of the chain is a wide exchange across it. Depth and width compound.

This is the axis land cannot follow. A campus kilometres across pays its latency on every link of the chain. The deeper the recursion, the more of the run is lost to waiting. The dense lattice clears that floor. The sea makes compute no cheaper than the desert beside it. What it changes is which dimension binds: from power, which any desert can supply, to serial latency at scale, which only a machine this close to itself can reach.

So the frontier is not only running out of land. It is turning into the kind of work an ocean-cooled lattice runs best. The apex training run remains the exception: bandwidth-bound, and as hard underwater as anywhere. It is a shrinking share of a budget increasingly spent on parallel, checkpointable, latency-bound computation, exactly the work a field of sealed pods on intermittent sun is suited to.

The topology the frontier is migrating toward is the one the sea already prefers. The workload is arriving, on its own, at the shape the new place rewards. If scale keeps climbing, power drives the hardware off the land. If it gives way to depth, latency does. Both roads run to the same coast.

VIII. The Forge

Computational Abundance described how intelligence spreads through the economy: how frontier hardware, written down and repriced after the buildout overbuilds, carries cheap intelligence into every corner of ordinary work.[31] That is the story of diffusion. Leviathan is the story of the source.

It is where the frontier is made, the place that trains the largest models, generates the synthetic data, and distils the lessons that smaller machines will carry outward for years. The two essays describe one cycle. The forge trains a teacher. The teacher is distilled into students. The students run cheaply on older silicon everywhere. The value they create funds the next teacher. The frontier concentrates, the plateau diffuses, and the source has to sit somewhere.

The slot hierarchy

Frontier - gigawatts of cool, permitted power in one place. Capability discovery.

Plateau - cheap, depreciated compute, widely distributed. Capability diffusion.

Edge - local hardware in every device. Capability access.

That earlier essay named the scarce thing precisely. Compute is abundant. The slot is scarce: the permitted, powered, cooled place where electricity turns into thought. Leviathan names the scarcest slot of all: gigawatts of cool power in a location a frontier machine can actually occupy. The slot binds however the work is arranged. Spread a run across a dozen sites, as the labs increasingly do, and the constraint is not escaped but multiplied: every one of those sites still needs its own gigawatts of cool power on permitted ground. One coherent machine or a federation of them, the ocean supplies what each node requires.

That cycle runs in one direction. The ocean steepens it. Computational Abundance argued that intelligence already arrives faster than the institutions meant to verify, regulate, and trust it can take it in.[31] In practice, the supply is inexhaustible. A forge in the sea makes it more so. It raises the output of frontier capability and cheap synthetic data by a hundredfold. The work of turning that capability into medicine, law, and industry still moves at the speed of human institutions.

So the architecture closes one scarcity by opening another. When the source is the ocean, intelligence stops being the thing in short supply. What stays scarce is the room to receive it: the verification, the trust, the workflows ready for what the forge makes. The bottleneck moves one last time, from producing intelligence to absorbing it.

The forge trains the teacher. The plateau runs the students. The ocean supplies the source.

IX. Where The Land Meets The Sea

Nothing in this requires new physics. The sun already floods the desert with more power than any grid will use. The ocean already waits, a heat sink the size of a planet, a few kilometres from a coast that falls away fast. The silicon already thinks. What is missing is the assembly that joins the power to the place. Assembly is a problem of engineering, not of discovery.

It is a hard one. An honest programme earns each stage before the next. A single pod must reject ten megawatts into warm water, survive pressure and salt and corrosion, mate its connectors beneath the sea, and be raised again intact. Ten pods must show the result repeats. A hundred prove it is an industry rather than a demonstration. No stage can assume the one above it.

Granting the engineering, the picture resolves. A pod is an AI supercomputer sealed at two hundred metres. A gigawatt block is a hundred of them in the space of a building. A coastline of blocks is a single learning system too large for any land to hold in one piece. At every scale the heat goes to the ocean and the power comes from the sun. The parts stay near enough to think as one. Land campuses scale like cities, sprawling outward until the machine dissolves into property.[32][33][34] Leviathan scales like a machine, growing by adding sealed organs to an ocean-cooled lattice that stays whole as it climbs.

The frontier is running out of land. The ocean has been there the whole time.


Technical Appendix

A first-principles engineering analysis of the system, subsystem by subsystem. The numbers are concept-study estimates, not a final design package; ranges are shown where the inputs are uncertain. Each section states its assumptions, derives a result, and where possible checks it against a known engineering system. The weakest links are flagged explicitly rather than buried.

A. The Demand — Power Per Frontier Run

The power a single training run draws is its compute divided by the efficiency of the hardware and the duration of the run:

$$\boxed{\,P \approx \frac{C}{(\text{FLOP/W}) \times t}\,}$$

For a fixed run duration, the annual growth factor of $P$ is therefore the growth factor of frontier compute divided by the growth factor of efficiency. Both have decade-long trends:[1][2][3]

| Quantity | Annual growth | Basis |
|---|---|---|
| Frontier training compute | ~4–5× | Epoch AI, decade trend |
| Energy efficiency (FLOP/W) | ~1.3–1.5×, slowing | Hardware specs; Dennard scaling ended |
| Power per run | ~1.5–3× | Compute ÷ efficiency |

Projecting a ~0.3 GW 2025 frontier cluster forward at the two ends of the compute range:

| Year | Aggressive (compute 4×/yr) | Decelerating (compute 2×/yr) |
|---|---|---|
| 2028 | ~7 GW | ~1.2 GW |
| 2030 | ~50 GW | ~2.5 GW |
| 2035 | unphysical (>1 TW) | ~10–15 GW |

The aggressive column breaks before 2035, which is the point: scaling cannot hold 4×/yr indefinitely because power runs out first. Across every non-absurd path, a single frontier run reaches gigawatts to tens of gigawatts within 5–10 years.[35]
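The projection can be sketched in a few lines. This is a minimal illustration, not the model behind the table: it assumes a fixed efficiency growth of 1.4×/yr (the mid-point of the range above), whereas the text describes that rate as slowing, so it reproduces the order of magnitude of the table rather than its exact entries. The function name and constants are illustrative.

```python
def project_power_gw(p0_gw, compute_growth, efficiency_growth, years):
    """Power after `years` if compute and FLOP/W each grow by a fixed annual factor."""
    power_growth = compute_growth / efficiency_growth  # annual growth factor of P
    return p0_gw * power_growth ** years

BASE_YEAR = 2025
P0_GW = 0.3        # ~0.3 GW frontier cluster in 2025 (from the text)
EFF_GROWTH = 1.4   # assumption: constant FLOP/W growth, mid-range of 1.3-1.5x

for year in (2028, 2030, 2035):
    n = year - BASE_YEAR
    aggressive = project_power_gw(P0_GW, 4.0, EFF_GROWTH, n)
    decelerating = project_power_gw(P0_GW, 2.0, EFF_GROWTH, n)
    print(f"{year}: aggressive ~{aggressive:,.1f} GW, decelerating ~{decelerating:,.1f} GW")
```

Under these assumptions the aggressive path passes 1 TW before 2035, while the decelerating path stays in the range of roughly one to ten gigawatts.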

For scale, total United States generating capacity is ~1,280 GW,[6] and data-centre load was ~35 GW in 2024, projected to ~80–130 GW by 2030.[36][7] A single frontier run of tens of gigawatts is a national-scale load behind one fence, which is why it cannot sit on ordinary land near existing transmission. The efficiency denominator is also weakening: with Dennard scaling ended and precision approaching a floor of a few bits, FLOP/W gains increasingly come from packaging rather than transistor physics, so the historical ~1.4×/yr that partly offset compute growth is fading.[3][4][5]

B. The Pod — Compute, Power Density, Volume

A current frontier rack integrates about 72 top-end accelerators with their CPUs, high-bandwidth memory, liquid cooling, and a short coherent network, at roughly 140 kW per rack:[8][37]

$$N_\text{racks} = \frac{10\,\text{MW}}{140\,\text{kW}} \approx 71 \quad\Rightarrow\quad N_\text{GPU} \approx 71 \times 72 \approx 5{,}100$$

Allowing for power electronics, switching, CPUs, and circulation overhead, the conservative count is 5,000–6,500 accelerators per pod. The reference public cluster used for comparison throughout is a 64-rack, 4,608-GPU deployment, so one pod is approximately one such cluster.[10]

Volume and power density. Stripped of the aisles, plenums, fan walls, and human-access clearances of a land hall, the pod is a horizontal cylindrical vessel roughly 14–16 m long and 4–5 m in diameter:

$$V = \pi r^2 L \approx \pi (2.25)^2 (15) \approx 240\,\text{m}^3$$

Power density is then $10\,\text{MW} / 240\,\text{m}^3 \approx 42\,\text{kW/m}^3$ at the vessel envelope, rising to 100–150 kW/m³ at the populated-board level. A conventional air-cooled hall runs 1–2 kW/m³ all-in. The pod is one to two orders of magnitude denser, which is the physical basis for the lattice compaction in §D, and is only possible because the heat leaves through fluid, not air.[38][39]
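The pod arithmetic above can be checked directly. The sketch below uses only figures stated in this section (10 MW, 140 kW per rack, 72 accelerators per rack, a 4.5 m by 15 m cylinder); the variable names are illustrative, and the vessel dimensions are the mid-range values, not a final design.

```python
import math

POD_POWER_W = 10e6     # 10 MW pod (from the text)
RACK_POWER_W = 140e3   # ~140 kW per rack (from the text)
GPUS_PER_RACK = 72     # accelerators per rack (from the text)
RADIUS_M = 2.25        # mid-range vessel radius (4.5 m diameter)
LENGTH_M = 15.0        # mid-range vessel length

racks = round(POD_POWER_W / RACK_POWER_W)        # whole racks that fit the power budget
gpus = racks * GPUS_PER_RACK                     # accelerators before overhead allowances
volume_m3 = math.pi * RADIUS_M ** 2 * LENGTH_M   # cylinder volume, end caps ignored
density_kw_m3 = POD_POWER_W / volume_m3 / 1e3    # envelope power density

print(f"racks: {racks}, accelerators: {gpus}")
print(f"volume: {volume_m3:.0f} m^3, density: {density_kw_m3:.0f} kW/m^3")
```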

C. Coherence — The All-Reduce and What Actually Binds

This is the load-bearing question for the topology claim, so it gets the most careful treatment. Synchronous training repeats a collective operation, the all-reduce, on every step.[23][24][40] For a ring all-reduce of a gradient of size $S$ across $N$ endpoints, each with link bandwidth $B$ and per-hop latency $\alpha$:

$$\boxed{\,T_\text{AR} = \underbrace{2(N-1)\,\alpha}_{\text{latency term}} \;+\; \underbrace{\frac{2(N-1)}{N}\cdot\frac{S}{B}}_{\text{bandwidth term}}\,}$$

The two terms behave completely differently, and only one of them carries distance.

The latency term carries distance; the bandwidth term does not. The per-hop latency decomposes as $\alpha = t_\text{switch} + t_\text{prop}$, where switch-ASIC latency $t_\text{switch} \approx 0.1\text{–}0.3\,\mu\text{s}$ and propagation $t_\text{prop}$ is the distance term: light covers about one metre every 5 ns in fibre. The bandwidth term contains no length at all: it is set by the optical fabric, not by how far apart the endpoints sit.

| Topology | Span | $t_\text{prop}$/hop | $\alpha$ (incl. ~0.2 µs switch) |
|---|---|---|---|
| 10 MW pod | 12–16 m | 0.06–0.08 µs | ~0.3 µs |
| 1 GW dense block | 70–90 m | 0.35–0.45 µs | ~0.5–0.6 µs |
| 10 GW campus | 140–180 m | 0.7–0.9 µs | ~0.9–1.1 µs |
| Land multi-building campus | 500–2,000+ m | 2.5–10+ µs | ~2.7–10+ µs |

The honest conclusion. Submerging the machine collapses the propagation component of $\alpha$ by 10–100× relative to a sprawling land campus. That is decisive for latency-bound collectives (fine-grained tensor-parallel exchanges, small messages, and the long tail of synchronisation barriers) where $T_\text{AR}$ is dominated by the latency term. It does nothing for bandwidth-bound collectives, where the bulk gradient all-reduce is governed by $S/B$ and the binding constraint is fabric bisection bandwidth, a design choice independent of whether the machine is wet or dry.[25] The subsea advantage is real, but it is an advantage on the latency term specifically. A campaign that wins frontier training on this architecture wins it on thermal density and short-path latency together, not on propagation alone. The bandwidth fabric still has to be built, and it is the same problem underwater as on land.
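The split between the two regimes can be made concrete with the ring formula from this section. The sketch below is illustrative only: the endpoint count (72), link bandwidth (100 GB/s), and message sizes are hypothetical values chosen to show the two regimes, and a single flat ring is a simplification of the hierarchical collectives real clusters use. The per-hop latencies are taken from the table above (about 0.3 µs in a pod, about 5 µs on a land campus).

```python
def ring_allreduce_terms(n, alpha_s, size_bytes, bandwidth_bytes_per_s):
    """Return (latency_term, bandwidth_term) in seconds for a ring all-reduce."""
    latency_term = 2 * (n - 1) * alpha_s
    bandwidth_term = 2 * (n - 1) / n * size_bytes / bandwidth_bytes_per_s
    return latency_term, bandwidth_term

N = 72         # hypothetical endpoint count
B = 100e9      # hypothetical link bandwidth, bytes/s
CASES = {"small message (64 KiB)": 64 * 1024, "bulk gradient (10 GB)": 10e9}
ALPHAS = {"pod (~0.3 us)": 0.3e-6, "land campus (~5 us)": 5e-6}

for msg_name, size in CASES.items():
    for site, alpha in ALPHAS.items():
        lat, bw = ring_allreduce_terms(N, alpha, size, B)
        bound = "latency-bound" if lat > bw else "bandwidth-bound"
        print(f"{msg_name}, {site}: latency {lat*1e6:.1f} us, "
              f"bandwidth {bw*1e6:.1f} us -> {bound}")
```

With these illustrative numbers the small message is dominated by the latency term at either site, so shortening the path pays off directly, while the bulk gradient is dominated by $S/B$ and barely notices the difference in $\alpha$.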

This is also why the architecture survives a shift to loosely-coupled distributed training: that shift attacks the latency term (by tolerating it), not the power-and-cooling problem each site still faces.[21][22]

Serial depth is the opposite case, and the one that favours the lattice most. Test-time reasoning unrolls $K$ sequential steps, each waiting on the one before it, so the run accumulates a wall-clock latency that grows with $K\,\alpha$. At land spans $\alpha$ is microseconds; in the lattice it is a fraction of one. For shallow chains the gap is noise. For the deep recursion the frontier is turning toward, $K$ is large, and the 10–100× advantage on the propagation component of $\alpha$ multiplies by $K$ into the difference between a run that finishes in usable time and one that does not. Here short-path latency stops being an efficiency and becomes an enabler: the binding dimension the body’s §VII rests on.
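A rough sense of how $K\,\alpha$ accumulates, as a sketch under stated assumptions: one cross-machine hop per sequential step, a 0.2 µs switch latency and 5 ns per metre of fibre (both from this section), and a hypothetical chain depth of one million steps. The spans are the 1 GW block and the upper land-campus figure from the table above.

```python
SWITCH_S = 0.2e-6   # ~0.2 us switch latency (from the table in this section)
S_PER_M = 5e-9      # light in fibre: ~5 ns per metre (from the text)

def per_hop_latency_s(span_m):
    """alpha = t_switch + t_prop for a hop across `span_m` metres."""
    return SWITCH_S + span_m * S_PER_M

def serial_wall_clock_s(k_steps, span_m):
    """Accumulated latency of k sequential steps, one hop each (a simplification)."""
    return k_steps * per_hop_latency_s(span_m)

K = 1_000_000  # hypothetical chain depth
for label, span_m in [("1 GW block, 80 m", 80), ("land campus, 2,000 m", 2000)]:
    print(f"{label}: {serial_wall_clock_s(K, span_m):.1f} s of pure latency")
```

Because the switch latency is a floor that distance does not remove, the ratio in total $\alpha$ here is smaller than the ratio in propagation alone.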

Mixture-of-experts sharpens this. Routing each token to a few of many experts is an all-to-all exchange across the devices holding them — the most topology-sensitive collective there is.[26] On training batches it is bandwidth-bound and the lattice helps little. At decode, one token at a time, the messages are small and it turns latency-bound. A master that is both enormous and recursive pays that latency on every expert hop of every step — the workload that binds hardest to a short-path lattice.

D. Geometry — The Three-Dimensional Lattice

A flat campus packs modules across a plane, so its diameter scales as $D \propto \sqrt{N}$. A dense subsea lattice packs them through a volume, so $D \propto N^{1/3}$. The geometric advantage:

$$\boxed{\,\frac{D_\text{2D}}{D_\text{3D}} = \frac{\sqrt{N}}{N^{1/3}} = N^{1/6}\,}$$

At $N = 10{,}000$ pods, $N^{1/6} \approx 4.6$. This is the packing advantage before overhead. Add the aisle, electrical-room, and service volume a land campus must carry (the pod has none of it; see §B), and the practical path-length advantage runs from roughly 5× to 20×.

| Scale | Pods | Lattice edge (≈ $V^{1/3}$) | Worst-case span | One-way $t_\text{prop}$ |
|---|---|---|---|---|
| 1 GW | 100 | ~30 m | 70–90 m | 0.35–0.45 µs |
| 10 GW | 1,000 | ~65 m | 140–180 m | 0.7–0.9 µs |

The lattice edge assumes ~240 m³ per pod (§B) plus an equal allowance for backplanes, structure, and flow channels, i.e. ~500 m³ per pod gross. A 1 GW block is then ~50,000 m³ of compute, a cube ~37 m on a side — the footprint of a single large building, holding a hundred times the compute of one such cluster. That single sentence is the whole geometric argument, and §B and §D are where it is earned.[41]
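The scaling and the cube-edge figures can be reproduced in a few lines. This is a sketch that treats the block as an ideal cube; the function names are illustrative, and the per-pod volumes are the ~240 m³ vessel and ~500 m³ gross figures from the text.

```python
def packing_advantage(n_pods):
    """Ratio of 2D to 3D diameter for the same module count: N^(1/6)."""
    return n_pods ** (1 / 6)

def cube_edge_m(n_pods, volume_per_pod_m3):
    """Edge of an ideal cube holding n_pods at the given volume each."""
    return (n_pods * volume_per_pod_m3) ** (1 / 3)

print(f"N = 10,000: advantage ~{packing_advantage(10_000):.1f}x")
for pods in (100, 1_000):
    net = cube_edge_m(pods, 240)    # bare vessel volume
    gross = cube_edge_m(pods, 500)  # vessel plus structure and flow channels
    print(f"{pods} pods: edge ~{net:.0f} m net, ~{gross:.0f} m gross")
```

With the 500 m³ gross allowance the edges come out near 37 m and 79 m; with the bare 240 m³ vessel volume they come out near 29 m and 62 m, which brackets the ~30 m and ~65 m entries in the table.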

E. The Internal Thermal Loop and Pump Power

For a single-phase dielectric fluid (specific heat $c_p \approx 2{,}274\,\mathrm{J\,kg^{-1}\,K^{-1}}$, density $\rho_f \approx 800\,\text{kg/m}^3$, a synthetic-hydrocarbon proxy) rejecting facility heat $Q \approx 10.5\,\text{MW}$ at a 10 K loop rise:[9][16]

$$\dot{m} = \frac{Q}{c_p\,\Delta T} = \frac{10.5\times10^6}{2{,}274\times10} \approx 462\,\text{kg/s} \quad\Rightarrow\quad \dot{V} = \frac{\dot m}{\rho_f} \approx 0.58\,\text{m}^3/\text{s}$$

Pump power. Pump input power is $P_\text{pump} = \dot V\,\Delta P / \eta$: the hydraulic power $\dot V\,\Delta P$ divided by the pump efficiency $\eta$. The loop pressure drop $\Delta P$ depends on architecture: a full-immersion bath runs low (~1.5 bar), a cold-plate loop higher (~3.5 bar):

| Loop type | $\Delta P$ | Hydraulic power | At $\eta = 0.65$ | Fraction of 10 MW |
|---|---|---|---|---|
| Full immersion | 1.5 bar | 87 kW | 133 kW | 1.3% |
| Cold-plate | 3.5 bar | 203 kW | 312 kW | 3.1% |

The seawater side adds its own pumping. Sizing the seawater flow for a 5 K rise gives $\dot m_\text{sw} = Q/(c_{p,\text{sw}}\,\Delta T) \approx 526\,\text{kg/s}$ ($\dot V \approx 0.51\,\text{m}^3/\text{s}$); at ~0.75 bar through the exchanger, ~59 kW, or 0.6%. Total pumping is therefore ~2–4% of IT load, and this is the dominant contributor to the facility-overhead budget in §L.[42]
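The flow and pump figures in this section follow from two one-line formulas, sketched below. The dielectric properties, loop rises, pressure drops, and pump efficiency are the values stated in the text. The seawater specific heat (3,990 J/kg·K) and density (1,025 kg/m³) are assumptions: typical values chosen because they reproduce the quoted 526 kg/s and 0.51 m³/s.

```python
DIELECTRIC_CP = 2274.0    # J/(kg K), from the text
DIELECTRIC_RHO = 800.0    # kg/m^3, from the text
SEAWATER_CP = 3990.0      # J/(kg K), assumed typical seawater value
SEAWATER_RHO = 1025.0     # kg/m^3, assumed typical seawater value
Q_W = 10.5e6              # facility heat to reject
IT_LOAD_W = 10e6          # pod IT load
PUMP_ETA = 0.65           # pump efficiency, from the table

def volume_flow_m3_s(q_w, cp, delta_t_k, rho):
    """Volumetric flow needed to carry q_w at a temperature rise of delta_t_k."""
    return q_w / (cp * delta_t_k) / rho

def pump_input_w(vol_flow_m3_s, dp_bar, eta=PUMP_ETA):
    """Pump input power: hydraulic power (flow x pressure drop) over efficiency."""
    return vol_flow_m3_s * dp_bar * 1e5 / eta

loop_flow = volume_flow_m3_s(Q_W, DIELECTRIC_CP, 10.0, DIELECTRIC_RHO)
sea_flow = volume_flow_m3_s(Q_W, SEAWATER_CP, 5.0, SEAWATER_RHO)
sea_pump = pump_input_w(sea_flow, 0.75)

for name, dp_bar in [("full immersion", 1.5), ("cold-plate", 3.5)]:
    loop_pump = pump_input_w(loop_flow, dp_bar)
    total = (loop_pump + sea_pump) / IT_LOAD_W
    print(f"{name}: loop {loop_pump/1e3:.0f} kW + seawater {sea_pump/1e3:.0f} kW "
          f"= {total:.1%} of IT load")
```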

F. The Hull Heat Exchanger — The Thermal Crux

Rejecting 10.5 MW into 20–22 °C water is the hardest steady-state problem in the design. The exchanger obeys[16][43]

$$\boxed{Q = U A\,\Delta T_\mathrm{lm} \quad\Rightarrow\quad A = \frac{Q}{U\,\Delta T_\mathrm{lm}}}$$

The first-order design question is the temperature difference between the pod’s internal cooling loop and the surrounding seawater. A 35 °C bulk loop against ~20 °C seawater gives $\Delta T_\mathrm{lm} \approx 12\text{–}15\,\mathrm{K}$, which is a conservative cold-loop case. Liquid-cooled compute can run warmer than this if the processors, memory, seals, connectors, dielectric fluid and reliability model are qualified for it. A more realistic concept range is a 45–55 °C hot loop, with 60 °C as an aggressive case.

For a clean seawater-to-fluid exchanger, using $U \approx 800\text{–}1{,}100\,\mathrm{W\,m^{-2}\,K^{-1}}$, the required wetted area falls sharply as loop temperature rises:

| Loop case | Effective $\Delta T_\mathrm{lm}$ | Area at $Q = 10.5\,\mathrm{MW}$ and $U = 900\,\mathrm{W\,m^{-2}\,K^{-1}}$ |
|---|---|---|
| 35 °C cold-loop case | ~14 K | ~830 m² |
| 45 °C design case | ~24 K | ~490 m² |
| 50 °C design case | ~29 K | ~400 m² |
| 55 °C hot-loop case | ~34 K | ~340 m² |
| 60 °C aggressive case | ~39 K | ~300 m² |

The cylindrical hull of §B has an external area of roughly

$$A_\mathrm{hull} \sim \pi D L \approx \pi(4.5)(15) \approx 210\,\mathrm{m^2}.$$

So the pod still cannot reject 10 MW through a bare hull. But the exchanger penalty is less severe than the cold-loop case implies. With a qualified hot loop, the design needs roughly 1.5–4× the hull area in wetted, corrosion-resistant exchanger surface, rather than necessarily 3–5×. That surface can be folded into fins, keels, external exchanger panels, or a jacketed flow path around the capsule.
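The area figures and the hull multiples can be recomputed from $A = Q/(U\,\Delta T_\mathrm{lm})$. The sketch below takes the effective $\Delta T_\mathrm{lm}$ for each loop case as given in this section instead of deriving it from inlet and outlet temperatures, which the text does not specify; names are illustrative.

```python
import math

Q_W = 10.5e6   # heat to reject, from the text
U_CLEAN = 900  # W/(m^2 K), the value used for the area table

def exchanger_area_m2(q_w, u, dt_lm_k):
    """Wetted area required by Q = U * A * dT_lm."""
    return q_w / (u * dt_lm_k)

hull_area_m2 = math.pi * 4.5 * 15  # side area of the 4.5 m x 15 m cylinder

# Effective log-mean temperature differences taken from this section.
LOOP_CASES = {"35 C": 14, "45 C": 24, "50 C": 29, "55 C": 34, "60 C": 39}

for loop, dt_lm in LOOP_CASES.items():
    area = exchanger_area_m2(Q_W, U_CLEAN, dt_lm)
    print(f"{loop} loop: ~{area:.0f} m^2, {area / hull_area_m2:.1f}x bare hull")
```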

Biofouling is the dominant long-term degradation. Marine growth on the seawater side adds a fouling resistance $R_f$ in series, so[44]

$$\frac{1}{U_\mathrm{dirty}} = \frac{1}{U_\mathrm{clean}} + R_f$$

Using a 50 °C loop case with $\Delta T_\mathrm{lm} \approx 29\,\mathrm{K}$, fouling changes the exchanger area as follows:

| $R_f$ ($\mathrm{m^2\,K/W}$) | Condition | $U$ from 1,100 clean | Required area | Penalty |
|---|---|---|---|---|
| 0 | pristine | 1,100 | ~329 m² | |
| 0.0002 | light film | 902 | ~401 m² | +22% |
| 0.0004 | moderate | 764 | ~474 m² | +44% |
| 0.0008 | heavy | 585 | ~619 m² | +88% |

At 200 m, reduced light suppresses photosynthetic growth, but microbial films and sessile marine organisms can still colonise surfaces. The design must therefore oversize the exchanger for end-of-cycle fouling, favour smooth high-flow geometries that shed growth, and tie the retrieval interval (§K) to fouling as much as to silicon refresh.
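The series-resistance relation is simple enough to evaluate directly. The sketch below uses the clean coefficient of 1,100 W/m²·K and the 29 K loop case from this section; the function names are illustrative. One consequence worth noting is that the fractional area penalty reduces to $U_\mathrm{clean} R_f$, independent of $Q$ and $\Delta T_\mathrm{lm}$.

```python
Q_W = 10.5e6       # heat to reject, from the text
DT_LM_K = 29.0     # 50 C loop case, from the text
U_CLEAN = 1100.0   # W/(m^2 K), clean coefficient from the table

def dirty_u(u_clean, r_f):
    """Overall coefficient with a fouling resistance r_f added in series."""
    return 1.0 / (1.0 / u_clean + r_f)

def area_penalty(u_clean, r_f):
    """Fractional extra area needed: A_dirty / A_clean - 1 = u_clean * r_f."""
    return u_clean / dirty_u(u_clean, r_f) - 1.0

for r_f in (0.0, 0.0002, 0.0004, 0.0008):
    u = dirty_u(U_CLEAN, r_f)
    area = Q_W / (u * DT_LM_K)
    print(f"R_f = {r_f:.4f}: U ~{u:.0f} W/m^2K, area ~{area:.0f} m^2, "
          f"penalty +{area_penalty(U_CLEAN, r_f):.0%}")
```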

Check against known systems. The duty is not exotic by marine standards. Ships and power stations reject tens of megawatts to gigawatts into seawater through engineered exchangers and condensers. The pod’s exchanger is smaller than a power-station condenser and comparable in thermal duty to marine machinery. The hard part is not the heat-transfer equation. It is doing it sealed, fouled, retrievable, and reliable for years around frontier AI hardware.

G. The Thermal Plume and the Federation Limit

The far-field temperature rise of a heated discharge into a current of speed UU, mixing thickness HH, and plume width WW is, to first order,[45]

ΔTQρcpUHW\boxed{\,\Delta T \approx \frac{Q}{\rho\,c_p\,U\,H\,W}\,}

With seawater ρcp4.1×106J/m3K\rho c_p \approx 4.1\times10^6\,\text{J/m}^3\text{K}, U0.1m/sU \approx 0.1\,\text{m/s}, and H50mH \approx 50\,\text{m}, the denominator is 2.05×107W\approx 2.05\times10^7 \cdot W (in watts per kelvin), so ΔTQ/(2.05×107W)\Delta T \approx Q / (2.05\times10^7\,W). The plume width WW scales with the field footprint, roughly as areaP\sqrt{\text{area}} \propto \sqrt{P}:

| Field | \(Q\) | Footprint \(W\) | Far-field ΔT |
|---|---|---|---|
| 1 GW compact | ~1.0 GW | ~300 m | 0.1–0.2 °C |
| 10 GW compact | ~10.3 GW | ~1,000 m | 0.5–0.8 °C |
| 100 GW compact | ~103 GW | ~3,000 m | ~2–4 °C |
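The first-order formula is simple enough to evaluate directly. The sketch below is illustrative only; the function name and the slower-current case are assumptions, not values from the text. With the stated \(U\) and \(H\) it returns single-point values at or just below the low end of each tabulated range (about 1.7 K for the 100 GW case), and halving the current to a hypothetical 0.05 m/s roughly doubles them, which is one way to read the wider ranges in the table.

```python
RHO_CP = 4.1e6  # J/(m^3 K), volumetric heat capacity of seawater as quoted in the text


def far_field_rise(q_watts: float, width_m: float,
                   current_m_s: float = 0.1, thickness_m: float = 50.0) -> float:
    """First-order far-field temperature rise in kelvin: Q / (rho*c_p * U * H * W)."""
    return q_watts / (RHO_CP * current_m_s * thickness_m * width_m)


# (heat load in watts, plume width in metres), taken from the table rows.
fields = {"1 GW": (1.0e9, 300.0), "10 GW": (10.3e9, 1000.0), "100 GW": (103e9, 3000.0)}

for name, (q, w) in fields.items():
    nominal = far_field_rise(q, w)
    slow = far_field_rise(q, w, current_m_s=0.05)  # hypothetical weak-current case
    print(f"{name:>6}: {nominal:.2f} K at 0.10 m/s, {slow:.2f} K at 0.05 m/s")
```

The result is inversely proportional to current speed and mixing thickness, so site-specific current data moves the answer more than any refinement of the heat load.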

This is a deliberately first-order model — enough to answer the only question that matters at concept stage: does the system scale as one heat source or as a federation? International-finance environmental guidance commonly limits a discharge to a 3 °C rise at the edge of a defined mixing zone.[17] A 1 GW and a 10 GW campus clear that with spacing and current. A single compact 100 GW field crosses it. The conclusion is structural, not marginal: 100 GW must be built as ~ten 10 GW campuses spread along the coast, routed through higher-current sectors, and instrumented continuously. The ocean sets the size of the organ; the network (§C, §D) joins the organs into one mind.

H. Power Delivery — Solar Field, HVDC, and Tiered Firming

Solar sizing. Active module area for nameplate \(P_\text{PV}\) at module efficiency \(\eta_m\) and \(I_\text{STC} = 1{,}000\,\text{W/m}^2\):

\[
A_\text{active} = \frac{P_\text{PV}}{\eta_m\,I_\text{STC}}
\]

At \(\eta_m = 0.25\), every 1 GWp needs ~4 km² of module before spacing; a ~1.5× land factor covers tilt, rows, and access. Oman PV yield is ~1,900–2,000 kWh/kWp/year (~22% capacity factor).[19][46] Sizing the field at ~1.5× facility peak gives ~9 effective full-load hours per training day.

| IT peak | Facility peak (PUE 1.03) | PV nameplate (1.5×) | Active area | Land (1.5×) |
|---|---|---|---|---|
| 1 GW | 1.03 GW | 1.55 GWp | ~6.2 km² | ~9.3 km² |
| 10 GW | 10.3 GW | 15.5 GWp | ~62 km² | ~93 km² |
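The sizing chain behind the table can be written out step by step. This is a sketch under the text's stated inputs (PUE 1.03, 1.5× oversizing, 25% module efficiency, 1.5× land factor); the function and key names are hypothetical.

```python
def pv_sizing(it_peak_gw: float, pue: float = 1.03, oversize: float = 1.5,
              eta_m: float = 0.25, i_stc: float = 1000.0,
              land_factor: float = 1.5) -> dict:
    """Return facility peak, PV nameplate, active module area, and land area."""
    facility_gw = it_peak_gw * pue
    pv_gwp = facility_gw * oversize
    # A_active = P_PV / (eta_m * I_STC), converted from m^2 to km^2.
    active_km2 = pv_gwp * 1e9 / (eta_m * i_stc) / 1e6
    return {
        "facility_gw": facility_gw,
        "pv_gwp": pv_gwp,
        "active_km2": active_km2,
        "land_km2": active_km2 * land_factor,
    }


for it_gw in (1.0, 10.0):
    s = pv_sizing(it_gw)
    print(f"{it_gw:g} GW IT: {s['pv_gwp']:.2f} GWp, "
          f"{s['active_km2']:.1f} km2 active, {s['land_km2']:.1f} km2 land")
```

Every quantity is linear in IT peak, so the 10 GW row is the 1 GW row scaled by ten.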

HVDC export. End-to-end loss is dominated by the two converter terminals, not the short cable:

\[
\eta_\text{HVDC} \approx 1 - (L_\text{conv,shore} + L_\text{conv,sea} + L_\text{cable})
\]

Modern VSC/MMC converters lose under 1% per terminal; over a 5–15 km subsea run the cable loss is ~0.1–0.3%.[20][47][48][49] Total end-to-end loss is ~1.5–2.5% — small enough that it sits inside the PUE budget rather than beside it.

Storage is tiered by workload value. The minimum system needs only ride-through storage. Because training can checkpoint and resume, a solar-following pod only needs enough battery capacity to ride through cloud transients and shut down cleanly:

\[
E_\mathrm{BESS} = 1.03\,\mathrm{GW} \times 0.25\,\mathrm{h} \approx 0.26\,\mathrm{GWh\ per\ GW}.
\]

That is the proof-machine case. It minimises capital and demonstrates the pod, the thermal system, the subsea power interface, and the station scheduler.

The frontier case is different. If the pod contains current-generation or next-generation frontier silicon, night becomes expensive. A 1 GW field with roughly $15B of accelerators should not sit idle for fifteen hours if storage is cheaper than the lost productive time. A public round-the-clock solar benchmark is approximately 5.2 GW of PV and 19 GWh of batteries to deliver 1 GW continuously.[50] That is a useful current anchor, not the final Leviathan design.
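The gap between the two storage tiers is easier to see as a ratio. The sketch below is a rough comparison only: the helper name is hypothetical, and it sets the ride-through energy for a 1.03 GW facility against a benchmark sized to deliver 1 GW continuously, so the two bases differ slightly.

```python
def ride_through_gwh(facility_gw: float = 1.03, hours: float = 0.25) -> float:
    """Battery energy needed to carry the full facility load for `hours`."""
    return facility_gw * hours


ride = ride_through_gwh()   # ride-through case, per GW of IT load
benchmark_gwh = 19.0        # round-the-clock benchmark battery per 1 GW delivered [50]

print(f"ride-through: {ride:.2f} GWh")
print(f"round-the-clock benchmark: {benchmark_gwh:.0f} GWh "
      f"(~{benchmark_gwh / ride:.0f}x the ride-through case)")
```

On these figures the firmed case needs on the order of seventy times the battery energy, which is why firmness is allocated by hardware value rather than applied uniformly.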

By the time Leviathan reaches bankable scale, the relevant battery cost is not the 2026 cost. It is the 2031–2036 cost. Battery storage costs have already fallen sharply, and further reductions are expected as stationary storage becomes a larger share of global battery demand.[51][52] The architecture should therefore be designed around tiered firmness:

| Pod class | Hardware age | Power model | Economic logic |
|---|---|---|---|
| Frontier | Years 0–2 | Firmed solar plus storage / hybrid backup | Protect the most expensive silicon and maximise iteration rate |
| Frontier-adjacent | Years 2–4 | Partly firmed / scheduled | Run high-value work first; defer what can wait |
| Plateau | Years 4+ | Solar-following | Cheap compute absorbs intermittency |

The station does not choose between solar-following and round-the-clock operation. It allocates firmness by value. The newest pods get the night. Older pods follow the sun.

I. Duty Cycle, Scaling, and Effective Work

Daily delivered work scales the peak-compute multiple by duty cycle and realised programme efficiency. For a 1 GW campus at ~110× the reference cluster’s peak, 37.5% duty (9 h/24 h), and 70% efficiency after scheduling, checkpointing, and networking losses:

\[
110 \times 0.375 \times 0.70 \approx 29
\]

| Scale | Peak vs reference | Solar-adjusted daily work |
|---|---|---|
| 1 GW | ~110× | ~25–35× |
| 10 GW | ~1,100× | ~200–350× |
| 100 GW | ~11,000× | ~1,000–3,000× |

The solar-adjusted table is the low-storage case. It is useful because it proves that even a sun-following station can produce extraordinary throughput. It is not the only commercial configuration.

If frontier pods are firmed for round-the-clock operation, the duty-cycle multiplier changes. The limiting factor becomes realised programme efficiency rather than sunlight hours. A 1 GW station at ~110× the reference cluster’s peak and 70% realised programme efficiency would deliver roughly:

\[
110 \times 1.0 \times 0.70 \approx 77
\]

current frontier-cluster-days per day.

| Scale | Peak vs reference | Solar-following daily work | Firmed-frontier daily work |
|---|---|---|---|
| 1 GW | ~110× | ~25–35× | ~70–85× |
| 10 GW | ~1,100× | ~200–350× | ~700–850× |
| 100 GW | ~11,000× | ~1,000–3,000× | ~7,000–8,500× |
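Both operating cases use the same three-factor product, which a short sketch makes explicit. The function name is hypothetical; the 110× peak multiple, the 9 h solar day, and the 70% programme efficiency are the text's own inputs.

```python
def daily_work(peak_multiple: float, duty: float, efficiency: float = 0.70) -> float:
    """Delivered work in reference-cluster-days per day: peak x duty cycle x realised efficiency."""
    return peak_multiple * duty * efficiency


solar_following = daily_work(110, duty=9 / 24)  # sun-following, 9 effective hours per day
firmed = daily_work(110, duty=1.0)              # round-the-clock operation

print(f"solar-following: ~{solar_following:.0f}x, firmed: ~{firmed:.0f}x")
```

The tabulated ranges bracket these point values, which leaves room for variation in duty and efficiency around the central case.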

These are projected operating envelopes. The correct dispatch depends on hardware age, workload value, storage cost, solar yield, customer urgency and the software scheduler. The economic point is simple: as battery costs fall and frontier silicon remains expensive, the newest pods increasingly justify firmed power. The older pods do not need it.

Honest distinction: strong vs weak scaling. The peak multiples are near-trivial — a hundred clusters have a hundred clusters’ compute. The interesting and contested quantity is the 70% efficiency factor, and it means two different things:

  • Weak scaling (many independent runs in parallel, or sparse/asynchronous workloads — MoE, self-play, synthetic-data generation, RL): the 70% is easy, the multiple approaches the raw peak, and the work does not depend on the subsea topology at all. You could run these anywhere.
  • Strong scaling (one model trained faster across the whole field): the 70% is doing enormous work, and sustaining it across ~500,000 GPUs is an extraordinary claim that lives or dies on the fabric of §C. This is the regime where the architecture’s coherence advantage matters — and where it must be proven, not assumed.

The body’s multiple is defensible for the mixed workload most labs actually run. The headline multiples at 10 GW and 100 GW are weak-scaling figures; read them as throughput, not as single-run speed-up.[41]

The second multiplier: software. Hardware throughput is one axis. Computational Abundance documents a second running in parallel: the compute needed to reach a fixed capability falls as the software improves, roughly threefold a year in pretraining and faster still in inference.[31] Over the five-year build of a station, that axis compounds into one to two further orders of magnitude of effective capability per delivered FLOP. The two multipliers are independent and they multiply. A station delivering a thousandfold more throughput, on software that extracts a hundredfold more capability per operation, yields a hundred-thousandfold gain in effective capability. That product is the arithmetic behind the body’s claim that 100 GW enters a range with no precedent to measure it against.

J. Marine Survivability — Pressure, Corrosion, Connectors, Retrieval

The pod must survive years sealed at 200 m. The question is whether the marine environment imposes anything novel. It mostly does not — the pressure regime is benign by subsea standards — but the connector and retrieval problems are real.

1. Hydrostatic pressure. At 200 m,

\[
P = \rho g h = 1{,}025 \times 9.81 \times 200 \approx 2.01\times10^6\,\text{Pa} \approx 20\,\text{bar}.
\]

Twenty bar is modest. A scuba diver at 50 m is already at ~6 bar absolute. Subsea oil-and-gas hardware routinely operates at 100–300 bar (1,000–3,000 m). The pod is best built pressure-compensated: the dielectric fluid is held at ambient pressure, so the hull sees almost no differential and merely contains the fluid, while the (nearly incompressible) fluid transmits 20 bar uniformly to the boards, which tolerate it provided they contain no air gaps. This is the oil-filled-housing approach used for ROV electronics. Microsoft’s Project Natick is a related but distinct precedent: it sealed servers in a dry-nitrogen atmosphere inside a pressure vessel on the seabed, rather than in pressure-compensated fluid, and recorded roughly one-eighth the failure rate of an identical land deployment.[53][54]

2. Corrosion. Seawater plus dissimilar metals drives galvanic attack. Mitigations are standard marine practice: copper-nickel or titanium for wetted heat-exchanger surfaces, cathodic protection by sacrificial anodes, dielectric isolation of dissimilar metals, and anti-fouling coatings on smooth exterior geometry.[55][56] The hull refit interval (~8–10 years) is set by corrosion and fouling, decoupled from the faster IT-refresh interval (§K).

3. Wet-mate connectors. Each pod needs subsea power-in and fibre, mated and de-mated underwater by ROV. Subsea wet-mate connectors exist at the required power and data rates in oil-and-gas and subsea-power practice.[57][58] The gap is reliability expectation: subsea energy tolerates downtime that synchronous training does not. This is one of the two genuinely unproven subsystems and belongs on the gate list.

4. Retrieval mass. The pod is dominated by fluid mass: \(m_\text{fluid} \approx \rho_f V \approx 800 \times 240 \approx 190\,\text{t}\), plus hull, exchanger, and hardware; call it 300–450 t. Recovery needs a heavy-lift or construction vessel with ROV support, not a barge winch.[59] The operating model is retrieve-to-service, not repair-in-place: lift the pod, refit ashore, redeploy.

5. The single-point catastrophe land has no analogue for. A hull breach floods the pod and destroys ~5,000 GPUs at once — a failure mode no land datacentre carries. Mitigation is compartmentalisation, continuous leak and pressure monitoring, and conservative seal/penetrator design. It cannot be eliminated, only bounded, and it must be priced into the reliability model.
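The two numerical claims in this list, hydrostatic pressure and fluid mass, can be reproduced as follows. This is a sketch with hypothetical function names; it assumes the 800 and 240 in the mass estimate are a fluid density in kg/m³ and a fluid volume in m³, which is the reading implied by the ~190 t result.

```python
G = 9.81  # m/s^2, standard gravity


def hydrostatic_pressure_pa(depth_m: float, rho_seawater: float = 1025.0) -> float:
    """Gauge pressure at depth from P = rho * g * h."""
    return rho_seawater * G * depth_m


def fluid_mass_t(volume_m3: float, rho_fluid: float = 800.0) -> float:
    """Dielectric fluid mass in tonnes (assumed units: kg/m^3 and m^3)."""
    return rho_fluid * volume_m3 / 1000.0


p = hydrostatic_pressure_pa(200.0)
print(f"pressure at 200 m: {p:.3g} Pa = {p / 1e5:.1f} bar")
print(f"fluid mass: {fluid_mass_t(240.0):.0f} t")
```

Pressure is linear in depth, so the same function gives the 100–300 bar range quoted for oil-and-gas hardware at 1,000–3,000 m.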

K. Reliability and the Refresh Cascade

A subsea pod should not be treated as a static asset. It is a modular compute organ with a lifecycle.

The vessel, exchanger, connectors, power interface and subsea frame are durable infrastructure. They should be designed for a five-to-seven-year deployment cycle, with retrieval driven by fouling, corrosion, seal life, connector qualification and planned refit. The accelerator stack inside the pod lives on a faster clock. Frontier hardware turns over every few years. A pod that is frontier-class on the day it is lowered will not remain the newest machine forever.

That is the operating model.

The pod should migrate down the workload hierarchy as it ages.

| Pod age | Likely role | Workload |
|---|---|---|
| Years 0–2 | Frontier / premium | Apex training, dense post-training, high-value synthetic data, urgent frontier experiments |
| Years 2–4 | Frontier-adjacent | Reinforcement learning, distillation, architecture search, fine-tuning, simulation, evaluation |
| Years 4+ | Plateau | Batch inference, student models, simulation farms, data generation, scientific workloads, latency-tolerant compute |
| Refit | Renewal | Retrieve, clean, inspect, replace silicon, redeploy |

This resolves the apparent contradiction between silicon cadence and subsea service life. Leviathan does not need every pod to remain frontier for its whole deployment. It needs the station to remain productive as a portfolio.

New pods carry the frontier. Older pods carry the plateau. The station becomes a living cascade.

That model fits the architecture better than a forced two-year retrieval cycle. It reduces vessel time, lowers operating disruption, extends useful asset life, and lets hardware depreciation become a feature rather than a failure. The same thing that happens across the wider economy in Computational Abundance[31] happens inside the station itself: the frontier falls, the plateau catches it, and useful intelligence continues to run.

The key design implication is that Leviathan should be built for mixed workloads from the beginning. Its software scheduler must assign work by pod generation, network quality, thermal margin, customer priority and hardware capability. Its commercial model must price the station as a tiered fabric, not as a uniform block of identical capacity.
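As a purely illustrative sketch, the age-based part of that assignment could look like the following. Everything here is hypothetical: the type and function names are invented for the example, the tier boundaries are read as half-open intervals (a pod at exactly two years moves to the next tier), and a real scheduler would also weigh network quality, thermal margin, customer priority and hardware capability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    power_model: str


# Tier names and power models follow the tables in the text.
FRONTIER = Tier("Frontier", "firmed solar plus storage / hybrid backup")
ADJACENT = Tier("Frontier-adjacent", "partly firmed / scheduled")
PLATEAU = Tier("Plateau", "solar-following")


def tier_for_age(age_years: float) -> Tier:
    """Map hardware age to a tier; boundaries at 2 and 4 years are assumed half-open."""
    if age_years < 0:
        raise ValueError("age_years must be non-negative")
    if age_years < 2:
        return FRONTIER
    if age_years < 4:
        return ADJACENT
    return PLATEAU


for age in (0.5, 3.0, 6.0):
    tier = tier_for_age(age)
    print(f"pod aged {age} y -> {tier.name} ({tier.power_model})")
```

Keeping the tier definition separate from the mapping function means the boundaries can change without touching the rest of the dispatch logic.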

The remaining risk is reliability. Large accelerator fleets fail.[60] A pod sealed for years cannot be maintained like a land rack. It must carry spare capacity, reconfigurable networking, graceful degradation, leak detection, thermal monitoring and workload migration. The pilot must measure the true failure rate under immersion, pressure, heat and continuous AI load. That number is still unknown.

But the lifecycle answer is clear.

L. Facility Efficiency (PUE), Built Up From the Parts

PUE is total facility energy divided by IT energy.[61] Unlike the body’s comparison table, this figure can be constructed from the loss terms derived above:

| Loss term | Source | Fraction of IT load |
|---|---|---|
| Internal + seawater pumping | §E | 2.0–4.0% |
| HVDC export (sea side share) | §H | ~0.5–1.0% |
| Power conversion / distribution in pod | est. | 1.0–1.5% |
| Monitoring, controls, auxiliaries | est. | ~0.3% |
| Total overhead | | ~3.8–6.8% |
| Implied PUE | | 1.04–1.07 |

So the 1.03–1.05 target is achievable but sits at the optimistic edge. 1.05 is the safe planning anchor.[62][63] Against land baselines this yields:

| Land baseline | Subsea (1.05) | More IT compute per facility MW |
|---|---|---|
| 1.20 | 1.05 | ~14% |
| 1.15 | 1.05 | ~9.5% |
| 1.10 | 1.05 | ~4.8% |
| 1.35 (legacy air) | 1.05 | ~29% |
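Both PUE tables follow from two short calculations: summing the loss terms, and taking the ratio of land to subsea PUE. The sketch below uses the ranges quoted above; the dictionary keys and function names are hypothetical.

```python
# (low, high) overhead as a fraction of IT load, from the loss-term table.
LOSS_TERMS = {
    "pumping": (0.020, 0.040),
    "hvdc_sea_side": (0.005, 0.010),
    "in_pod_conversion": (0.010, 0.015),
    "monitoring_aux": (0.003, 0.003),
}


def implied_pue(terms: dict = LOSS_TERMS) -> tuple:
    """PUE = 1 + total overhead fraction, evaluated at both ends of the range."""
    low = 1.0 + sum(lo for lo, _ in terms.values())
    high = 1.0 + sum(hi for _, hi in terms.values())
    return low, high


def it_gain(land_pue: float, subsea_pue: float = 1.05) -> float:
    """Extra IT compute per facility MW when moving from land_pue to subsea_pue."""
    return land_pue / subsea_pue - 1.0


low, high = implied_pue()
print(f"implied PUE: {low:.3f} to {high:.3f}")
for land in (1.20, 1.15, 1.10, 1.35):
    print(f"land PUE {land:.2f}: +{it_gain(land):.1%} IT compute")
```

Pumping is the widest term in the sum, so measured pump power from a pilot would narrow the PUE range more than any other single input.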

The honest baseline is land immersion, which already reaches ~1.05–1.10.[39][64] Against that, the efficiency edge is small. The durable wins are the ones efficiency tables do not show: zero water consumption and a heat sink of effectively unlimited capacity.[12]

M. Energy and Capital Floor — Silicon Still Dominates

Per gigawatt, the bill of materials is dominated by the accelerators, not the marine infrastructure. A 1 GW field is 100 pods × ~5,000 GPUs = ~500,000 accelerators:

| Component | Cost basis | Per 1 GW |
|---|---|---|
| Accelerators | ~500,000 × ~$30k | ~$15B |
| Solar field | scaled utility PV, 1.5× | ~$1.0–1.5B |
| HVDC + converters + collector | short route | ~$0.5–1.0B |
| Pods, frames, deployment, wet plant (ex-IT) | parametric | ~$0.8–1.5B |
| 15-min BESS (§H) | IRENA ~$192/kWh | ~$0.05B |
| Total ex-silicon | | ~$2.4–4.0B |

Silicon is therefore ~79–86% of capital.[51][41] The marginal energy cost is smaller still: one pod consumes \(10\,\text{MW}\times 9\,\text{h}\times 365 \approx 33\,\text{GWh/yr}\), ~$1.0M/yr at $0.03/kWh solar, which is under 1% per year of the ~$150M of silicon it contains. The conclusion mirrors Bridge to Infinity’s lunar arithmetic in reverse:[65] the architecture does not make intelligence cheap. It changes where the most expensive machine in the world can physically exist. The solar-and-sea system removes the grid, water, and land constraints.
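The capital share and the per-pod energy cost can be reproduced from the table and the figures above. This is a sketch with hypothetical function names; the inputs ($15B of silicon, $2.4–4.0B ex-silicon, 10 MW per pod, 9 h/day, $0.03/kWh, ~$150M of silicon per pod) are all taken from the text.

```python
def silicon_share(silicon_usd_b: float, ex_silicon_usd_b: float) -> float:
    """Fraction of total capital that is accelerators."""
    return silicon_usd_b / (silicon_usd_b + ex_silicon_usd_b)


def pod_energy_cost_usd(power_mw: float = 10.0, hours_per_day: float = 9.0,
                        usd_per_kwh: float = 0.03) -> float:
    """Annual energy cost of one solar-following pod."""
    kwh_per_year = power_mw * 1000.0 * hours_per_day * 365
    return kwh_per_year * usd_per_kwh


share_low = silicon_share(15.0, 4.0)    # high end of ex-silicon cost
share_high = silicon_share(15.0, 2.4)   # low end of ex-silicon cost
energy = pod_energy_cost_usd()

print(f"silicon share: {share_low:.0%} to {share_high:.0%}")
print(f"pod energy: ${energy / 1e6:.2f}M/yr, {energy / 150e6:.2%} of pod silicon per year")
```

Even a severalfold error in the electricity price would leave energy a small fraction of the silicon it feeds, which is the point of the comparison.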

We’re basing this on a 2026 concept snapshot. The dollar cost of one electrical gigawatt of silicon may not fall quickly, because each new generation of accelerator is both more powerful and more expensive. But the effective cost of computation per gigawatt should fall sharply. The relevant denominator is not only dollars per installed watt. It is training work per dollar, per watt, and per year. On that basis the architecture improves with time: denser racks, better FLOP/W, cheaper solar, cheaper short-duration storage, and compounding software efficiency all make each future gigawatt more productive than the one before it.

N. Siting

The Gulf of Oman shelf varies sharply. The broad Al-Batinah shelf near Sohar reaches 200 m only ~25 km offshore, lengthening the power cable and weakening the case. The narrow shelf at Sur / Ra’s al Hadd reaches 200 m within ~5 km — the strongest candidate for short export runs and compact fields.[13] Intermediate-depth water along the slope carries the warm, saline Arabian Gulf outflow (~20–22 °C, the §F design point),[14][66][67][15] and sits within an oxygen-sensitive regime that makes continuous local thermal and ecological monitoring non-negotiable.[68][18]

The gate list. Pulling the weak links together, an honest programme proves these in order before scaling, and no gate assumes the one above it: continuous 10 MW rejection into 20 °C water with managed fouling (§F); safe plume behaviour (§G); wet-mate connector reliability at training-uptime expectations (§J.3); measured in-pod failure rate and graceful degradation (§K); clean retrieve-and-refit (§J.4); and a resolved answer to the frontier-versus-plateau positioning (§K). The physics works at concept level. The programme is what turns it into a machine.


References

[1] Pilz, K.F., Sanders, J., Rahman, R. & Heim, L. (2025). “Trends in AI Supercomputers.” arXiv:2504.16026. (500-system dataset, 2019–2025. Finds AI-supercomputer performance doubled every nine months while hardware cost and power doubled annually; extrapolates a leading 2030 system requiring ~9 GW. Load-bearing source for the power-per-frontier-run model in §A.)

[2] Sevilla, J., Heim, L., Ho, A., Besiroglu, T., Hobbhahn, M. & Villalobos, P. (2022). “Compute Trends Across Three Eras of Machine Learning.” arXiv:2202.05924. (Finds that in the large-scale era, training compute doubled roughly every six months. Supports the claim that frontier compute has grown far faster than hardware efficiency.)

[3] Shankar, S. & Reuther, A. (2022). “Trends in Energy Estimates for Computing in AI/Machine Learning Accelerators, Supercomputers, and Compute-Intensive Applications.” arXiv:2210.17331. (Reviews slowing energy-efficiency gains from geometric scaling. Supports the §A claim that efficiency gains cannot indefinitely rescue frontier compute growth.)

[4] Koomey, J., Berard, S., Sanchez, M. & Wong, H. (2011). “Implications of Historical Trends in the Electrical Efficiency of Computing.” IEEE Annals of the History of Computing 33(3), 46–54. (Foundational source for long-run computations-per-joule improvement. Background for the efficiency denominator in §A.)

[5] Dennard, R.H. et al. (1974). “Design of Ion-Implanted MOSFET’s with Very Small Physical Dimensions.” IEEE Journal of Solid-State Circuits 9(5), 256–268. (Original Dennard-scaling paper. Background for the claim that historical voltage/power-density scaling has ended.)

[6] U.S. Energy Information Administration. “Electric Power Annual.” (Source for U.S. electricity generating capacity and generation. For the comparison between a tens-of-gigawatt training run and national generating capacity.)

[7] International Energy Agency (2025). “Energy and AI.” (Global report on electricity demand from data centres and AI. Sources broad claims about data-centre electricity growth, grid pressure, and the concentration of AI loads.)

[8] NVIDIA. “NVIDIA GB300 NVL72.” (Official product page for the liquid-cooled GB300 NVL72 rack-scale architecture: 72 Blackwell Ultra GPUs, 36 Grace CPUs, 130 TB/s NVLink, 37 TB fast memory, 1,440 PFLOPS FP4. Load-bearing source for the 72-GPU rack-class unit.)

[9] Pambudi, N.A. et al. (2022). “The immersion cooling technology: Current and future development in energy saving.” Alexandria Engineering Journal 61(12), 9509–9527. (Review of single-phase and two-phase immersion cooling, dielectric liquids, and heat rejection. Sources the dielectric-fluid concept.)

[10] NVIDIA Blog (2025). “Microsoft Azure Unveils World’s First NVIDIA GB300 NVL72 Supercomputing Cluster for OpenAI.” (Reports a 4,608-GPU GB300 NVL72 Azure cluster, i.e. 64 rack-scale systems, with 800 Gb/s-per-GPU Quantum-X800 InfiniBand. The reference cluster for the “one 10 MW pod ≈ one current top-tier cluster” comparison.)

[11] LiVecchi, A. et al. (2019). “Powering the Blue Economy: Exploring Opportunities for Marine Renewable Energy in Maritime Markets.” U.S. Department of Energy. (Marine-energy opportunities, including maritime markets and data-centre concepts. Context for offshore power/cooling integration.)

[12] Mytton, D. (2021). “Data centre water consumption.” npj Clean Water 4, 11. (Review of data-centre water consumption and water-use metrics. Supports the zero-water-consumption claim.)

[13] GEBCO Compilation Group. “GEBCO Gridded Bathymetry Data.” (Global gridded bathymetric dataset. Required source for any claim about how quickly the Oman shelf reaches 200 m near Sur / Ra’s al Hadd.)

[14] NOAA National Centers for Environmental Information. “World Ocean Atlas.” (Global climatology of ocean temperature and salinity at standard depths, including 200 m. Required source for Gulf of Oman water-temperature assumptions.)

[15] Xue, P. & Eltahir, E.A.B. (2015). “Estimation of the Heat and Water Budgets of the Persian Gulf Using a Regional Climate Model.” Journal of Climate 28(13), 5041–5062. (Supports the claim that the Persian/Arabian Gulf is highly evaporative and exports warm, saline water through Hormuz, relevant to Oman intermediate-water assumptions.)

[16] Incropera, F.P., DeWitt, D.P., Bergman, T.L. & Lavine, A.S. “Fundamentals of Heat and Mass Transfer.” Wiley. (Standard heat-transfer reference for \(Q = UA\,\Delta T_\text{lm}\), convection coefficients, and thermal-resistance networks. Supports the thermal-loop and heat-exchanger modelling in §E–§F.)

[17] World Bank Group / IFC (2007). “Environmental, Health, and Safety Guidelines: General EHS Guidelines.” (International-finance environmental guidance for the regulatory constraint — thermal-discharge and mixing-zone limits — as distinct from the modelling method in [45]. Use with site-specific local regulation; not a substitute for Omani permitting.)

[18] Queste, B.Y., Vic, C., Heywood, K.J. & Piontkovski, S.A. (2018). “Physical Controls on Oxygen Distribution and Denitrification Potential in the North West Arabian Sea.” Geophysical Research Letters 45(9), 4143–4152. (Regional source for oxygen-minimum-zone and ventilation dynamics in the Gulf of Oman region. Supports the environmental caution around oxygen-sensitive waters.)

[19] World Bank / ESMAP / Solargis. “Global Solar Atlas.” (Solar-resource and PV-output mapping tool. Required source for Oman PVOUT values, solar-field sizing, and location-specific yield assumptions.)

[20] Ardelean, M. & Minnebo, P. (2015). “HVDC Submarine Power Cables in the World: State-of-the-Art Knowledge.” European Commission Joint Research Centre. (Technical reference for submarine HVDC cable systems, route lengths, voltage classes, and deployment experience. Supports subsea HVDC feasibility.)

[21] Lian, X. et al. (2024). “Universal Checkpointing: A Flexible and Efficient Distributed Checkpointing System for Large-Scale DNN Training with Reconfigurable Parallelism.” arXiv:2406.18820. (Shows how large-scale training can checkpoint and resume across changing model-parallel configurations. Supports the solar-following/checkpointable-workload argument.)

[22] Wan, B. et al. (2024). “ByteCheckpoint: A Unified Checkpointing System for Large Foundation Model Development.” arXiv:2407.20143. (Industrial checkpointing system reporting up to 54× reductions in checkpoint stalls. Supports the claim that checkpointing is a real system concern, not a rhetorical convenience.)

[23] Li, S. et al. (2020). “PyTorch Distributed: Experiences on Accelerating Data Parallel Training.” arXiv:2006.15704. (Explains distributed data-parallel training, gradient communication, bucketing, and overlap of communication with computation. Supports the synchronous-training and all-reduce discussion in §C.)

[24] Sergeev, A. & Del Balso, M. (2018). “Horovod: fast and easy distributed deep learning in TensorFlow.” arXiv:1802.05799. (Describes ring-allreduce distributed deep learning and the communication overheads of scaling from one GPU to many. Supports the all-reduce model and the latency/bandwidth distinction.)

[25] Wang, G. et al. (2023). “ZeRO++: Extremely Efficient Collective Communication for Giant Model Training.” arXiv:2306.10209. (Shows that communication volume from weight-gathering, the backward pass, and gradient averaging limits throughput at scale. Supports the §C caution that bandwidth fabric remains load-bearing.)

[26] Fedus, W., Zoph, B. & Shazeer, N. (2022). “Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity.” arXiv:2101.03961. (Sparsely-activated mixture-of-experts model selecting different parameters per input at roughly constant compute, with up to 7× pretraining speed-ups. Sources the §VII claim that the newest models are sparse, with the total network growing while active computation stays small.)

[27] Shazeer, N. et al. (2017). “Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer.” arXiv:1701.06538. (Foundational conditional-computation paper: increases model capacity by over 1000× with only minor losses in computational efficiency. Sources the mixture-of-experts mechanism behind sparse frontier models.)

[28] Wang, Y. et al. (2022). “Self-Instruct: Aligning Language Models with Self-Generated Instructions.” arXiv:2212.10560. (Bootstraps instruction data from a model’s own generations, then finetunes on it. Sources the §VII claim that synthetic data produced in bulk is now a standard part of the frontier’s second economy.)

[29] Geiping, J. et al. (2025). “Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach.” arXiv:2502.05171. (A model that iterates a recurrent block, unrolling to arbitrary depth at test time. Sources the §VII description of reasoning that loops a small block through depth, and the serial, latency-bound character of recursive inference in §C.)

[30] Snell, C., Lee, J., Xu, K. & Kumar, A. (2024). “Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters.” arXiv:2408.03314. (Shows that allocating compute at inference can outperform a much larger model, establishing test-time reasoning as a scaling axis distinct from parameter and pretraining scale. Sources the §VII claim that the frontier is learning to scale the depth of reasoning before answering.)

[31] Age of Wonders. “Computational Abundance.” (Internal cross-reference. Sets out the slot hierarchy — frontier capability discovery, plateau diffusion, edge access — and the depreciation logic Leviathan builds on.)

[32] OpenAI (2025). “Announcing The Stargate Project.” (Primary announcement for the planned USD 500 billion AI-infrastructure programme. Sources the claim that AI infrastructure is moving from ordinary cloud capacity toward national-scale compute buildouts.)

[33] Reuters (2025). “OpenAI, Oracle, SoftBank plan five new AI data centers for $500 billion Stargate project.” (Reports Stargate’s 10 GW target and nearly 7 GW / over USD 400 billion of announced site commitments. Supports the claim that frontier AI infrastructure is already discussed in gigawatt units.)

[34] Reuters (2025). “Vantage Data Centers plans $25 billion AI campus in Texas.” (Reports Vantage’s 1.4 GW, 1,200-acre Frontier campus in Texas with ten data centres. Land-campus comparator for the city-scale/sprawl argument.)

[35] Cottier, B., Rahman, R., Fattorini, L., Maslej, N., Besiroglu, T. & Owen, D. (2024). “The rising costs of training frontier AI models.” arXiv:2405.21015. (Cost models for frontier-model training; estimates amortised training cost grew about 2.4× per year since 2016. Links compute scaling to capital intensity.)

[36] Shehabi, A. et al. (2024). “2024 United States Data Center Energy Usage Report.” Lawrence Berkeley National Laboratory. (U.S. data-centre electricity-use estimates and projections. For national-scale load comparisons and conventional data-centre energy context.)

[37] NVIDIA. “NVIDIA Vera Rubin NVL72.” (Official product page for the next-generation Vera Rubin NVL72 architecture. Supports the thesis that post-bust hardware may be denser, more power-efficient and more rack-coherent, strengthening the topology argument.)

[38] Open Compute Project. “Advanced Cooling Solutions.” (Industry working group and specification hub for data-centre liquid cooling, including immersion and cold-plate approaches. Grounds liquid-cooling terminology and requirements.)

[39] Haghshenas, K., Setz, B., Bloch, Y. & Aiello, M. (2022). “Enough Hot Air: The Role of Immersion Cooling.” arXiv:2205.04257. (Compares air and immersion cooling on energy, PUE, power density, cost and maintenance. Supports claims that immersion enables higher densities but creates maintenance and reliability trade-offs.)

[40] NVIDIA. “NCCL User Guide: Collective Operations.” (Official documentation for all-reduce, reduce-scatter, all-gather and all-to-all. Technical support for the communication-model section.)

[41] Atlas / Leviathan internal model. No external link. (The 2D-versus-3D geometry model, \(D_{2D}/D_{3D} = N^{1/6}\), the 10 MW pod-to-1 GW/10 GW scaling tables, the solar-adjusted frontier-cluster-day calculations, the first-order plume model, and the facility economics are original modelling assumptions. They are derived estimates, not sourced facts. The sources above support the inputs; they do not independently validate the full Leviathan architecture.)

[42] Jadhav, S. & Liu, Z. (2026). “Digital Twin-Based Cooling System Optimization for Data Center.” arXiv:2603.01198. (Uses operational data from the liquid-cooled Frontier supercomputer cooling system to model and optimise pumping and supply-temperature control. Supports the §E and §L pump-power and facility-overhead reasoning.)

[43] Shah, R.K. & Sekulić, D.P. (2003). “Fundamentals of Heat Exchanger Design.” Wiley. (Reference for sizing, log-mean temperature difference, fouling resistance, overall heat-transfer coefficients, and compact exchanger design.)

[44] Bott, T.R. (1995). “Fouling of Heat Exchangers.” Elsevier. (Reference on fouling mechanisms and fouling resistance. Supports the §F treatment of biofouling as the long-term thermal-performance risk.)

[45] U.S. Environmental Protection Agency (2024). “PLUMES2.0 — Dilution Model.” See also the PLUMES2.0 Model Theory and User Manual and the legacy Visual Plumes (4th edition). (EPA’s current public plume/dilution model for surface-water effluent discharges. It computes near-field dilution governed by source buoyancy, momentum, and ambient currents, then Brooks far-field dilution as the plume is carried by turbulence and currents — and is used to set NPDES mixing zones. Confirms that the §G estimates are concept-stage screening calculations, not permit-grade modelling, where real dilution depends on discharge flow, density, diffuser geometry, depth, stratification, and currents.)

[46] ESMAP / World Bank (2019). “Global Solar Atlas 2.0: Technical Report.” (Methodology for Global Solar Atlas irradiance and PVOUT data. Supports the use of GSA for concept-stage solar modelling.)

[47] Wikipedia. “High-voltage direct current.” (Secondary overview for typical HVDC loss ranges and the submarine-cable advantage of DC over AC. Prefer JRC / CIGRÉ / vendor sources for formal work.)

[48] Worzyk, T. (2009). “Submarine Power Cables: Design, Installation, Repair, Environmental Aspects.” Springer. (Core engineering reference for submarine power-cable design, installation, repair, and environmental considerations.)

[49] IEC. “IEC 62895: High voltage direct current (HVDC) power transmission — Cables with extruded insulation and their accessories.” (Standards reference for HVDC cable systems. For later engineering diligence, not headline claims.)

[50] Reuters (2026). “How falling battery costs are igniting race for round-the-clock solar power.” (Reports Masdar’s UAE project using ~5.2 GW of solar and ~19 GWh of batteries to deliver 1 GW of round-the-clock power. Benchmark for firmed-solar sizing and the night-storage-versus-ride-through comparison in §H.)

[51] International Renewable Energy Agency (2025). “Renewable Power Generation Costs in 2024.” (Source for solar-PV and battery-cost trends. Supports solar-field and BESS cost assumptions in §H and §M.)

[52] International Energy Agency (2024). “Batteries and Secure Energy Transitions.” (Battery storage costs are projected to fall substantially by 2030 as stationary storage becomes a larger share of global battery demand. Supports the §H/§M case for modelling 2031–2036 storage economics, not only 2026 costs.)

[53] Microsoft Research. “Project Natick.” (Primary Microsoft Research site for subsea data-centre experiments. Use carefully: Natick supports subsea feasibility at hundreds of kilowatts, not 10 MW AI pods.)

[54] Microsoft Source (2020). “Microsoft finds underwater datacenters are reliable, practical and use energy sustainably.” (Retrieval and assessment of the Natick Phase 2 module after two years at 117 ft, with 864 servers showing one-eighth the failure rate of a land control group. Reliability and sealed-environment precedent.)

[55] DNV. “Recommended practices and standards for offshore/subsea systems.” (DNV standards ecosystem for offshore and subsea reliability, corrosion, qualification, and inspection. A diligence pointer rather than a single-claim source.)

[56] AMPP (formerly NACE). “Standards for corrosion control.” (Standards source for marine corrosion, cathodic protection, and materials selection. Supports the §J corrosion-control gate.)

[57] Teledyne Marine. “Subsea Connectors.” (Commercial reference for wet-mate and subsea connector families. Shows subsea power/data connectors exist, without proving Leviathan-level reliability.)

[58] Siemens Energy. “Subsea Power Grid.” (Offshore reference for subsea power-distribution architecture. Analogue for the wet electrical infrastructure, though oil-and-gas uptime expectations differ from synchronous AI training.)

[59] DNV. “Rules and Standards.” (Certification and offshore-engineering standards. For later vessel, lifting, subsea-frame, pressure-envelope, and operational-safety diligence.)

[60] Uptime Institute. “Annual Outage Analysis.” (Reference for conventional data-centre reliability and outage context. Frames the difference between data-centre uptime and sealed subsea pod reliability.)

[61] The Green Grid. “PUE: A Comprehensive Examination of the Metric.” (Defines and contextualises Power Usage Effectiveness. For comparing land PUE, immersion PUE, and the constructed facility-overhead model in §L.)

[62] Green500 / TOP500. “Green500 List.” (Tracks measured energy efficiency of high-performance computing systems. Background for FLOP/W and system-efficiency comparisons, with care for AI accelerators and training workloads.)

[63] ASHRAE. “Datacom Series.” (Thermal-guidance series for data centres and IT equipment environments. For later thermal design, liquid-cooling, and operating-envelope diligence.)

[64] Oak Ridge National Laboratory. “Frontier.” (Reference point for liquid-cooled exascale HPC facility design and high-density compute operations. Analogue for facility cooling, not AI-specific cluster topology.)

[65] Age of Wonders. “Bridge to Infinity.” (Internal cross-reference for the launch-economics arithmetic mirrored in reverse in §M: the architecture changes where the most expensive machine can exist, not what it costs to build.)

[66] Copernicus Marine Service. “Copernicus Marine Data Store.” (Operational and reanalysis ocean datasets for temperature, salinity, and currents. For site-specific Oman thermal-plume, current, and seasonal-variability modelling.)

[67] Pous, S., Lazure, P. & Carton, X. (2015). “A model of the general circulation in the Persian Gulf and in the Strait of Hormuz.” Continental Shelf Research 94, 55–70. (Regional-circulation model for the Persian Gulf and Strait of Hormuz. Background for saline outflow into the Gulf of Oman.)

[68] Argo Programme. “Argo.” (Global network of profiling floats measuring ocean temperature and salinity. For checking World Ocean Atlas and Copernicus profiles near Gulf of Oman candidate sites.)
