Welcome to Infinite Curiosity, a newsletter that explores the intersection of Artificial Intelligence and Startups. Tech enthusiasts across 200 countries have been reading what I write. Subscribe to this newsletter for free to directly receive it in your inbox:
Nvidia is turning the cloud data center into an “AI factory”: a production line where raw data rolls in and trained models roll out. CEO Jensen Huang’s pitch is simple: treat compute like steelmaking. You can stamp out identical plants and own the recipes.
Nvidia’s AI factory and DGX Cloud are no longer just concepts. Billions of dollars of gear are already humming in Oregon, Frankfurt, and Tokyo. The stakes are geopolitical too: whoever masters these AI supercomputers sets the price of compute.
1. Blackwell: the furnace at the core
The factory’s heart is Nvidia’s Blackwell platform, led by the GB200 Grace-Blackwell Superchip and the NVL72 rack. One cabinet (NVL72) packs 72 of the latest B200 GPUs, all tied together with NVLink interconnects and smart network cards. The result: it runs LLM inference workloads about 30 times faster than an equivalent H100 setup while using roughly 25x less energy for every token it generates.
This gives engineers a single, NVLink-unified memory pool across the rack, big enough for trillion-parameter checkpoints. Software matters too. CUDA 12, TensorRT-LLM, and the NeMo stack (including NeMo Retriever) form the OS of model production.
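To get a feel for why a rack-wide memory pool matters, here’s a rough back-of-envelope sketch. The byte counts for BF16 weights and Adam-style optimizer state are my assumptions, not Nvidia’s published figures:

```python
# Back-of-envelope: why trillion-parameter checkpoints want a rack-scale memory pool.
# Assumptions (mine): BF16 weights, Adam-style optimizer state kept in FP32.

def checkpoint_footprint_tb(params: float, bytes_per_weight: int = 2,
                            optimizer_bytes_per_param: int = 12) -> tuple[float, float]:
    """Return (weights_tb, full_training_state_tb) for a model with `params` parameters."""
    tb = 1e12  # decimal terabytes
    weights = params * bytes_per_weight / tb
    # Training state: FP32 master weights + two Adam moment tensors (4 + 4 + 4 bytes/param).
    training_state = params * (bytes_per_weight + optimizer_bytes_per_param) / tb
    return weights, training_state

weights_tb, state_tb = checkpoint_footprint_tb(1e12)  # a 1-trillion-parameter model
print(f"BF16 weights alone: ~{weights_tb:.0f} TB")        # ~2 TB
print(f"Weights + optimizer state: ~{state_tb:.0f} TB")   # ~14 TB
```

Even before activations and data buffers, a single trillion-parameter training run wants double-digit terabytes in one addressable place.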
2. DGX Cloud: renting slices of the line
If Blackwell is the furnace, DGX Cloud is the utility hookup. The managed service drops virtual DGX nodes onto Oracle Cloud, Azure, and Google Cloud. And since June 2025, DGX Cloud Lepton has joined the lineup, federating regional GPU clouds into “planet-scale” capacity.
Same CUDA, same NeMo stack; only the invoice changes. Developers click once and inherit observability, checkpoint sharding, and model versioning.
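Here’s a sketch of what that one click might encapsulate. The job spec and the `client.submit` call mentioned in the comments are hypothetical placeholders of mine, not Nvidia’s actual DGX Cloud SDK:

```python
# Hypothetical sketch only: this job spec and the client mentioned below are invented
# for illustration; they are not Nvidia's actual DGX Cloud API.
from dataclasses import dataclass

@dataclass
class TrainingJob:
    name: str
    image: str            # container with CUDA + NeMo preinstalled
    gpus: int             # number of B200 GPUs requested
    dataset_uri: str      # object-store path streamed into the cluster
    checkpoint_uri: str   # where sharded checkpoints land
    max_hours: int

job = TrainingJob(
    name="llm-pretrain-7b",
    image="nvcr.io/nvidia/nemo:latest",
    gpus=256,
    dataset_uri="s3://my-bucket/tokenized-corpus/",
    checkpoint_uri="s3://my-bucket/checkpoints/",
    max_hours=72,
)

# A call like client.submit(job) would hand back a handle with logs, metrics, and
# checkpoint versions already wired up: the observability, sharding, and versioning
# the managed service inherits for you.
```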
3. The production run inside an AI factory
The flow mirrors auto manufacturing. Data prep → pre-training → RLHF → safety eval → deployment.
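Here’s a minimal sketch of that flow as code. Every stage function is a placeholder of mine, not Nvidia’s pipeline:

```python
# Minimal sketch of the "production run"; every stage below is a stand-in.

def data_prep(raw_docs):
    return [d.lower().split() for d in raw_docs]          # clean and tokenize

def pretrain(corpus):
    return {"weights": "base", "tokens_seen": sum(len(d) for d in corpus)}

def rlhf(model):
    return {**model, "aligned": True}                     # preference-tuning pass

def safety_eval(model):
    return model.get("aligned", False)                    # red-team / eval gate

def deploy(model):
    return f"serving model trained on {model['tokens_seen']} tokens"

corpus = data_prep(["Raw document one", "Raw document two"])
model = rlhf(pretrain(corpus))
if safety_eval(model):                                    # nothing ships without passing the gate
    print(deploy(model))
```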
A high-throughput storage network streams data straight into the GPUs, while a dedicated switch fabric keeps every chip’s gradient updates in lockstep. Hundreds of 72-GPU racks are then wired together over ultra-high-speed InfiniBand to form a single cluster of 20,000 GPUs. Essentially one massive AI supercomputer.
During inference, traffic is routed between low-latency and high-throughput clusters, orchestrated by Kubernetes microservices. NeMo Guardrails validates outputs before they ship.
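Here’s a rough sketch of that routing decision. The pool names, the 200 ms threshold, and the guardrail stub are my assumptions, not Nvidia’s routing logic or the NeMo Guardrails API:

```python
# Illustrative only: pools, threshold, and guardrail stub are assumptions, not
# Nvidia's actual routing logic or the NeMo Guardrails interface.

LOW_LATENCY_POOL = "nvl72-interactive"     # chat and agents: small batches, fast first token
HIGH_THROUGHPUT_POOL = "nvl72-batch"       # offline eval, bulk generation: big batches

def pick_pool(latency_budget_ms: int) -> str:
    return LOW_LATENCY_POOL if latency_budget_ms < 200 else HIGH_THROUGHPUT_POOL

def guardrail_ok(text: str) -> bool:
    # Stand-in for an output-validation call; a real deployment would hit a
    # guardrails service before returning anything to the user.
    return "ssn" not in text.lower()

pool = pick_pool(latency_budget_ms=150)
response = f"[{pool}] generated answer"
print(response if guardrail_ok(response) else "[blocked by guardrails]")
```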
4. Economics: why factories beat hobby rigs
Capex runs into the millions of dollars per 128 B200 GPUs, plus roughly 30% on top for power and cooling. But the payoff is monopoly throughput. Germany’s new industrial AI factory will field 10,000 GPUs for Siemens and friends.
At $0.02 per 1k tokens, that cluster could clear billions in margin over a three-year depreciation window. Nvidia skims rent on every layer: silicon, software, networking, and DGX Cloud seats. The moat widens as operators lock in renewable power purchase agreements (PPAs) below $0.05/kWh.
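Here’s the back-of-envelope version of that revenue math. The $0.02 per 1k tokens price and the 10,000-GPU count come from above; the per-GPU throughput and utilization are my assumptions:

```python
# Rough token economics for a 10,000-GPU cluster. Throughput and utilization are
# assumptions for illustration; only the price and GPU count come from the text.

GPUS = 10_000
PRICE_PER_1K_TOKENS = 0.02          # dollars, from the text
TOKENS_PER_GPU_PER_SEC = 1_000      # assumed sustained serving throughput
UTILIZATION = 0.5                   # assumed fraction of time on paid traffic
SECONDS_PER_YEAR = 365 * 24 * 3600

tokens_per_year = GPUS * TOKENS_PER_GPU_PER_SEC * UTILIZATION * SECONDS_PER_YEAR
revenue = tokens_per_year / 1_000 * PRICE_PER_1K_TOKENS

print(f"Tokens served per year: {tokens_per_year:.2e}")   # ~1.6e14
print(f"Gross revenue per year: ${revenue / 1e9:.1f}B")   # ~$3.2B under these assumptions
```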
5. What this means for builders and investors
Founders must decide: buy or rent. Own a rack if you need deterministic performance, latency guarantees, or strict data residency. Everyone else should treat DGX Cloud as the AWS of 2025: the default until scale justifies owning the solder.
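One way to frame buy vs rent is a simple breakeven on GPU-hours. The rental rate, capex, and opex figures below are my assumptions, not quoted prices; only the three-year depreciation window comes from above:

```python
# Crude buy-vs-rent breakeven. All prices are assumptions for illustration.

RENT_PER_GPU_HOUR = 3.00          # assumed managed-cloud rate, dollars
OWNED_CAPEX_PER_GPU = 40_000      # assumed all-in hardware cost, dollars
OWNED_OPEX_PER_GPU_HOUR = 0.60    # assumed power, cooling, ops, dollars
DEPRECIATION_YEARS = 3            # matches the depreciation window above

hours_3yr = DEPRECIATION_YEARS * 365 * 24    # wall-clock GPU-hours in the window
# Breakeven utilization: the fraction of those hours a GPU must stay busy before
# owning (capex + opex on used hours) beats renting the same hours.
breakeven = OWNED_CAPEX_PER_GPU / ((RENT_PER_GPU_HOUR - OWNED_OPEX_PER_GPU_HOUR) * hours_3yr)
print(f"Breakeven utilization over {DEPRECIATION_YEARS} years: {breakeven:.0%}")  # ~63%
```

Under these assumed numbers, owning only wins if you can keep the GPUs busy most of the time; otherwise rent.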
Investors should follow the heat map. Whoever secures cheap electricity and tax breaks will win the next decade of AI infra. The real story is capital, cooling, and CUDA licensing. And Nvidia owns them all.
Nvidia’s AI factories convert electrons into intelligence at industrial scale. And DGX Cloud hands you the keys if you can afford it.
If you are getting value from this newsletter, consider subscribing for free and sharing it with 1 friend who’s curious about AI: