The $7B NVIDIA deal you’ve probably never heard about... but it’s powering the entire AI revolution
How Mellanox’s invisible tech is fueling the AI arms race

This is the go-to newsletter and community for no-code AI tools, news and productivity insights.
First time reading? Sign up here
Most people think NVIDIA = GPUs.
But here’s the twist: modern AI training isn’t just a GPU game—it’s a networking game.
Here’s why:
A single GPU, even a powerhouse like the A100 with 80 GB of memory, tops out at a few tens of billions of parameters once weights, activations, and optimizer state are accounted for. But today’s frontier models, like GPT-4, are reported to run into the trillions.
Training them means splitting the load across thousands of GPUs—each syncing and sharing data constantly. Without the right networking tech, everything grinds to a halt.
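To make that concrete, here’s a minimal sketch of what that constant syncing looks like in data-parallel training. It’s written in PyTorch purely as an illustration (the model and batch below are placeholders, not anything from the post): after every backward pass, each GPU averages its gradients with every other GPU via an all-reduce, and that collective is exactly the traffic the network fabric has to carry.

```python
# Minimal data-parallel sketch (illustrative only; the model and data are placeholders).
# Launch with: torchrun --nproc_per_node=<gpus> train_sketch.py
import os
import torch
import torch.distributed as dist
import torch.nn as nn

def main():
    # NCCL is NVIDIA's collective library; across nodes it uses InfiniBand/RDMA when available.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Linear(4096, 4096).cuda()            # stand-in for a real model
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)

    for step in range(10):
        x = torch.randn(32, 4096, device="cuda")    # stand-in for a real batch
        loss = model(x).pow(2).mean()
        loss.backward()

        # The "constant syncing": average gradients across every GPU in the job.
        # This all-reduce is the traffic that rides on the network between machines.
        world = dist.get_world_size()
        for p in model.parameters():
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world

        opt.step()
        opt.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Real frameworks wrap this in DistributedDataParallel and overlap the all-reduce with the backward pass, but the communication pattern is the same.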
Enter Mellanox.
In 2019, NVIDIA agreed to acquire Mellanox for roughly $7 billion; the deal closed in 2020. It flew under the radar at the time, but it might be one of the most strategic acquisitions in tech history.
Here’s why Mellanox is the unsung hero:
☑ RDMA (Remote Direct Memory Access): Allows GPUs to access memory on other machines without CPU bottlenecks.
☑ InfiniBand: Switch latency of roughly 100ns per hop versus 200-400ns for typical Ethernet, a 2-3x cut that is critical for syncing gradients across GPUs.
☑ GPUDirect RDMA: Enables GPUs to communicate directly with network cards, slashing latency by another 30%.
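In practice you rarely touch RDMA directly; collective libraries like NCCL use it under the hood, and you steer them with environment variables. Here’s a rough sketch. The variable names below are real NCCL knobs, but the values (such as the mlx5_0 adapter name) are assumptions that depend entirely on your cluster:

```python
# Sketch: nudging NCCL toward Mellanox InfiniBand and GPUDirect RDMA.
# The variable names are real NCCL settings; the values are cluster-specific examples.
import os

os.environ.setdefault("NCCL_IB_DISABLE", "0")       # keep the InfiniBand transport enabled
os.environ.setdefault("NCCL_IB_HCA", "mlx5_0")      # which ConnectX adapter(s) to use (example name)
os.environ.setdefault("NCCL_NET_GDR_LEVEL", "SYS")  # allow GPUDirect RDMA wherever the topology permits
os.environ.setdefault("NCCL_DEBUG", "INFO")         # log which transport NCCL actually picked

# These must be set before the process group is created so NCCL reads them at init time.
import torch.distributed as dist
dist.init_process_group(backend="nccl")
```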
NVIDIA didn’t just buy Mellanox—they built it into their DNA:
→ Mellanox’s ConnectX NICs are baked into NVIDIA’s DGX systems.
→ NVIDIA optimized the entire stack: GPUs, NICs, switches, and drivers work in perfect harmony.
The results? Unmatched performance:
HDR InfiniBand: 200 Gb/s per port
Quantum-2 switches: 400 Gb/s per port
Switch port-to-port latency: ~100 ns
GPU memory bandwidth: ~900 GB/s
This integration allows NVIDIA to dominate AI training at scale.
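To get a feel for why those port speeds matter, here’s a back-of-envelope estimate of my own (using the standard ring all-reduce cost model and assumed model and cluster sizes, not numbers from the post) of how long one full gradient sync takes at different per-GPU link speeds:

```python
# Back-of-envelope: time to all-reduce one set of FP16 gradients with a ring all-reduce.
# The model size and GPU count are illustrative assumptions.
def ring_allreduce_seconds(num_params: float, bytes_per_param: int,
                           num_gpus: int, link_gbps: float) -> float:
    """A ring all-reduce moves roughly 2*(N-1)/N of the payload over each GPU's link."""
    payload_bytes = num_params * bytes_per_param
    traffic_bytes = 2 * (num_gpus - 1) / num_gpus * payload_bytes
    link_bytes_per_s = link_gbps * 1e9 / 8            # convert Gb/s to bytes/s
    return traffic_bytes / link_bytes_per_s

NUM_PARAMS = 70e9    # assume a 70B-parameter model
NUM_GPUS = 1024      # assume 1,024 GPUs in the job

for gbps in (100, 200, 400):                          # generic Ethernet, HDR, NDR per port
    t = ring_allreduce_seconds(NUM_PARAMS, 2, NUM_GPUS, gbps)
    print(f"{gbps:>3} Gb/s per GPU -> ~{t:.1f} s per full gradient sync")
```

Real jobs overlap this communication with the backward pass and use smarter hierarchical collectives, so the absolute numbers are pessimistic, but the scaling is the point: double the port speed and the sync time roughly halves, which is exactly the lever the Mellanox hardware pulls.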
Meanwhile, the competition is floundering:
→ Intel scrapped its Omni-Path project.
→ Broadcom and Ethernet lag in latency.
→ Cloud providers are stuck with RoCE (and its limitations).
Looking ahead, NVIDIA has its sights set on:
Tighter GPU-NIC integration (think CXL + Mellanox).
Sub-50ns latency and terabit-per-second bandwidth.
AI-first networks purpose-built for tomorrow’s workloads.
The takeaway?
While GPUs get all the glory, Mellanox is the silent kingmaker behind every breakthrough. NVIDIA didn’t just buy a networking company—they bought the future of AI.
Next time you marvel at a cutting-edge language model, remember this:
The GPUs may be the stars of the show, but Mellanox is the stage they perform on.
Sometimes, the most important tech is the one you don’t see.
If you enjoyed this, get more AI insights and productivity tips by subscribing to the weekly newsletter and connecting with me (Jagger) here:
See you next week!
Cheers,
Jagger