The all-liquid-cooled colo facility rush has begun

originally published in The Register

AI and HPC deployments mean supporting densities of up to 250kW per rack

With AI and HPC workloads becoming the norm, we can expect a broader push toward high-end power and cooling technologies inside colo facilities.

Companies like Colovore point to next-generation liquid-cooled colocation datacenters, such as a new facility planned for Santa Clara, California, which will support even more powerful systems with rack densities of up to 250kW when it comes online in 2024.

The new facility, located alongside the company's existing Santa Clara site, will be its latest datacenter designed specifically with power-hungry and thermally challenging high-performance computing (HPC) and AI/ML workloads in mind.

“When we opened our doors in 2013, touting 20kW per cabinet in our phase one, many thought power density wasn’t a big issue at the time,” Colovore President Sean Holzknecht said in a statement. “We now support thousands of AI and GPU systems for Fortune 500 companies down to Silicon Valley startups in hundreds of cabinets, each drawing 15-50kW per cab.”

The DGX A100, for instance, crams two 225W 64-core AMD Epyc 2 processors, eight 400W A100 SXM GPUs, nine 200 Gbit/sec NICs, and a terabyte of system memory into a single 6U system that can chug down 6.5kW of power under full load. Nvidia's upcoming 8U DGX H100 systems, due out early next year, take that to an even greater extreme at 10.2kW.
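To see where that 6.5kW goes, here's a minimal back-of-the-envelope sketch in Python. The CPU and GPU wattages come from the figures above; the per-NIC draw is a hypothetical placeholder, not a published spec.

```python
# Rough nameplate power budget for a DGX A100-class system, using the
# component figures cited above. The NIC wattage is an illustrative
# assumption, not a published spec.
SYSTEM_MAX_W = 6_500          # full-load draw cited for the DGX A100

cpus_w = 2 * 225              # two 225W 64-core AMD Epyc CPUs
gpus_w = 8 * 400              # eight 400W A100 SXM GPUs
nics_w = 9 * 25               # assumed ~25W per 200Gb/s NIC (hypothetical)

accounted = cpus_w + gpus_w + nics_w
print(f"CPUs + GPUs + NICs: {accounted}W")                    # 3875W
print(f"Left for memory, storage, fans, and PSU losses: "
      f"{SYSTEM_MAX_W - accounted}W")                         # 2625W
```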

Assuming a standard 42U rack, Colovore's 50kW rack budget could support four 10kW systems and still have just enough room for networking. And for customers using direct liquid cooling, the company says it can support rack power budgets of up to 250kW. That works out to just under 6kW per rack unit.
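The per-rack-unit figure follows from simple division. A quick sketch, assuming a standard 42U rack and the DGX figures above; neither number is a Colovore specification:

```python
# Sanity-check the rack arithmetic above. The 42U rack height and the
# per-system figures are assumptions drawn from the article, not
# Colovore specifications.
RACK_UNITS = 42

# Air-cooled budget: four ~10kW, 8U systems in a 50kW rack.
systems = 4
print(f"Space: {systems * 8}U of {RACK_UNITS}U, "
      f"power: {systems * 10.2:.1f}kW of 50kW")               # 32U, 40.8kW

# Direct liquid cooling budget: 250kW spread across the full rack.
print(f"Per-U budget at 250kW: {250 / RACK_UNITS:.2f}kW")     # ~5.95kW
```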

“Our high-density datacenters allow customers to pack their racks full from top to bottom due to the robust power and cooling infrastructure we deliver. This reduces the total amount of space required, resulting in far lower monthly operating costs and capex, while significantly increasing IT operating efficiency and scalability,” Colovore co-founder Ben Coughlin said.

To achieve these densities, Colovore is pairing direct-to-chip liquid cooling (DLC) with rear-door heat exchangers. DLC is becoming increasingly common among server OEMs as a way to pack hotter components into smaller chassis.

DLC systems are rather straightforward, trading large, heavy copper heat sinks and high-pressure fans for low-profile cold plates and tubing to carry coolant to each of the components. This allows systems to be built far more densely than traditional air-cooled systems, while removing the fans can reduce power consumption by 13-20 percent.
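To put that savings range in context, here's what it would mean for a 6.5kW system like the one discussed earlier; this is illustrative arithmetic only, applying the article's percentages:

```python
# What the 13-20 percent fan-power saving implies at the system scale
# discussed earlier (illustrative arithmetic only).
system_kw = 6.5               # full-load draw of a DGX A100-class box
for pct in (0.13, 0.20):
    print(f"{pct:.0%} of {system_kw}kW: {system_kw * pct:.2f}kW saved")
```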

Heat from these systems is carried to rear-door heat exchangers: assemblies of several large fans and a radiator, not unlike the one you'd find in a car, that attach to the back of the server rack.

Colovore claims its approach allows it to operate at a power usage effectiveness (PUE) of 1.1, a metric that compares a facility's total power draw against the power consumed by compute, storage, and networking equipment. The closer the PUE is to 1.0, the more efficient the facility.
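In concrete terms, the ratio works out like this; a minimal sketch in which the load figures are made up for illustration:

```python
# PUE is total facility power divided by IT equipment power. A sketch
# of what Colovore's claimed 1.1 implies; the kW figures below are
# hypothetical, chosen only to make the ratio easy to read.
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    return total_facility_kw / it_load_kw

it_load_kw = 1_000            # hypothetical 1MW of compute, storage, network
overhead_kw = 100             # cooling, power conversion, lighting, etc.
print(f"PUE: {pue(it_load_kw + overhead_kw, it_load_kw):.2f}")  # 1.10
```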

The colo company expects to begin leasing space in the facility in the first quarter of 2024, just in time for a new generation of high-wattage chips to reach volume production. ®