by Sebastian Moss
The first thing everyone mentions about Cerebras is its size. Throughout multiple interviews with DCD, the company’s CEO tried to draw our focus to other potential benefits of the chip architecture and how the startup plans to build a sustainable artificial intelligence business. And yet, inexorably, try as we might, we kept coming back to the size of its chips. The world’s largest chip, the Wafer Scale Engine 2, has 2.6 trillion transistors – significantly more than Nvidia’s top-of-the-line H100 GPU, which clocks in at 80 billion. Built on TSMC 7nm, the WSE-2 has 850,000 ‘AI optimized’ cores, 40GB of on-chip SRAM memory, 20 petabytes per second of memory bandwidth, and 220 petabits per second of aggregate fabric bandwidth.
For those that can afford it, it can be bought as the Cerebras CS-2, a 15U box that also includes HPE’s SuperDome Flex and AMD CPUs for a peak sustained system power of 23kW. “It’s a million-and-a-half dollars, plus or minus,” CEO Andrew Feldman said.
But we didn’t fly to Santa Clara to see something in the paltry low seven figures.
We’re here to see the super-powerful computer Cerebras has constructed from multiple CS-2 systems. We are here to see Andromeda.
With 16 CS-2s fed by 18,176 AMD Epyc Gen 3 host cores, and some 13.5 million AI-optimized cores in total, Andromeda is one of the world’s most powerful supercomputers – at least on single-precision AI benchmarks, where it posts more than one exaflop of compute.
Cerebras offers Andromeda as a cloud service. Feldman says some customers will use the service to test out the unique architecture, before going ahead with a larger purchase (more on that later). Others will use it as an ongoing cloud platform in lieu of buying their own gear.
Andromeda is the beginning of an audacious plan to grab a slice of the exploding AI market, and its data center host also sees this as an opportunity. The supercomputer sits in a colocation facility run by Colovore. After years as the niche operator of a single facility, Colovore sees the Cerebras system as a critical inflection point, where the high-density requirements of AI workloads will finally shift the data center industry over to liquid cooling.
Colovore hopes to spread across the US, and build the next generation of data centers.
Using a whole wafer
Before that, though, we must come back to size. Cerebras set out to build a processor the size of an entire semiconductor wafer – one it believes is big enough for today’s challenges.
“When we began playing with the idea and when we started the company in 2016, we saw the need for a huge amount of compute,” Feldman explained.
Semiconductor chips are made on circular wafers, 300mm (1ft) across. A complex chip can take up 800 square millimeters, and typically chipmakers get around 60 of these from a single wafer. Cerebras needed more than this.
“We thought it was going to be vastly more than what a single 800 square millimeter traditional chip could bring. And that meant you need to tie chips together. There were two ways to do that: An innovative approach, which was our approach, or you could go out and buy a fabric company and think about how to tie them together in very traditional ways.”
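The dies-per-wafer arithmetic behind that tradeoff is easy to sketch. A common back-of-envelope formula (not Cerebras’ own math) divides the wafer area by the die area and subtracts the partial dies lost around the circular edge:

```python
import math

def gross_dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Estimate gross dies per wafer.

    First term: wafer area / die area.
    Second term: a standard correction for partial dies lost at the
    round wafer edge. Ignores scribe lines and defect yield, which
    reduce the usable count further.
    """
    d, a = wafer_diameter_mm, die_area_mm2
    return int(math.pi * d**2 / (4 * a) - math.pi * d / math.sqrt(2 * a))

# A 300mm wafer and an ~800 sq mm reticle-limit die:
print(gross_dies_per_wafer(300, 800))  # 64 gross dies
```

Defects and scribe-line overhead pull that gross figure of 64 down toward the “around 60” usable chips the article cites.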
Nvidia took the traditional route, buying Mellanox, and using its fabric switch to offer virtual mega chips by tying chips together: “These chips essentially all start on the wafer, and then they’re cut up. And then you buy more equipment to tie them back together again. That’s the elegance, if you keep Humpty Dumpty whole, you don’t have to use glue and all this stuff to tie it back together again.”
Cerebras hopes that its Humpty Dumpty chip is ready for a unique moment in IT hardware. The release of ChatGPT and the resulting generative AI hype wave represents a unique opportunity to define a new generation of hardware, beyond the traditional CPU and GPU markets.
That boom highlighted two things, however: First, that the new market is led by a closed-source company, OpenAI. And second that even Cerebras’ mega chip isn’t big enough for what is to come.
On the first point, Feldman noted that “it’s bad for other hardware vendors if there are a very small number of very powerful software vendors, bad for the ecosystem, bad for innovation, and bad for society.”
Seeing opportunity, Cerebras offered Andromeda to the AI community and quickly released its own family of generative models – seven of them, ranging from 111 million parameters up to 13 billion (GPT-4 is rumored to have more than one trillion).
While the models aren’t able to compete with those of OpenAI, they served a purpose – to show the community that Cerebras’ hardware can be easy to work with, and to show that it can scale.
That’s the other size argument Cerebras makes. The company claims near-perfect linear scaling across multiple CS-2s.
Feldman argues that the large architecture means that it can fit all the parameters of a model in off-chip memory, and split the compute equally among the chips. “As a result, when we put down 16, or 32, or 64, for a customer, we divide the data by that number, send a fraction of the data to this chip, and each of its friends, average the results, and it takes about a 16th or a 32nd of the time.
“That’s a characteristic of being able to do all the compute work on one chip – it’s one of the huge benefits of being big.”
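What Feldman describes is classic data parallelism: each system sees 1/N of the batch, computes gradients on its shard, and the shard gradients are averaged before the weight update. A toy pure-Python sketch of the idea (a stand-in illustration, not Cerebras’ actual weight-streaming implementation):

```python
# Data-parallel training step: N "systems" each process a shard of the
# batch; averaging their gradients reproduces the single-system update,
# while the wall-clock work per system drops to roughly 1/N.

def grad(weight: float, x: float) -> float:
    # Gradient of a toy squared-error loss (w*x - x)^2 w.r.t. w
    return 2 * (weight * x - x) * x

def data_parallel_step(weight: float, batch: list, n_systems: int, lr: float = 0.1) -> float:
    shard_size = len(batch) // n_systems
    shards = [batch[i * shard_size:(i + 1) * shard_size] for i in range(n_systems)]
    # Each system computes the mean gradient over its own shard...
    shard_grads = [sum(grad(weight, x) for x in s) / len(s) for s in shards]
    # ...and the results are averaged, matching one big-batch update.
    return weight - lr * sum(shard_grads) / n_systems

batch = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
w_parallel = data_parallel_step(1.5, batch, n_systems=4)
w_single = data_parallel_step(1.5, batch, n_systems=1)
print(abs(w_parallel - w_single) < 1e-9)  # True: same update, less time per system
```

The averaging trick only works this cleanly because, as Feldman notes, every chip holds the full model: no layer has to be split across devices.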
Benefits for the host
While the company has focused on being big, its data center host has always benefited from being small.
Colovore is a small operator, with its single facility barely able to fit Cerebras, Lambda Cloud, and others on the site. Launched in 2013, it carved out an equally small market in liquid-cooled racks capable of up to 35kW.
“We don’t really think liquid cooling is a niche anymore,” CFO and co-founder Ben Coughlin said. “I think with the adoption of AI, this is becoming a lot more mainstream. And so I think we’re pretty quickly moving out of sort of a small, defined market into something that’s much bigger. We think there’s a huge opportunity.”
While others are still trying to define their liquid cooling strategy, and are getting used to new designs and processes, Colovore has a decade of experience. “If we look at our fellow data center operators, it’s going to be a little bit of a challenge for them to have to pivot or adapt,” Coughlin said. “They have very standard designs that they use, and they’ve used quite successfully, but these are fundamentally different. It’s not so easy to pivot from air to liquid.”
CTO, co-founder, and former Google data center manager Peter Harrison concurred: “[The major colos] perceive the AI revolution, but they feel that this is not really the time to make that investment.
“Part of it is because they have all of this cost in older facilities, and if they admit to the fact that this niche has now become more and more mainstream they run the risk of the markets punishing them by saying that their data centers are obsolete.”
Harrison believes that the hyperscalers aren’t waiting for wholesalers to catch up, and are instead retrofitting their own facilities and skipping the middlemen. “And so when the major players say that they don’t see AI, they may not really be seeing it at all – in reality, they’re just being ignored.”
A lot of larger colos also target customers with proven revenues, something the new crop of AI startups lack. “Therefore, many startups have difficulties trying to get involved in many of these facilities,” Harrison said.
“The facilities require a long-term contract, large amounts of upfront commitment for capacity,” he added. “A startup may not necessarily know exactly what that may be because they don’t know their demand. So we allow companies to start with one cabinet, and they can ramp up one cabinet at a time.”
This approach, alongside its cooling chops, has got Colovore in the mood for expanding as fast as one of the startups it hosts. The company is starting close to home, recently buying an adjacent building to convert into a 9MW data center. Then it will look further afield.
Coughlin explained: “We have plans to expand and add more capacity both in market and out of market. We’re doing a lot of planning with our customers to figure out where to go.
“It’s our belief, fundamentally, that this high density, high transaction processing capacity needs to be in major metros, because that’s where the data is being generated, managed, stored, and analyzed.”
The company claims to have a standardized data center design for both relatively dry and very humid environments, making most US metros potential sites. “There are a number of underserved markets around the US that we think would need to have these facilities as well,” Harrison added.
“Markets that come to mind would be like Detroit.”
While other companies are still working out their liquid strategy, Coughlin believes that “in the near term we have an opportunity to grow rapidly and broaden our business out.”
It also hopes to stay ahead on cooling capability. “When we go direct liquid, you can get designs of up to 300 kilowatts in a single cabinet,” Coughlin said.
For its base configuration, the company offers liquid cooling via rear door heat exchanger, which can support up to 50kW in a cabinet.
“We size the pipes on the front-end to be able to deliver the highest densities, but if a customer comes in and says I only need 10kW in a cabinet, we just don’t provide as much water into that one cabinet. We can control the flow rate to every single cabinet,” Coughlin said.
But for all the company’s experience with liquid cooling, moving beyond its single building would be a huge leap. Perhaps, DCD suggested, the company could work with its investor Digital Realty?
“Moving to this next phase, we’re very, very much open to partnering with Digital; they have footprints in all the markets that we would want to address. And we’ve talked to them informally about rolling out Colovore as their high density offering,” Coughlin admitted.
Aurora meets Galaxy
As we talked, another informal discussion was nearing completion. Just a few weeks after the visit, Cerebras cloud customer G42 signed a major deal with the chip company, initially to build a huge new supercomputer at the Colovore facility.
The UAE-based AI business – which is controlled by the son of the founder of the state and has been accused of spying on UAE citizens, dissidents, and foreign nationals – turned to Cerebras to build the Condor Galaxy supercomputer. Already deployed, it has 27 million AI compute cores, with two exaflops of single precision AI performance.
Within a few months that supercomputer will double in size. In the first half of 2024, two more will come online in different data centers – one in Austin, Texas, and another in Asheville, North Carolina – and then a further six are planned later in the year. In total, that’s 36 exaflops of single precision performance and 489 million AI cores.
“These guys were looking to build a partnership with a company that could build, manage and operate supercomputers and could implement very large generative AI models, and had expertise in manipulating, cleaning, and managing huge datasets,” Cerebras’ Feldman said of the deal thought to be worth more than $100 million per system.
“There’s a misconception that the only people that could build clusters this size are hyperscalers in the US. That’s clearly wrong. They’re being built all over the world. And there are companies that many people in the US haven’t heard of that have a demand for hundreds of millions of dollars’ worth of AI.”
He added: “It’s extraordinarily exciting, it’s a new phase for the company. As a startup, why you dream big is for customers like this.”
Whether it’s chips or dreams, any talk about Cerebras keeps coming back to size.