According to The Information, overheating problems with Nvidia’s Blackwell GPUs have led to design changes. Those changes are delaying shipments and raising concerns that Nvidia’s biggest customers, including Google, Meta, and Microsoft, will not be able to deploy Blackwell servers on schedule.
Nvidia’s Blackwell GPUs overheat when installed in ultra-dense server racks that hold 72 processors. Each Blackwell processor draws more than 1,000 W, so a single rack packs more than 72 kW of GPU power, and the heat that comes with it, into a relatively small space.
Nvidia said it is working closely with suppliers and partners on revisions and design changes to address the overheating. Such redesigns are not uncommon, but in this case they are pushing back the expected ship date, which had been set for this quarter.
These are not the first rumours to plague Blackwell. In August, word came out that Nvidia and its manufacturing partner TSMC were dealing with yield issues caused by the processor’s packaging design. Nvidia quickly addressed and dismissed those concerns on its quarterly earnings call.
Nvidia reports earnings on Wednesday, November 20, after the stock market closes. For now, a company spokesperson offered this statement:
“Nvidia GB200 systems are the most advanced computers ever created. Integrating them into diverse data centre environments requires co-engineering with our customers. Our engineering iterations are in line with expectations. Some of our partners, including Dell and CoreWeave, are promoting new Nvidia GB200 NVL72 designs here at SC and on social media.”
Moor Insights & Strategy principal analyst Anshel Sag told Network World it was too early to tell whether this is a widespread problem or one limited to a particular configuration.
“I can’t imagine that Nvidia would ship a part that overheats, especially with the amount of cooling that’s already necessary,” he said.
He also considered the timing of the news suspect, since it broke while the Supercomputing 24 (SC24) conference is under way, and said he wouldn’t put it past an Nvidia competitor to try to kneecap the company.
“Supercomputing is when everyone who’s anyone in the HPC world is meeting up and talking rumours and shop, and today would be the day to drop a big rumour like this to get it to spread across the industry like wildfire,” he said.
“If it were more organic, it would’ve spread after the show as people talked privately and gossiped. This almost feels like a leak the competition would spread to get more eyeballs on the competitive platforms.”