NVIDIA Blackwell and the Difference Between Energy Efficiency and Power

NVIDIA’s new Blackwell GPU is an amazing offering. It’s similar in some ways to AMD’s Threadripper: both parts push the envelope of performance, and both draw enough power that they would be difficult to justify on energy consumption alone. When Threadripper came out, it was positioned for the gaming market, but Lenovo saw the opportunity it represented in the workstation market, brought out a line of Threadripper workstations and took market leadership in workstations. Engineers have always favored absolute power and performance. To them, energy use isn’t as important.

But Blackwell is a server part, and when it comes to servers, energy efficiency is an extremely high priority. While Blackwell draws a lot of power, the performance of the part actually makes it more energy efficient, not less.

This week, let’s talk about how to look at energy efficiency vs. power use.

How More Power Can Be More Efficient

Let’s take the example of a tractor-trailer rig compared to a Toyota Prius. In terms of miles per gallon, the tractor-trailer is far less efficient than a Prius. But we ship in tractor-trailers and not in fleets of Priuses because the tractor-trailer’s cost per package is far lower than that of the low-capacity Prius.

This is the case with Blackwell. While it draws far more power than more traditional GPUs, it also finishes more jobs more quickly, so the energy used per job is much lower. As with the tractor-trailer example, it’s more energy efficient on a per-task basis than the alternatives.
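To make the per-job arithmetic concrete, here is a minimal sketch in Python. The wattages and job times are hypothetical placeholders chosen only to illustrate the relationship; they are not measured Blackwell figures, and the function name is our own.

```python
def energy_per_job_wh(power_watts: float, seconds_per_job: float) -> float:
    """Energy consumed by one job, in watt-hours (power x time)."""
    return power_watts * seconds_per_job / 3600.0


# Hypothetical numbers for illustration only, not vendor specifications.
older_gpu_wh = energy_per_job_wh(power_watts=500.0, seconds_per_job=120.0)
newer_gpu_wh = energy_per_job_wh(power_watts=1000.0, seconds_per_job=30.0)

print(f"Older part: {older_gpu_wh:.2f} Wh per job")
print(f"Newer part: {newer_gpu_wh:.2f} Wh per job")
```

In this made-up case the newer part draws twice the power but finishes each job four times faster, so it uses half the energy per job.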

This means you need fewer Blackwell-based servers to manage the same workload, and the jobs are completed more quickly as well.
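The server-count point can be sketched the same way. Everything below is an assumption made for illustration: the workload size, the per-server throughput and the per-server power draw are invented numbers, not benchmarks.

```python
import math


def servers_needed(required_jobs_per_hour: float, jobs_per_hour_per_server: float) -> int:
    """Smallest whole number of servers that covers the required throughput."""
    return math.ceil(required_jobs_per_hour / jobs_per_hour_per_server)


# Hypothetical workload and server characteristics (illustrative only).
workload = 10_000.0  # jobs per hour the data center must handle

older_count = servers_needed(workload, jobs_per_hour_per_server=250.0)
newer_count = servers_needed(workload, jobs_per_hour_per_server=1000.0)

older_total_kw = older_count * 4.0  # assumed 4 kW per older server
newer_total_kw = newer_count * 8.0  # assumed 8 kW per newer server

print(f"Older servers: {older_count} units, {older_total_kw:.0f} kW total")
print(f"Newer servers: {newer_count} units, {newer_total_kw:.0f} kW total")
```

Under these assumptions, each newer server draws twice the power of an older one, yet the fleet as a whole needs a quarter of the machines and half the total power for the same workload.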

The Data Center Problem

Data centers are typically built for existing loads, heat dissipation and energy use, with some headroom for growth. When adding significantly more capacity, you have to add data centers or build out larger ones, and significantly increase energy supplies. This is because you need to increase both the footprint of the servers and the amount of energy available to run them.

But when you get a breakthrough like Blackwell, you can put this additional computing capacity in the same physical footprint as the old technology. You may still need to upgrade the cooling systems and power supply to the data center so it can handle the extra load, but you don’t necessarily have to change the footprint. Moving to a solution like warm-water cooling, something that Lenovo does aggressively, can significantly reduce the cost of the conversion, providing a far higher performance yield at a far lower build cost than using older technologies.

Wrapping Up

There have been concerns about parts like NVIDIA’s Blackwell, which pull far more power than their predecessors. Even so, they provide a path to growth that is cheaper than just expanding the data center with more servers, because the upgrade doesn’t require a substantial increase in footprint and the build cost that comes with it.

While you may need to upgrade cooling and power systems, you can move to alternative cooling technologies like warm-water cooling to keep those costs down and still gain the workload and speed benefits the technology promises. 

In the end, it is less important how much power a new part uses than how much work it can do for a given amount of power. NVIDIA’s Blackwell showcases that exact benefit.