A closer look at Nvidia’s Kepler GPU

Nvidia recently debuted its long-awaited Kepler GPU lineup, which is based on 28-nanometer (nm) process technology and succeeds the 40-nm Fermi.



Kepler is Nvidia’s first new graphics architecture in several years and is likely to form the basis of the company’s graphics products on TSMC’s 28-nm process for the next few years.

Indeed, according to Silicon Valley analyst David Kanter, the GTX 680 is the first real implementation of Kepler and demonstrates excellent results.

“The aggregate single precision shader performance is 3TFLOP/s at the base frequency, about 2.4× faster than the GTX 560. Perhaps the most encouraging part of the story is that Kepler appears to be remarkably area and power efficient,” Kanter wrote in an analysis posted on Real World Tech.

“The GTX 680 is a 195W TDP card and the GPU packs in 3.54B transistors in 294mm². Previous generations from Nvidia were quite inefficient, likely due to the focus on general purpose computational workloads. [However], the first Kepler products significantly improve the GFLOP/s/W and GFLOP/s/mm² beyond simply process technology scaling, which bodes well for the architecture.”
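
For readers who want to see what those efficiency metrics look like in numbers, here is a minimal sketch that derives them from the figures quoted above. It assumes the roughly 3TFLOP/s peak rate and the 195W TDP are reasonable stand-ins for throughput and power; both are theoretical ceilings rather than measurements, so the results are rough upper bounds. The function name is purely illustrative.

```python
# Figures quoted in the article for the GTX 680.
PEAK_GFLOPS = 3000.0   # ~3 TFLOP/s single precision at base frequency
TDP_WATTS = 195.0      # board TDP, used here as a proxy for power draw
DIE_AREA_MM2 = 294.0   # die area in square millimetres


def efficiency(gflops, watts, area_mm2):
    """Return (GFLOP/s per watt, GFLOP/s per mm^2) for the given figures."""
    return gflops / watts, gflops / area_mm2


per_watt, per_mm2 = efficiency(PEAK_GFLOPS, TDP_WATTS, DIE_AREA_MM2)
print(f"{per_watt:.1f} GFLOP/s/W")     # about 15.4
print(f"{per_mm2:.1f} GFLOP/s/mm^2")   # about 10.2
```

Comparing against an older card would only require swapping in that card's peak rate, TDP and die area, none of which are given in this article.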



As an added bonus, says Kanter, Nvidia’s memory interfaces seem to have finally matured. Memory had been a persistent weakness in every Nvidia GPU and a significant competitive disadvantage over the past four to five years.

With the GTX 680, however, Nvidia’s GDDR5 memory interface has caught up with and even surpassed AMD’s, reaching 6GT/s.
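
To put the 6GT/s figure in context, the sketch below converts a per-pin transfer rate into peak memory bandwidth. The 256-bit bus width is an assumption taken from the GTX 680’s published specifications, not from Kanter’s analysis as quoted here, and the helper name is hypothetical.

```python
def peak_bandwidth_gb_s(transfer_rate_gt_s, bus_width_bits):
    """Peak bandwidth in GB/s: transfers per second times bytes per transfer."""
    return transfer_rate_gt_s * bus_width_bits / 8


# Assumption: 256-bit memory bus (not stated in the article).
gtx680_bandwidth = peak_bandwidth_gb_s(6.0, 256)
print(f"{gtx680_bandwidth:.0f} GB/s")  # 192 GB/s
```

This is a theoretical peak; sustained bandwidth in real workloads will be lower.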

Kanter also opined that Nvidia’s Kepler core remains a “poor fit” for non-consumer compute applications, as the excellent efficiency for graphics has undoubtedly come at the cost of general purpose workloads.


“Nvidia’s architects made a conscious choice to quadruple the FLOPs for each core, but only double the bandwidth for shared data,” Kanter explained.



“The result? The older Fermi generation is substantially better suited to general purpose workloads and will continue to be preferred for many applications.”
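
The trade-off Kanter describes comes down to simple arithmetic, sketched below in normalized units. The function name and the baseline of 1.0 for the previous-generation core are illustrative assumptions; only the 4× and 2× scaling factors come from the quote.

```python
def shared_bandwidth_per_flop(flops_scale, bandwidth_scale):
    """Shared-data bandwidth available per FLOP, relative to a baseline of 1.0."""
    return bandwidth_scale / flops_scale


# Per core: FLOPs quadrupled, shared-data bandwidth only doubled.
ratio = shared_bandwidth_per_flop(flops_scale=4.0, bandwidth_scale=2.0)
print(ratio)  # 0.5: each FLOP has half the shared bandwidth it had before

# Put another way, a kernel must do this many times more math per byte of
# shared data to keep the arithmetic units as busy as on the older core.
required_intensity_increase = 1.0 / ratio
print(required_intensity_increase)  # 2.0
```

Graphics shaders tend to tolerate that ratio well, whereas many general purpose kernels lean heavily on shared data, which is consistent with Kanter’s view that Fermi remains the better fit for compute.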



As a result, Kanter considers it highly likely that Nvidia’s upcoming compute products will use a core tuned for general purpose workloads. Such a core would be a derivative of Kepler, designed to reuse as much of the engineering effort as possible, but with several significant changes.



Kanter concluded his analysis by emphasizing that Kepler represents a “tremendous milestone” for Nvidia.

“It eliminates the efficiency flaws that were found in previous generations, demonstrates good memory bandwidth, the start of a DVFS strategy and robust execution at 28nm. The graphics performance is excellent, with no compromises, and attractive power and area efficiency. This success stems from tuning the Kepler core almost exclusively for graphics.

“Going forward, it appears that Nvidia’s strategy will rely on two divergent designs; one specialized for graphics and the other for compute workloads. In the next few months, it should become apparent how this will play out, but for now the graphics side looks good,” he added.