Nvidia showcases new Kepler-powered Tesla GPUs

Nvidia is showcasing two new Kepler-powered Tesla GPUs at its annual GPU Technology Conference in San Jose, California.

According to Nvidia CEO Jen-Hsun Huang, the Tesla K10 and K20 GPUs were engineered to handle the “most complex HPC problems” in the world. 

Designed with a focus on high performance and extreme power efficiency, Kepler delivers up to three times the performance per watt of its Fermi predecessor, which set a new standard for parallel computing when it was introduced two years ago.

“Fermi was a major step forward in computing. It established GPU-accelerated computing in the top tier of high performance computing and attracted hundreds of thousands of developers to the GPU computing platform,” Nvidia chief scientist Bill Dally explained.

“Kepler will be equally disruptive, establishing GPUs broadly into technical computing, due to their ease of use, broad applicability and efficiency.”

Nvidia’s Tesla K10 is optimized for the energy exploration market and the defense industry – targeting signal, image and seismic processing applications. A single Tesla K10 accelerator board features two GK104 Kepler GPUs that together deliver 4.58 teraflops of peak single-precision floating-point performance and 320 GB per second of memory bandwidth.
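As a rough sanity check, the K10's aggregate figures can be reproduced from per-GPU specifications that the announcement does not state and that are assumed here: 1,536 CUDA cores per GK104, a 745 MHz core clock, one fused multiply-add (two floating-point operations) per core per cycle, and a 256-bit GDDR5 interface running at an effective 5 GT/s on each GPU.

$$
\begin{aligned}
\text{peak single precision} &= 2\ \text{GPUs} \times 1536\ \text{cores} \times 2\,\tfrac{\text{FLOP}}{\text{cycle}} \times 0.745\ \text{GHz} \approx 4577\ \text{GFLOPS} \approx 4.58\ \text{TFLOPS} \\
\text{memory bandwidth} &= 2\ \text{GPUs} \times \tfrac{256\ \text{bit}}{8\ \text{bit/byte}} \times 5\ \text{GT/s} = 320\ \text{GB/s}
\end{aligned}
$$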

Meanwhile, Nvidia dubbed the K20 the “new Tesla flagship,” as it delivers three times the double-precision performance of Fermi-based Tesla cards.
The K20 is built on the GK110 GPU, which is slated to be incorporated into the new Titan supercomputer at the Oak Ridge National Laboratory in Tennessee and the Blue Waters system at the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign.

Additional K10 and K20 features include:

SMX Streaming Multiprocessor – The basic building block of every GPU, the SMX streaming multiprocessor was redesigned from the ground up for high performance and energy efficiency. It delivers up to three times more performance per watt than the Fermi streaming multiprocessor, making it possible to build a supercomputer that delivers one petaflop of computing performance in just 10 server racks. SMX’s energy efficiency was achieved by quadrupling its number of CUDA architecture cores while reducing the clock speed of each core, power-gating parts of the GPU when idle and maximizing the GPU area devoted to parallel-processing cores instead of control logic.

Dynamic Parallelism – This capability enables GPU threads to spawn new threads on their own, allowing the GPU to adapt to the data as it runs. It significantly simplifies parallel programming, enabling GPU acceleration of a broader set of popular algorithms, such as adaptive mesh refinement, fast multipole methods and multigrid methods.
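A minimal CUDA sketch of dynamic parallelism, offered as an illustration rather than Nvidia's own example: a parent kernel reads a per-cell work size and launches a child grid sized to that work directly from the GPU, the pattern that adaptive mesh refinement relies on. The kernel and helper names (`coarse`, `refine`, `run_adaptive`) and the counting workload are hypothetical. Device-side launches assume a GPU of compute capability 3.5 or later (GK110-class) and relocatable device code, e.g. compiling with `nvcc -rdc=true ... -lcudadevrt` and a suitable `-arch`.

```cuda
#include <cuda_runtime.h>

// Child kernel (hypothetical workload): each thread processes one element of
// one parent cell's refined sub-range and records that it ran.
__global__ void refine(int* counts, int parent, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(&counts[parent], 1);
}

// Parent kernel: each thread looks at its cell's data-dependent work size and,
// if needed, launches a child grid of matching size from the GPU itself.
__global__ void coarse(const int* work, int* counts, int num_cells) {
    int cell = blockIdx.x * blockDim.x + threadIdx.x;
    if (cell < num_cells && work[cell] > 0) {
        int threads = 128;
        int blocks = (work[cell] + threads - 1) / threads;
        refine<<<blocks, threads>>>(counts, cell, work[cell]);  // device-side launch
    }
}

// Host helper: uploads per-cell work sizes, runs the parent kernel, and copies
// back how many child threads actually ran for each cell.
cudaError_t run_adaptive(const int* h_work, int* h_counts, int num_cells) {
    int* d_work = nullptr;
    int* d_counts = nullptr;
    size_t bytes = num_cells * sizeof(int);
    cudaMalloc(&d_work, bytes);
    cudaMalloc(&d_counts, bytes);
    cudaMemcpy(d_work, h_work, bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_counts, 0, bytes);

    coarse<<<(num_cells + 127) / 128, 128>>>(d_work, d_counts, num_cells);

    // A parent grid only completes after all of its child grids finish,
    // so one host-side synchronize covers the whole launch tree.
    cudaError_t err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        err = cudaMemcpy(h_counts, d_counts, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_work);
    cudaFree(d_counts);
    return err;
}
```

The CPU issues only one launch here; how many child grids run, and how large they are, is decided on the GPU from the data.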

Hyper-Q – This enables multiple CPU cores to simultaneously use the CUDA architecture cores on a single Kepler GPU – dramatically increasing GPU utilization, slashing CPU idle time and improving programmability. Hyper-Q is ideal for cluster applications that use MPI.
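A minimal CUDA sketch of the usage pattern Hyper-Q is meant to speed up: several host threads, standing in for separate CPU cores, each submit work through their own CUDA stream. On Fermi, streams funnel into a single hardware work queue and can be falsely serialized; GK110's Hyper-Q provides up to 32 hardware queues so independent streams can run concurrently. The `fill` kernel and `run_streams` helper are hypothetical, and the sketch uses threads inside one process. Separate MPI processes sharing a GPU additionally depend on Nvidia's proxy (MPS) service, which this sketch does not cover.

```cuda
#include <cuda_runtime.h>
#include <thread>
#include <vector>

// Trivial per-thread workload: fill a device buffer with one value.
__global__ void fill(float* data, float value, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Hypothetical helper: each host thread gets its own stream, so its kernels
// enter the GPU independently of the other threads' work.
bool run_streams(int num_threads, int n) {
    std::vector<std::thread> workers;
    std::vector<int> ok(num_threads, 0);  // int, not bool, so each thread writes its own element safely

    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([t, n, &ok] {
            cudaStream_t stream;
            cudaStreamCreate(&stream);
            float* d = nullptr;
            cudaMalloc(&d, n * sizeof(float));

            // Launch into this thread's private stream.
            fill<<<(n + 255) / 256, 256, 0, stream>>>(d, static_cast<float>(t), n);

            std::vector<float> h(n);
            cudaMemcpyAsync(h.data(), d, n * sizeof(float),
                            cudaMemcpyDeviceToHost, stream);
            cudaStreamSynchronize(stream);  // wait only for this thread's work

            ok[t] = (h[0] == static_cast<float>(t) && h[n - 1] == static_cast<float>(t));
            cudaFree(d);
            cudaStreamDestroy(stream);
        });
    }
    for (auto& w : workers) w.join();
    for (int v : ok)
        if (!v) return false;
    return true;
}
```

Nothing in the code itself names Hyper-Q. The same program runs on Fermi, but there the streams share one hardware queue, whereas on GK110 they can map to separate queues and overlap on the GPU.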