Tilera has launched an advanced line of TILE-Gx processors that reportedly offer “ten times better” compute efficiency than Intel’s next-generation Westmere chip.
The processors – fabricated in TSMC’s 40-nanometer process – operate at up to 1.5 GHz with power consumption ranging from 10 to 55 watts.
“The TILE-Gx line, available with 16, 36, 64 and 100 cores, employs Tilera’s unique architecture that scales well beyond the core count of traditional microprocessors,” Tilera spokesperson Bob Doud told TG Daily. “Tilera’s two-dimensional iMesh interconnect eliminates the need for an on-chip bus and its Dynamic Distributed Cache (DDC) system allows each core’s local cache to be shared coherently across the entire chip.”
Doud explained that the two “key” technologies enabled TILE architecture performance to scale nearly linearly with the number of cores on the chip.
“Tilera is four years ahead of everyone else in the chip world. Unlike Intel, we were able to begin our chip design with a clean slate. Intel is weighed down by a certain amount of baggage, for example, an immense investment in x86 architecture. They have to ensure backwards compatibility for a number of server and consumer products,” stated Doud.
“But our view is that increasing performance is not just about driving up clock frequency. Although higher clock frequency does lead to higher speeds, the power draw goes up exponentially as a result. Our belief is that the performance should be raised via parallelism – or many cores running at modest clock speeds.”
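Doud's argument can be illustrated with the textbook approximation for CMOS dynamic power, P ≈ C·V²·f. The sketch below is not Tilera's model: it assumes, purely for illustration, that supply voltage must rise in proportion to clock frequency (so per-core power grows with the cube of frequency, which is steep but polynomial rather than literally exponential) and that throughput scales linearly with both core count and frequency. The function names are hypothetical.

```python
def relative_power(freq_ratio: float, cores: int = 1) -> float:
    """Relative dynamic power, assuming P ~ C * V^2 * f and V ~ f (illustrative)."""
    return cores * freq_ratio ** 3


def relative_throughput(freq_ratio: float, cores: int = 1) -> float:
    """Relative throughput, assuming perfectly linear scaling (illustrative)."""
    return cores * freq_ratio


if __name__ == "__main__":
    # Option A: one core clocked twice as fast as the baseline.
    fast = (relative_throughput(2.0, cores=1), relative_power(2.0, cores=1))
    # Option B: two cores at the baseline clock.
    wide = (relative_throughput(1.0, cores=2), relative_power(1.0, cores=2))
    print("one fast core  (throughput, power):", fast)
    print("two slow cores (throughput, power):", wide)
```

Under these assumptions, two cores at the baseline clock match the throughput of one core at twice the clock while drawing a quarter of the power. Real workloads rarely scale perfectly across cores, so the gap in practice would be smaller.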
Advancing compute capabilities
According to Doud, Tilera’s many-core model has already advanced compute capabilities in a number of critical areas, such as:
- Consolidation of compute – A single, many-core processor is capable of absorbing functions that previously required multiple processors.
- Granularity of compute – Processing resources can be allocated to functions in precise increments, optimizing performance and saving power.
- Deterministic compute – Enables processor cores to be dedicated to specific tasks, including cache-coherent islands of compute for highly predictable performance.
Adoption and target markets
Doud emphasized that the TILE-Gx processor family was “ideal” for a wide range of applications, including enterprise networking, cloud computing, multimedia and wireless infrastructure.
“Cloud computing is a very broad term and it is obviously a huge market. We certainly expect to broaden our foothold and market share over time. We offer incredible compute efficiency, about 10x more than Intel’s Nehalem processor. Tilera also delivers significantly better performance per watt than Sun’s SPARC architecture.”
He added that the world was “shifting away” from software that exploited “specific” hardware.
“This paradigm shift is a good thing for us, as it opens up a myriad of opportunities in terms of Tilera adoption. For example, there is a definite move away from Wintel-specific platforms to a more generic Linux base. Hardware is a very level playing field, one where we shine through in terms of performance and power consumption.”
Finally, Doud noted that the “fundamental OS” for TILE-Gx processors was a Linux distribution known as ZOL, or Zero Overhead Linux.
“Our processors are extremely standard in terms of Linux, which has been successfully ported to dozens of platforms. And once you have Linux running, there are a number of tools that offer support for C++ and Java,” explained Doud. “Tilera provides a multicore development environment for its TILE-Gx processors that includes a GCC compiler, standard gdb, gprof, Eclipse IDE, multicore debug and multicore profile. In addition, our standard application stack offers a bare metal environment, hypervisor layer, virtualization capabilities, I/O device drivers and a load balancer.”
The TILE-Gx36 processor will be sampling in Q4 of 2010 with the other processors rolling out in the following two quarters. Low volume pricing will range from under $400 for the Gx36 to less than $1,000 for the Gx100.
Additional Tilera TILE-Gx processor specs
- Next-generation 64-bit core: New three-issue 64-bit core with full virtual memory system. Each core includes 32KB L1 I-cache, 32KB L1 D-cache and 256KB L2 cache, with up to 26MB total L3 coherent cache across the device.
- Enhanced SIMD instruction extensions: Enhanced signal processing performance with a 4 MAC/cycle multiplier unit delivering up to 600 billion MACs per second, more than 12x the fastest commercial DSP.
- Integrated high-performance DDR3 memory controllers: Two or four 72-bit controllers running up to 2133 MHz speeds with ECC support. Up to 1TB total capacity and supporting memory striping modes for maximum utilization.
- Hardware acceleration engines: On-chip MiCA (Multistream iMesh Crypto Accelerator) system delivers up to 40Gbps encryption and 20Gbps full duplex compression processing, tightly coupled to the core array for extremely low latency and wire-speed small packet throughput. A high-performance true random number generator (RNG) and public key accelerator enable up to 50,000 RSA handshakes per second.
- Packet processing accelerator: mPIPE (multicore Programmable Intelligent Packet Engine) system provides wire-speed packet classification, load balancing and buffer management. This flexible, C-programmable engine delivers 80 Gbps and 120 million packets-per-second of throughput for packets with multiple layers of encapsulation.
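Several of the headline figures above can be sanity-checked with simple arithmetic. The sketch below assumes the peak numbers refer to the 100-core part at the 1.5 GHz top clock, that the quoted coherent cache total is the sum of the per-core L2 caches (in decimal megabytes), and that the packet rate refers to minimum-size 64-byte Ethernet frames plus the standard 20 bytes of preamble and inter-frame gap. These are assumptions made for illustration, not statements from Tilera, and the function names are hypothetical.

```python
CORES = 100            # largest TILE-Gx part (assumed basis for peak figures)
CLOCK_HZ = 1.5e9       # top clock speed quoted in the article
MACS_PER_CYCLE = 4     # per-core multiplier throughput from the spec list
L2_PER_CORE_KB = 256   # per-core L2 cache from the spec list
LINE_RATE_BPS = 80e9   # mPIPE throughput from the spec list


def peak_macs_per_second(cores: int, clock_hz: float, macs_per_cycle: int) -> float:
    """Aggregate multiply-accumulate rate if every core issues every cycle."""
    return cores * clock_hz * macs_per_cycle


def aggregate_l2_mb(cores: int, l2_kb: int) -> float:
    """Sum of per-core L2 caches in decimal megabytes (assumed to form the shared pool)."""
    return cores * l2_kb / 1000


def wire_speed_pps(line_rate_bps: float, frame_bytes: int = 64, overhead_bytes: int = 20) -> float:
    """Packets per second at line rate for a given Ethernet frame size.

    overhead_bytes covers the preamble, start-of-frame delimiter and
    inter-frame gap (8 + 12 bytes on standard Ethernet).
    """
    return line_rate_bps / ((frame_bytes + overhead_bytes) * 8)


if __name__ == "__main__":
    print("peak MAC/s:", peak_macs_per_second(CORES, CLOCK_HZ, MACS_PER_CYCLE))
    print("aggregate L2 (MB):", aggregate_l2_mb(CORES, L2_PER_CORE_KB))
    print("64-byte packets/s at 80 Gbps:", wire_speed_pps(LINE_RATE_BPS))
```

Under these assumptions the results line up with the quoted 600 billion MACs per second, roughly 26MB of coherent cache, and a packet rate just under 120 million per second.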