The new Speedcore eFPGA IP with Gen4 architecture dramatically improves upon Achronix’s original and highly successful Speedcore offering. Optimized for high-performance AI/ML and hardware-acceleration applications, Speedcore IP with Gen4 architecture delivers 60% faster performance (300% faster for AI/ML applications) while drawing 50% less power and consuming 65% less die area compared to the previous Speedcore eFPGA generation.
High-Performance FPGA Technology
Figure: Speedcore 7t (Gen4 Architecture) versus Speedcore 16t (Original Speedcore Architecture)
Reconfigurable Logic Blocks (RLBs)
- Logic – 6-input look-up tables (LUTs) that implement all functions with as many as 7 inputs, and some 8-input functions, in a single level of logic. Reducing the need for multiple logic levels improves performance.
- 8:1 Muxes – New, dedicated 8-to-1 multiplexers dramatically increase logic performance.
- Shift chain – Double the number of registers compared to the original Speedcore architecture plus optimized routing for shift chains.
- ALU – A larger ALU now supports 8-bit operations for addition, counting, comparison, and maximum functions.
- LUT-based multiplication – Efficient, LUT-based multipliers require half the on-chip resources of other leading FPGA products: a 6 × 6 multiply requires only 11 LUTs and runs at 1 GHz, and an 8 × 8 multiply requires only 18 LUTs and runs at 500 MHz.
- Dedicated buses – A first in the FPGA industry! High-performance, bus-grouped routing channels, separate from the standard eFPGA routing channels, ensure that bus-oriented data traffic (common with memories) does not compete for routing with other data traffic carried over the eFPGA’s standard, bit-oriented channels.
- Bus muxes – Another first in the FPGA industry: bus muxes allow users to efficiently create bus mux functions without consuming any LUTs or standard routing. This capability effectively creates a giant, distributed, run-time-configurable switching network that is separate from the eFPGA’s bit-oriented routing network.
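As a rough illustration of why LUT input count and dedicated multiplexers matter, the Python sketch below models a generic LUT as a truth table. It is not Achronix's implementation, and `make_lut`, `lut_read`, and `mux4` are hypothetical names chosen for this example. A 4:1 mux (four data bits plus two select bits) uses exactly six inputs and so fits a single 6-input truth table, whereas an 8:1 mux has eleven inputs (eight data bits plus three select bits) and cannot be expressed as one 6-input truth table.

```python
from itertools import product


def make_lut(func, num_inputs=6):
    """Build a truth table by evaluating func on every input combination.

    Entry i holds the output for the inputs whose bit k equals (i >> k) & 1.
    """
    table = []
    for index in range(1 << num_inputs):
        bits = [(index >> k) & 1 for k in range(num_inputs)]
        table.append(func(*bits) & 1)
    return table


def lut_read(table, bits):
    """Look up the output for a sequence of input bits (bit 0 first)."""
    index = 0
    for k, bit in enumerate(bits):
        index |= (bit & 1) << k
    return table[index]


def mux4(d0, d1, d2, d3, s0, s1):
    """Reference 4:1 mux: the two select bits pick one of four data bits."""
    return (d0, d1, d2, d3)[s0 | (s1 << 1)]


# A 6-input LUT has 2**6 = 64 truth-table entries.
MUX4_LUT = make_lut(mux4)

# The table reproduces the mux for every possible input combination.
for bits in product((0, 1), repeat=6):
    assert lut_read(MUX4_LUT, bits) == mux4(*bits)
```

In this model, any function of six or fewer inputs costs one table lookup, which is why wider functions benefit from dedicated hardware rather than extra logic levels.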
New Machine Learning Processor (MLP) Block for AI/ML
The new MLP in Speedcore eFPGA IP with Gen4 architecture is a complete AI/ML compute engine. Each MLP includes a cyclical register file that exploits temporal locality to reuse stored weights or data, which boosts performance by significantly reducing data movement across a variety of calculations. Each MLP is tightly coupled with its neighboring MLPs and with larger memory blocks to maximize processing performance and to deliver the highest number of operations per second with the lowest power profile. The MLPs support fixed-point formats and several floating-point formats (bfloat16, 16-bit half precision, and block floating point). Users can trade precision against performance by selecting the data precision on the fly, as each application requires.
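To make two of these number formats concrete, here is a minimal Python sketch of their standard definitions. It does not describe the MLP's internal behavior. All function names are hypothetical, the bfloat16 conversion uses simple truncation (hardware may round differently), and the 8-bit mantissa width for block floating point is an assumption made for illustration.

```python
import math
import struct


def float_to_bfloat16_bits(value):
    """Return the bfloat16 bit pattern: the top 16 bits of the float32 encoding.

    The low 16 mantissa bits are dropped, i.e. the value is truncated.
    """
    (bits32,) = struct.unpack(">I", struct.pack(">f", value))
    return bits32 >> 16


def bfloat16_bits_to_float(bits16):
    """Expand a bfloat16 bit pattern back to a Python float."""
    (value,) = struct.unpack(">f", struct.pack(">I", (bits16 & 0xFFFF) << 16))
    return value


def to_block_floating_point(values, mantissa_bits=8):
    """Encode values as signed integer mantissas sharing one exponent."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return 0, [0] * len(values)
    # frexp gives max_abs = m * 2**exp with 0.5 <= m < 1.
    _, exp = math.frexp(max_abs)
    shared_exp = exp - (mantissa_bits - 1)
    scale = 2.0 ** shared_exp
    hi = (1 << (mantissa_bits - 1)) - 1
    lo = -hi - 1
    # Round to the nearest integer mantissa and clamp to the signed range.
    mantissas = [max(lo, min(hi, round(v / scale))) for v in values]
    return shared_exp, mantissas


def from_block_floating_point(shared_exp, mantissas):
    """Decode a block back to floats."""
    return [m * 2.0 ** shared_exp for m in mantissas]
```

For example, `to_block_floating_point([1.0, 0.5, -0.25, 0.0])` stores one exponent for the whole block plus four small integers. Arithmetic on the block can then run on integer multipliers, which is the precision-versus-performance trade-off the paragraph above refers to.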
|Feature|Benefit|
|---|---|
|Configurable multiply precision and count|Trade off performance/power vs. precision – Increasing multiplier count for lower-precision functions.|
|Cyclical register file|Double compute performance – Similar to a cache function in that data is saved for efficient reuse by the MLP. Optimized for AI/ML functions.|
|Column bonding and MLP cascade paths|Higher performance – Hard paths between memory and other MLP blocks enable high-performance functionality while freeing up general-purpose routing.|
|Multiple number formats|Flexibility – Supports mainstream fixed- and floating-point formats and frameworks.|
|Rounding and saturation|System performance – Support for multiple rounding formats and saturation that would otherwise need to be implemented in LUTs.|
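The rounding and saturation entry can be illustrated with a short Python sketch of fixed-point post-processing. The source does not list which rounding modes the MLP supports, so the three modes here are common choices picked as assumptions, and `round_shift` and `saturate` are hypothetical names.

```python
def round_shift(value, shift, mode="half_up"):
    """Right-shift an integer by `shift` bits using the given rounding mode.

    Modes (illustrative only):
      truncate  - drop the low bits (rounds toward negative infinity)
      half_up   - round to nearest, ties toward positive infinity
      half_even - round to nearest, ties to the even result
    """
    if shift == 0:
        return value
    if mode == "truncate":
        return value >> shift
    if mode == "half_up":
        return (value + (1 << (shift - 1))) >> shift
    if mode == "half_even":
        floor = value >> shift
        remainder = value - (floor << shift)
        half = 1 << (shift - 1)
        if remainder > half or (remainder == half and floor & 1):
            return floor + 1
        return floor
    raise ValueError(f"unknown rounding mode: {mode}")


def saturate(value, bits):
    """Clamp an integer to the signed two's-complement range of `bits` bits."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def requantize(accumulator, shift, bits, mode="half_up"):
    """Scale a wide accumulator down and clamp it to a narrower output."""
    return saturate(round_shift(accumulator, shift, mode), bits)
```

Implemented in general-purpose logic, each of these steps would need adders, comparators, and muxes on every output, which is the LUT cost the table row says the hard block avoids.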
Production-Proven Design Process
Speedcore eFPGA IP with Gen4 architecture follows the same production-proven design process as Achronix’s first-generation Speedcore IP. Designers specify their custom mix of logic, memory, DSP, and MLP blocks to create a unique Speedcore instance that meets the die-size, power-consumption, and resource-configuration requirements of their target application(s). Additionally, designers can define the IP block aspect ratio and I/O port connections of a Speedcore eFPGA with Gen4 architecture to meet their SoC’s design requirements while balancing power against performance. Achronix then generates and delivers a GDSII file and all the supporting files required to complete integration and timing closure of the customized Speedcore eFPGA instance.
The Speedcore IP block instance can then be integrated directly into an ASIC or SoC using standard EDA design tools and methodology. Along with Speedcore eFPGA IP and supporting files, Achronix also provides a customized, full-featured version of the ACE design tools that designers use to design, verify, and program the functionality of their custom Speedcore eFPGA IP with Gen4 architecture.
Speedcore eFPGA IP with Gen4 architecture is available immediately, supported by the latest version of Achronix’s ACE design tool. This tool includes preconfigured example instances of Speedcore eFPGAs with Gen4 architecture. Users can evaluate performance, resource usage, and compile times for Speedcore eFPGA IP with Gen4 architecture using these example instances, even before developing their own designs.
To receive complete details of Speedcore eFPGA IP with Gen4 architecture, including die size and power consumption, contact Achronix.