Achronix is moving ahead at a terrific clip – its 7nm eFPGA core is out today and its 7nm FPGA chip will be out in Q1.
It had revenues of $100 million last year and has 120 employees. It looks like the setting for a liquidity event.
CEO Robert Blake agrees that all the makings of a successful IPO are in place but would not be drawn on when, or even if.
That might be because he’s found a winning formula and wants to play it.
The formula can be encapsulated as “the only path left to improve energy-performance-cost is specialisation.”
And what can deliver specialisation better than a base technology which can be tailored to any application?
“It used to be easier to get gains,” says Blake, “now it takes more soul-searching to get improvements.”
Moore’s Law was the old, easy route to gains; now gains come from architectures built for efficient data acceleration.
More specifically, the best compute performance comes from architectures that target specific applications and data sets.
Achronix claims that its 4th generation eFPGA core, Speedcore 7t, put on the market today, delivers 300% better performance for AI and machine learning applications than the Gen 3 core.
Achronix claims the generational improvement is 60% faster performance, 50% lower power and 65% smaller die size.
The big architectural improvement is adding the Machine Learning Processor (MLP) optimised for efficient large-scale matrix-vector multiplication for AI/ML applications.
“It allows data to be moved in much bigger chunks,” simplifies Blake.
MLP blocks are flexible compute engines tightly coupled with embedded memories to give the highest performance/watt and lowest cost solution for AI/ML applications.
“Achronix was the first company to deliver production eFPGA IP to companies developing SoCs, enabling them to create programmable hardware data accelerators supporting new applications,” says Blake. “The new Speedcore Gen4 eFPGA architecture provides an optimal balance of hardware acceleration previously found only in ASIC implementations and adds the flexibility and reprogrammability of our production-proven FPGA technology to support increasing demand for new AI/ML and high data bandwidth applications.”
Achronix offers customers the ultimate in specialised computing by letting them define the core they want, specifying its LUTs, memory, LRAM, MLP and custom blocks.
Achronix describes the background to the development of the Speedcore architecture like this:
‘The dramatic increase in fixed and wireless network bandwidth, coupled with the redistribution of processing, and the emergence of billions of IoT devices will stress traditional network and compute infrastructure.
‘Classic Cloud and Enterprise Data Center computing resources and communications infrastructure can no longer keep pace with exponential growth in data rates, the rapidly changing security protocols, or the many new networking and connectivity requirements.
‘Traditional multicore CPUs and SoCs cannot meet these requirements unaided. They need hardware accelerators, often reprogrammable, to pre-process and offload computations to increase the system’s overall compute performance.
‘Speedcore Gen4 is the Optimal AI/ML Accelerator
In addition to the general requirements of compute and networking infrastructure, AI/ML demands a significant increase in high-density, targeted computing.
‘The new Achronix MLP exploits the specific attributes of AI/ML processing and increases performance for these applications by 300% compared to previous Achronix FPGA products.
‘This is done through multiple architectural innovations that increase operating performance and the number of operations per clock cycle.
‘The MLP is a complete AI/ML compute engine. Each MLP includes a local cyclical register file that leverages temporal locality for optimal reuse of stored weights or data.
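As a rough illustration of the weight-reuse idea, and not a description of Achronix's actual hardware, the Python sketch below models a register file whose read pointer wraps around, so one set of stored weights serves a stream of input vectors without being reloaded. The class and function names are hypothetical, and the sketch assumes every input vector has the same length as the weight set.

```python
class CyclicRegisterFile:
    """Hypothetical model: a fixed set of registers read in a repeating cycle."""

    def __init__(self, values):
        self.values = list(values)  # weights are loaded once
        self.pos = 0                # read pointer

    def next(self):
        value = self.values[self.pos]
        # Wrap the pointer so the same weights are replayed for the next vector.
        self.pos = (self.pos + 1) % len(self.values)
        return value


def dot_products_with_reuse(weights, input_vectors):
    """Compute dot(weights, x) for each x while loading the weights only once."""
    regs = CyclicRegisterFile(weights)
    results = []
    for x in input_vectors:
        if len(x) != len(weights):
            raise ValueError("each input vector must match the weight count")
        acc = 0
        for xi in x:
            acc += regs.next() * xi  # multiply-accumulate against a stored weight
        results.append(acc)
    return results


outputs = dot_products_with_reuse([1, 2, 3], [[1, 0, 0], [0, 1, 0], [1, 1, 1]])
print(outputs)  # [1, 2, 6]
```

The point of the sketch is only that the weights are fetched once and then read repeatedly, which is the kind of temporal locality the quoted description refers to.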
‘The MLPs are tightly coupled with neighboring MLP blocks and larger embedded memory blocks to deliver the highest processing performance, the highest operations per second and the lowest power profile.
‘The MLPs support multiple-precision fixed-point and floating-point formats including Bfloat16, 16-bit half-precision floating point, 24-bit floating point and block floating point (BFP). Users can select the optimal precision for their application for performance, power and area.
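To make two of these number formats concrete, here is a minimal Python sketch. It assumes bfloat16 is obtained by truncating a 32-bit float to its top 16 bits (real hardware typically rounds instead), and it models block floating point as one shared exponent per block with signed integer mantissas. The function names and the 8-bit mantissa width are illustrative assumptions, not Achronix's implementation.

```python
import math
import struct


def to_bfloat16(x: float) -> float:
    """Truncate x to bfloat16 precision: 1 sign, 8 exponent and 7 mantissa bits."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # float32 bit pattern
    bits &= 0xFFFF0000                                   # drop the low 16 bits
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def bfp_quantize(block, mantissa_bits=8):
    """Return (shared_exp, mantissas) with value ~= mantissa * 2**shared_exp."""
    max_abs = max(abs(v) for v in block)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    # frexp gives max_abs = m * 2**exp with 0.5 <= m < 1.
    _, exp = math.frexp(max_abs)
    shared_exp = exp - (mantissa_bits - 1)
    limit = 2 ** (mantissa_bits - 1) - 1  # largest signed mantissa
    scale = 2.0 ** shared_exp
    mantissas = [max(-limit, min(limit, round(v / scale))) for v in block]
    return shared_exp, mantissas


def bfp_dequantize(shared_exp, mantissas):
    """Rebuild approximate values from the shared exponent and mantissas."""
    return [m * 2.0 ** shared_exp for m in mantissas]


print(to_bfloat16(3.14159))                  # 3.140625
exp, mants = bfp_quantize([0.5, -0.25, 0.125, 1.0])
print(exp, mants)                            # -6 [32, -16, 8, 64]
print(bfp_dequantize(exp, mants))            # [0.5, -0.25, 0.125, 1.0]
```

Because the whole block shares one exponent, each element needs only a small integer mantissa, which is why the format trades some precision for lower storage and simpler multipliers.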
‘To complement the MLP and increase the AI/ML compute density, Speedcore Gen4 look-up tables (LUTs) can implement multipliers that are 2x more efficient than any industry standalone or embedded FPGA products.
‘Leading FPGAs implement 6×6 multipliers in 21 LUTs whereas Speedcore Gen4 implements 6×6 multipliers in 11 LUTs and can operate at 1 GHz.
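The arithmetic behind the "2x" claim can be checked from the two LUT counts quoted above. The short Python sketch below does so; the 10,000-LUT budget is a hypothetical figure chosen only to show the effect on multiplier count, and the function name is illustrative.

```python
LUTS_PER_MULT_LEADING = 21  # quoted figure for a 6x6 multiplier on leading FPGAs
LUTS_PER_MULT_GEN4 = 11     # quoted figure for Speedcore Gen4


def multipliers_in_budget(lut_budget: int, luts_per_multiplier: int) -> int:
    """Whole multipliers that fit in a LUT budget, ignoring routing overhead."""
    return lut_budget // luts_per_multiplier


ratio = LUTS_PER_MULT_LEADING / LUTS_PER_MULT_GEN4
budget = 10_000  # hypothetical LUT budget, not a figure from Achronix

print(round(ratio, 2))                                       # 1.91
print(multipliers_in_budget(budget, LUTS_PER_MULT_LEADING))  # 476
print(multipliers_in_budget(budget, LUTS_PER_MULT_GEN4))     # 909
```

On these numbers the density gain is about 1.9x, which the quoted text rounds to 2x.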
‘Architectural Innovations Increase System Performance
The new Speedcore Gen4 architecture has many architectural innovations that increase overall operating performance by 60% compared to the previous generation of Speedcore products.
‘All aspects of the LUTs have been enhanced to increase area efficiency and reduce resource usage which reduces power and die size and increases performance.
‘Changes include doubling the size of the ALUs, doubling the registers per LUT, support for 7-bit functions and some 8-bit functions in a single level-of-logic delay, and dedicated high-speed connections for shift registers.
‘The routing architecture also has been enhanced with an independent and dedicated bus routing structure that includes dynamically selectable bus muxing that effectively creates a distributed, run-time-configurable switching network.
‘This is the first time that run-time logic functionality is available in the routing structure and it provides an optimal solution for high-bandwidth and low-latency applications.
How to Evaluate Speedcore Gen4
Achronix’s ACE design tools include pre-configured Speedcore Gen4 eFPGA example instances that users can use to evaluate Speedcore Gen4 quality of results for performance, resource usage and compile times.
The ACE design tools with support for Speedcore Gen4 are available today.
Speedcore is a modular architecture easily sized to users’ requirements. Achronix uses its Speedcore Builder tool to instantly create new Speedcore instances to match user requirements for quick evaluation.
Users interested in receiving die size and power information can contact Achronix for details on their specific Speedcore Gen4 eFPGA size and process requirements.
Availability
Achronix is using the same methodology to deliver the newest Speedcore Gen4 eFPGA technology to users who want to combine the benefits and flexibility of eFPGA IP with the enhanced AI/ML capabilities.
Achronix configures and delivers the Speedcore eFPGA IP and supporting files to users within six weeks for production architectures.
Speedcore Gen4 for TSMC 7nm is available today and will be in production in 1H 2019. Speedcore Gen4 will be in production for TSMC 16nm and 12nm in 2H 2019.