Artificial Intelligence and Machine Learning

Key Trends in AI/ML

Many industries are adopting artificial intelligence (AI) and machine learning (ML) because probabilistic AI/ML algorithms can now solve problems that are not easily solved by any other approach. Digital data, including images, video, and speech, is growing explosively, generated by sources such as social media, the internet of things (IoT), and ubiquitous video and security cameras. This growth is driving the need for analytics to extract knowledge from the data. The associated data analytics often rely on AI/ML algorithms, which can quickly solve high-dimensional problems that are essentially intractable using classical computer algorithms. Such problems include image recognition, object recognition in one or more video streams, facial recognition, and natural language processing (NLP). AI/ML is proving to be a critical technology for real-time pattern recognition in applications including industrial automation, robotics, and autonomous driving.

The core of many AI/ML algorithms is pattern recognition, often implemented as a neural network. AI/ML algorithm developers are widely adopting deep neural networks (DNNs), particularly deep convolutional networks, because these deep networks offer state-of-the-art accuracy for important image-classification tasks. AI/ML algorithms generally employ matrix and vector math, which can require trillions of multiply/accumulate (MAC) operations per second. Executing these core AI/ML math operations requires many fast multipliers and adders, generally called MAC units.
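As a minimal sketch of why matrix and vector math reduces to MAC operations, the following Python example multiplies a small matrix by a vector using explicit multiply/accumulate steps and counts them. The function name `matvec_mac` and the sample values are illustrative assumptions, not part of any vendor toolchain.

```python
def matvec_mac(weights, x):
    """Multiply a matrix by a vector using explicit multiply/accumulate steps.

    Returns the output vector and the number of MAC operations performed.
    """
    outputs = []
    mac_count = 0
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            acc += w * xi  # one multiply/accumulate (MAC) operation
            mac_count += 1
        outputs.append(acc)
    return outputs, mac_count


# Hypothetical 2x3 weight matrix and 3-element input vector.
weights = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
x = [1.0, 0.5, -1.0]
y, macs = matvec_mac(weights, x)
print(y, macs)  # prints [-1.0, 0.5] 6
```

A matrix with R rows and C columns needs R × C MACs per input vector, which is why layers with millions of weights, evaluated many times per second, push the total toward trillions of MACs per second.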

Mainstream, current-generation DNNs, such as AlexNet or VGG, rely heavily on dense matrix-multiplication operations performed on 32-bit, floating-point (FP32) data. GPUs are well-suited to such operations because they are equipped with many floating-point compute units and high-bandwidth on-chip and off-chip memories. Current, state-of-the-art GPUs are widely used for accelerating DNNs because they offer high performance for mainstream DNNs, albeit with high power consumption.

Why FPGAs for AI/ML?

In AI/ML applications, FPGAs provide superior energy efficiency (performance per watt) relative to GPUs while offering comparable or superior inferencing performance, because they can deliver at least as many compute elements and in some cases significantly more. Consequently, FPGAs are excellent implementation vehicles for AI/ML algorithms. Further, while GPUs are programmable, FPGAs are both programmable and reconfigurable, giving them an advantage over GPUs when implementing newly developed AI/ML algorithms.

Before data can be processed through all of the matrix and vector math, it must first make its way into the chip and reach the AI/ML computational core. This data must traverse high-speed I/O ports; a high-performance, on-chip I/O infrastructure; and a memory hierarchy that becomes progressively faster as it moves the raw data toward the computational core. The computed results must then be conveyed out to other blocks in the system. Any AI/ML solution must supply high-speed I/O and a suitable memory hierarchy in addition to fast computational resources.
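The point that compute alone is not enough can be illustrated with a rough back-of-the-envelope estimate. The sketch below compares the time a layer would spend on MACs with the time spent moving its weights from memory. The memory figure is the 4 Tbps GDDR6 aggregate bandwidth quoted later on this page, converted to bytes per second. The peak MAC rate, the layer size, and the one-byte-per-weight assumption are hypothetical values chosen only for illustration.

```python
def layer_time_bounds(macs, bytes_moved, peak_macs_per_s, mem_bytes_per_s):
    """Estimate compute time and memory-transfer time for one layer.

    Returns (compute_seconds, memory_seconds, limiting_resource).
    """
    compute_s = macs / peak_macs_per_s
    memory_s = bytes_moved / mem_bytes_per_s
    bound = "compute" if compute_s >= memory_s else "memory"
    return compute_s, memory_s, bound


# Hypothetical fully connected layer: 4096 x 4096 weights, one input vector.
macs = 4096 * 4096            # one MAC per weight
bytes_moved = 4096 * 4096     # assumes 1 byte per weight, each read once

peak_macs_per_s = 10e12       # assumed peak MAC rate (hypothetical)
mem_bytes_per_s = 4e12 / 8    # 4 Tbps expressed in bytes per second

compute_s, memory_s, bound = layer_time_bounds(
    macs, bytes_moved, peak_macs_per_s, mem_bytes_per_s
)
print(f"compute: {compute_s:.2e} s, memory: {memory_s:.2e} s, bound: {bound}")
```

Under these assumed numbers, the layer is limited by memory bandwidth rather than by MAC throughput, which is the situation a fast memory hierarchy and on-chip data movement are meant to address.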

Achronix Speedster7t FPGAs for AI/ML Applications

Achronix's Speedster7t FPGAs were designed for AI/ML application acceleration. Built with high-performance SerDes, high-bandwidth memory interfaces, dedicated machine learning processors, and high-speed PCIe Gen5 ports, the Speedster7t FPGA family can handle the most demanding workloads. Speedster7t FPGAs were designed to address key design challenges including:

  • Ingesting massive amounts of data from multiple high-speed input sources
  • Storing and retrieving this input data, along with the DNN models, partial results from each layer computation, and completed computations
  • Distributing this data rapidly to on-chip resources that can perform the layer computations quickly
  • Outputting computed results at high speed

Speedster7t Solution

Application Requirement: Speedster7t Solution

High-speed I/O interfaces:
  • 400G Ethernet ports
  • PCIe Gen5
  • 112 Gbps SerDes

High-speed memory access:
  • 4 Tbps GDDR6 aggregate bandwidth
  • 256 Gbps DDR4 bandwidth

Fast data movement:
  • FPGA 2D network on chip (NoC), which provides up to 20 Tbps of bandwidth for high-speed, wide-data transfers

High-speed and flexible mathematical operation support (machine learning processors):
  • Fully fracturable integer multiplier/accumulator
  • Flexible floating point
  • Native support for block floating point
  • Efficient matrix multiplication
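
Block floating point, listed above, can be sketched in general terms as follows: a block of values shares a single exponent, each value is stored as a small signed integer mantissa, and a dot product then needs only integer MACs plus one final scaling step. This is a generic illustration of the technique, not a description of the Speedster7t machine learning processor's actual number format. The function names and the 8-bit mantissa width are assumptions.

```python
import math


def bfp_quantize(values, mantissa_bits=8):
    """Quantize a block of floats to signed integer mantissas with one shared exponent.

    Each value is approximated as mantissa * 2**shared_exp.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0] * len(values), 0
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1.
    _, e = math.frexp(max_abs)
    # Choose the exponent so the largest value uses the full mantissa range.
    shared_exp = e - (mantissa_bits - 1)
    scale = 2.0 ** shared_exp
    limit = 2 ** (mantissa_bits - 1) - 1
    # Round to the nearest integer and clamp to the signed mantissa range.
    mantissas = [max(-limit, min(limit, round(v / scale))) for v in values]
    return mantissas, shared_exp


def bfp_dot(a, b, mantissa_bits=8):
    """Dot product of two blocks using integer MACs and one final scaling."""
    ma, ea = bfp_quantize(a, mantissa_bits)
    mb, eb = bfp_quantize(b, mantissa_bits)
    int_acc = sum(x * y for x, y in zip(ma, mb))  # integer-only accumulation
    return int_acc * 2.0 ** (ea + eb)


# Sample blocks chosen so every value is exactly representable.
a = [0.5, -0.25, 0.125, 1.0]
b = [1.0, 2.0, -4.0, 3.0]
print(bfp_quantize(a))  # prints ([32, -16, 8, 64], -6)
print(bfp_dot(a, b))    # prints 2.5
```

For values that are not exactly representable, the result is an approximation whose error depends on the mantissa width and on how widely the magnitudes within a block vary. That trade-off is the usual reason for choosing block size and mantissa width per workload.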
