Compute Acceleration

Today's compute acceleration workloads are as diverse as their end applications, spanning everything from financial trading and genomics to machine learning inference and training. However, these workloads share common characteristics, including the types of arithmetic functions, number formats (integer and floating point), and aggressive performance targets. Furthermore, as processing migrates closer to the edge, power, thermal behavior, and performance per watt become key metrics. It is in these areas that FPGAs in general, and the Speedster7t family in particular, excel.

The Speedster7t FPGA family is optimized for high-bandwidth workloads and eliminates the performance bottlenecks associated with traditional FPGAs. Built on the TSMC 7nm FinFET process, Speedster7t FPGAs feature a revolutionary new 2D network-on-chip (2D NoC), an array of new machine learning processors (MLPs) optimized for high-bandwidth and artificial intelligence/machine learning (AI/ML) workloads, high-bandwidth GDDR6 interfaces, 400G Ethernet and PCI Express Gen5 ports. The 2D NoC connects all of the interfaces to over 80 access points in the FPGA fabric to deliver ASIC-level performance while retaining the full programmability of FPGAs. Get started today with the VectorPath accelerator card, featuring the Speedster7t FPGA.

Speedster7t Solution

  • Speedster7t FPGAs provide a high-performance, power-efficient computational acceleration solution for defense, financial, medical, scientific, oil and gas, and life science applications:
    • Machine learning (ML) inference and edge training
    • Financial analysis and high-frequency trading
    • Genomic analysis
    • Video and image processing
  • The inherent parallelism and flexibility of the FPGA architecture are well suited to these high-throughput applications.
  • High-speed interfacing is provided by PCIe Gen5 connectivity and high-performance Ethernet, along with a dedicated 2D network-on-chip (2D NoC) for high-bandwidth data movement.
  • Large data sets can be stored using DDR4/5 for bulk storage and GDDR6 interfaces for high-bandwidth access to external memory.
  • Data processing supports a wide variety of number formats, from low-bit-width integer math to high-performance floating-point operations, including native support for matrix multiplications and complex arithmetic (for example, to support beamforming applications).
  • Speedster7t FPGAs are particularly well suited to ML inference and edge analytics operations.
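
To make the idea of low-bit-width integer math for matrix operations concrete, here is a minimal Python sketch of an int8 matrix-vector product with a wide accumulator. It is a generic software illustration, not a description of how Speedster7t hardware is implemented or programmed; the function names, the scale factors, and the sample data are all hypothetical.

```python
def quantize_int8(values, scale):
    """Map real values to int8 codes: divide by scale, round, clamp to [-128, 127]."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def int8_matvec(matrix_q, vector_q):
    """Integer matrix-vector product.

    Python integers are unbounded, so they stand in here for the wide
    accumulator that a multiply-accumulate unit would use.
    """
    return [sum(w * x for w, x in zip(row, vector_q)) for row in matrix_q]


def dequantize(accumulators, w_scale, x_scale):
    """Convert integer accumulators back to real values."""
    return [a * w_scale * x_scale for a in accumulators]


# Hypothetical example data; a scale of 1/64 keeps every value exactly representable.
SCALE = 1 / 64
weights = [[0.5, -0.25], [1.0, 0.75]]
inputs = [0.5, 1.0]

weights_q = [quantize_int8(row, SCALE) for row in weights]
inputs_q = quantize_int8(inputs, SCALE)
result = dequantize(int8_matvec(weights_q, inputs_q), SCALE, SCALE)
print(result)  # [0.0, 1.25]
```

All multiplications and additions in the inner loop operate on small integers; floating-point arithmetic is needed only once per output, when the accumulator is rescaled.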

Application Requirements and Speedster Value

Application requirement: Need for high-bandwidth external connectivity
Speedster value: Multiple ports of 400G Ethernet and PCIe Gen5

Application requirement: Highest memory bandwidth for buffering, >1 Tbps
Speedster value: Up to 16 independent GDDR6 channels at 16 Gbps, offering up to 4 Tbps of total bandwidth

Application requirement: Wide and high-performance datapath; dataflow optimized for compute acceleration matrix-vector mathematics
Speedster value:
  • Up to 20 Tbps of NoC bandwidth for high-speed, wide-data transfers
  • Optimized bus routing quantized at one byte
  • Fully flexible bit-wise routing
  • Dedicated routing paths to support data reuse between multiply-accumulator and memory
  • Cascade path to enable, for example, systolic array implementation
  • Integrated register file to enable time-multiplexing of calculations

Application requirement: Significant computational requirement for integer arithmetic
Speedster value:
  • MLPs deliver up to 61 TOps for int8
  • Modified Booth algorithm allows double density of integer multiplies in LUTs

Application requirement: Neural network inferencing requires a large number of matrix multiplications, high-performance computation and significant amounts of data movement
Speedster value: Optimized multiply-accumulate core for integer and floating-point arithmetic
  • Truly fracturable integer width: 4x int16 to 16x int8 to 32x int4
  • FP16, bfloat16 and custom floating point support
  • Native support for block floating point
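
For readers unfamiliar with block floating point, the following Python sketch shows the general idea: a block of values shares one exponent, and each value keeps only a small signed integer mantissa. This is a generic illustration under stated assumptions, not the MLP's actual number format; the 8-bit mantissa width, the function names, and the sample block are hypothetical.

```python
import math


def bfp_encode(block, mantissa_bits=8):
    """Encode floats as (shared_exponent, signed integer mantissas)."""
    max_abs = max(abs(v) for v in block)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    # frexp gives max_abs = m * 2**exp with 0.5 <= m < 1, so exp is the
    # smallest power of two strictly above the largest magnitude.
    _, exp = math.frexp(max_abs)
    shift = mantissa_bits - 1 - exp
    limit = 2 ** (mantissa_bits - 1) - 1
    # Scale every value by the same power of two, then round and clamp.
    mantissas = [
        max(-limit, min(limit, round(math.ldexp(v, shift)))) for v in block
    ]
    return exp, mantissas


def bfp_decode(exp, mantissas, mantissa_bits=8):
    """Reconstruct approximate floats from the shared exponent and mantissas."""
    shift = mantissa_bits - 1 - exp
    return [math.ldexp(m, -shift) for m in mantissas]


block = [1.0, 0.5, -0.25, 0.01]
exp, mantissas = bfp_encode(block)
print(exp, mantissas)              # 1 [64, 32, -16, 1]
print(bfp_decode(exp, mantissas))  # [1.0, 0.5, -0.25, 0.015625]
```

Because the exponent is shared, multiply-accumulate work inside a block reduces to integer arithmetic on the mantissas. The cost is precision for values much smaller than the block maximum, as the last element (0.01 decoded as 0.015625) shows.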

 

| Feature | Machine Learning / Deep Learning | High Performance Compute | Genomics | Video & Image Processing |
|---|---|---|---|---|
| Highest Performance SerDes | | | | |
| 112G multi-standard SR/MR/LR PHY | Yes | Yes | Yes | Yes |
| Most Advanced Interface IP | | | | |
| PCIe Gen5 | Yes | Yes | Yes | Yes |
| GDDR6 - 4 Tbits/sec of memory bandwidth | Yes | Yes | Yes | Yes |
| DDR4 - up to 3,200 MHz, 3DS stacked memory | Yes | Yes | Yes | Yes |
| DDR5 - up to 4,400 MHz | Yes | Yes | Yes | |
| Application-specific interface | | | Yes | Yes |
| Terabit Speed Routing | | | | |
| NoC | Yes | Yes | Yes | Yes |
| Bus routing | Yes | | | Yes |
| Fully flexible bit-wise routing | Yes | | | |
| High-Throughput Processing | | | | |
| Datapath crypto | Yes | | | Yes |
| MLP | Yes | Yes | Yes | Yes |
| Fine-grain hardware reprogrammability (examples listed) | Format conversion, activation function | Monte Carlo analysis | PairHMM algorithm | Custom codecs |
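
The 4 Tbps GDDR6 figure quoted above can be sanity-checked with simple arithmetic. The sketch below assumes that each GDDR6 channel is 16 data bits wide and that the quoted 16 Gbps is a per-pin data rate; neither assumption is stated on this page, and the function name is hypothetical.

```python
def aggregate_bandwidth_gbps(channels, pins_per_channel, gbps_per_pin):
    """Peak aggregate bandwidth in Gbps across all channels."""
    return channels * pins_per_channel * gbps_per_pin


# 16 channels (from the text) x 16 data pins per channel (assumed)
# x 16 Gbps per pin (assumed to be a per-pin rate).
total_gbps = aggregate_bandwidth_gbps(channels=16, pins_per_channel=16, gbps_per_pin=16)
print(total_gbps)         # 4096
print(total_gbps / 1000)  # 4.096, i.e. roughly 4 Tbps
```

Under these assumptions the result is a peak figure; sustained bandwidth in a real design depends on access patterns and controller efficiency.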