Today’s compute acceleration workloads are as diverse as their end applications, ranging from financial trading and genomics to machine learning inference and training. However, these workloads share common characteristics, including the types of arithmetic functions, the number formats (integer and floating point), and aggressive performance targets. Furthermore, as processing migrates closer to the edge, power, thermal behavior, and performance per watt become key metrics. It is in these areas that FPGAs in general, and the Speedster7t family in particular, excel.

Speedster7t Solution

  • Speedster7t FPGAs provide a high-performance, power-efficient computational acceleration solution for defense, financial, medical, scientific, oil and gas, and life science applications:
    • Machine learning (ML) inference and edge training
    • Financial analysis and high-frequency trading
    • Genomic analysis
    • Video and image processing
  • The inherent parallelism and flexibility of the FPGA architecture are well suited to these high-throughput applications.
  • High-speed interfacing is simplified with PCIe Gen5 connectivity and high-performance Ethernet up to 400G, as well as a dedicated 2D network-on-chip (NoC) for high-bandwidth data movement.
  • Storage of large data sets is possible with DDR4/5 bulk storage and GDDR6 interfaces for high-bandwidth access to external memory.
  • Data processing supports a wide variety of number formats, from low-bit-width integer math to high-performance floating-point operations, including native support for matrix multiplications and complex arithmetic (for example, to support beamforming applications).
  • Speedster7t FPGAs are particularly well suited to ML inference and edge analytics operations.
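
As a rough illustration of the low-bit-width integer math and matrix operations mentioned above, the following Python sketch quantizes a small matrix-vector product to int8 codes, accumulates in integers, and rescales the result. It is a software model only, not Speedster7t code: the function names (`quantize_int8`, `int8_matvec`) and the symmetric per-tensor scaling scheme are assumptions chosen for illustration.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization (assumed scheme): returns (codes, scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Clamp to the symmetric int8 range [-127, 127].
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def int8_matvec(matrix, vector):
    """Approximate matrix @ vector using int8 operands and integer accumulation."""
    flat = [x for row in matrix for x in row]
    q_flat, m_scale = quantize_int8(flat)
    q_vec, v_scale = quantize_int8(vector)
    cols = len(vector)
    result = []
    for r in range(len(matrix)):
        q_row = q_flat[r * cols:(r + 1) * cols]
        # Integer multiply-accumulate; only the final rescale uses floating point.
        acc = sum(a * b for a, b in zip(q_row, q_vec))
        result.append(acc * m_scale * v_scale)
    return result


if __name__ == "__main__":
    print(int8_matvec([[1.0, -2.0], [0.5, 0.25]], [1.0, 0.5]))
```

The result approximates the exact product (0.0 and 0.625 for the inputs shown) to within the quantization error, which is the trade-off low-bit-width inference accepts in exchange for denser integer arithmetic.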

Application Requirements and Speedster Value

  • Requirement: high-bandwidth external connectivity
    • Multiple ports of 400G Ethernet and PCIe Gen5
  • Requirement: highest memory bandwidth for buffering (>1 Tbps)
    • Up to 16 independent GDDR6 channels at 16 Gbps, offering up to 4 Tbps of total bandwidth
  • Requirement: wide, high-performance datapath
    • Dataflow optimized for compute-acceleration matrix-vector mathematics
    • Up to 20 Tbps of NoC bandwidth for high-speed, wide-data transfers
    • Optimized bus routing quantized at one byte
    • Fully flexible bit-wise routing
    • Dedicated routing paths to support data reuse between multiply-accumulator and memory
    • Cascade path to enable, for example, systolic array implementation
    • Integrated register file to enable time-multiplexing of calculations
  • Requirement: significant computational capacity for integer arithmetic
    • MLPs deliver up to 61 TOps for int8
    • Modified Booth algorithm allows double density of integer multiplies in LUTs
  • Requirement: neural network inferencing, which needs a large number of matrix multiplications, high-performance computation, and significant amounts of data movement
    • Optimized multiply-accumulate core for integer and floating-point arithmetic
    • Truly fracturable integer width: 4x int16 to 16x int8 to 32x int4
    • FP16, bfloat16, and custom floating-point support
    • Native support for block floating point
 

Application areas compared: Machine Learning and Deep Learning; High Performance Compute; Genomics; Video & Image Processing

Highest Performance SerDes
112G multi-standard SR/MR/LR PHY Yes Yes Yes Yes
Most Advanced Interface IP
PCIe Gen5 Yes Yes Yes Yes
GDDR6 – 4 Tbps of memory bandwidth Yes Yes Yes Yes
DDR4 – up to 3,200 MHz, 3DS stacked memory Yes Yes Yes Yes
DDR5 – up to 4,400 MHz Yes Yes Yes
Application specific interface Yes Yes
Terabit Speed Routing
NoC Yes Yes Yes Yes
Bus routing Yes Yes
Fully flexible bit-wise routing Yes
High-Throughput Processing
Datapath crypto Yes Yes
MLP Yes Yes Yes Yes
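Fine-grain hardware reprogrammability (examples listed): format conversion and activation functions (machine learning and deep learning); Monte Carlo analysis (high performance compute); PairHMM algorithm (genomics); custom codecs (video and image processing)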
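
The GDDR6 bandwidth figure quoted in the tables above can be sanity-checked with simple arithmetic. The sketch below multiplies the channel count and per-pin data rate given in the text by a channel width of 16 data bits. That width is an assumption taken from the standard GDDR6 organization of two 16-bit channels per device; the source itself states only the channel count and the 16 Gbps rate. The function name is hypothetical.

```python
def gddr6_total_gbps(channels=16, pin_rate_gbps=16, bits_per_channel=16):
    """Raw aggregate bandwidth in Gbps: channels x data pins per channel x pin rate.

    bits_per_channel=16 is an assumption (standard GDDR6 channel width),
    not a figure stated in the source text.
    """
    return channels * bits_per_channel * pin_rate_gbps


total_gbps = gddr6_total_gbps()
total_tbps = total_gbps / 1000        # decimal terabits per second
total_gbytes_per_s = total_gbps / 8   # gigabytes per second

if __name__ == "__main__":
    print(total_gbps, total_tbps, total_gbytes_per_s)
```

Under that assumption the raw total is 4,096 Gbps, consistent with the "up to 4 Tbps" figure; sustained throughput in a real design would depend on access patterns and controller efficiency.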