An Innovative High-Performance FPGA Family

The Speedster7t FPGA family is optimized for high-bandwidth workloads and eliminates the performance bottlenecks associated with traditional FPGAs. Built on TSMC’s 7nm FinFET process, Speedster7t FPGAs feature a revolutionary new 2D network-on-chip (NoC), an array of new machine learning processors (MLPs) optimized for high-bandwidth and artificial intelligence/machine learning (AI/ML) workloads, high-bandwidth GDDR6 interfaces, 400G Ethernet and PCI Express Gen5 ports — all interconnected to deliver ASIC-level performance while retaining the full programmability of FPGAs.

Network on Chip

While the many terabits of data in today's high-bandwidth applications will easily overwhelm the routing capacity of a conventional FPGA's bit-oriented programmable-interconnect fabric, the Speedster7t architecture includes an innovative, high-bandwidth, two-dimensional NoC that spans horizontally and vertically over the FPGA fabric, connecting to all of the FPGA’s high-speed data and memory interfaces. Acting like a superhighway network running over the FPGA programmable-logic fabric, the Speedster7t NoC supports high-bandwidth communication between interfaces and custom acceleration functions in the programmable-logic fabric.

Each row or column in the NoC is implemented as two 256-bit, unidirectional, industry-standard AXI channels operating at 2 GHz, so each channel can carry 512 Gbps of traffic in its direction.
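The channel figures above translate into raw bandwidth with simple arithmetic. The sketch below assumes one 256-bit transfer per cycle at 2 GHz (equivalently, 2 Gbps on each of the 256 wires) and ignores AXI protocol overhead; the function name is illustrative only.

```python
def channel_bandwidth_gbps(width_bits: int, clock_ghz: float) -> float:
    """Raw bandwidth of one unidirectional channel in Gbps.

    Assumes one full-width transfer per clock cycle (no protocol overhead).
    """
    return width_bits * clock_ghz


# One 256-bit channel at 2 GHz.
per_direction_gbps = channel_bandwidth_gbps(256, 2.0)

# Each NoC row or column has two such channels, one per direction.
per_row_or_column_gbps = 2 * per_direction_gbps

print(per_direction_gbps, per_row_or_column_gbps)
```

Under these assumptions a single row or column moves 512 Gbps each way at the same time.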

To learn more, visit Network on Chip.

Machine Learning Processors

Delivering the industry’s highest FPGA-based compute density, Speedster7t FPGAs feature a large array of programmable math compute elements, organized into new machine learning processor (MLP) blocks. Each MLP is a highly configurable, compute-intensive block with up to 32 multiplier/accumulators (MACs) that supports integer formats from 4 to 24 bits and various floating-point modes, including native support for TensorFlow’s bfloat16 format and the highly efficient block floating-point format, which dramatically increases performance.
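To illustrate why block floating point is efficient, here is a minimal software sketch of the general technique: a block of values shares one exponent, so a dot product reduces to integer multiply-accumulates plus a single exponent addition. This is not a description of the MLP hardware; the function names, the 8-bit mantissa width, and the rounding and clamping choices are all assumptions made for the example.

```python
import math


def bfp_quantize(block, mantissa_bits=8):
    """Quantize floats to signed integer mantissas sharing one exponent."""
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return [0] * len(block), 0
    # frexp returns (m, e) with max_abs == m * 2**e and 0.5 <= m < 1.
    _, e = math.frexp(max_abs)
    # Choose the shared exponent so the largest value fits the mantissa.
    shared_exp = e - (mantissa_bits - 1)
    limit = 2 ** (mantissa_bits - 1) - 1
    mantissas = [
        max(-limit, min(limit, round(x / 2.0 ** shared_exp))) for x in block
    ]
    return mantissas, shared_exp


def bfp_dequantize(mantissas, shared_exp):
    """Recover approximate float values from a quantized block."""
    return [m * 2.0 ** shared_exp for m in mantissas]


def bfp_dot(a_mant, a_exp, b_mant, b_exp):
    """Dot product using integer MACs and one exponent addition."""
    acc = sum(x * y for x, y in zip(a_mant, b_mant))  # integer-only inner loop
    return acc * 2.0 ** (a_exp + b_exp)


a_m, a_e = bfp_quantize([1.0, 0.5, -0.25, 0.0])
b_m, b_e = bfp_quantize([0.5, 0.5, 0.5, 0.5])
print(bfp_dot(a_m, a_e, b_m, b_e))
```

The inner loop touches only integers, which is the property that lets floating-point-like dynamic range ride on integer multiplier arrays.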

These features, and the tight integration of MLP blocks with embedded memory blocks, eliminate the traditional delays associated with FPGA routing, so machine learning algorithms can run at the maximum rate of 750 MHz. This combination of high-density compute and high-performance data delivery results in a processor fabric that delivers the highest usable FPGA-based tera-operations per second (TOps).
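As a rough illustration of how MLP count, MACs per MLP, and clock rate combine, the sketch below computes a theoretical upper bound. It assumes every MLP sustains the full 32 MACs on every 750 MHz cycle and counts each MAC as two operations (a common convention, not something stated here). The usable MAC count depends on the number format, so this is a ceiling, not a vendor performance figure.

```python
def peak_tops(mlp_blocks: int, macs_per_mlp: int = 32,
              clock_mhz: float = 750.0) -> float:
    """Theoretical peak tera-operations per second.

    Assumption: one MAC = 2 operations (multiply + accumulate), and all
    MACs in all MLP blocks are busy on every cycle.
    """
    ops_per_cycle = mlp_blocks * macs_per_mlp * 2
    return ops_per_cycle * clock_mhz * 1e6 / 1e12


# 2,560 MLP blocks is the AC7t1500 entry in the product table.
print(f"{peak_tops(2560):.2f} TOps (upper bound under the stated assumptions)")
```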

To learn more, visit Machine Learning Processor.

High-Performance Interfaces

Critical for high-performance compute and machine learning systems is high off-chip memory bandwidth to source and buffer many high-bandwidth data streams. To achieve the needed level of bandwidth, Speedster7t devices include hard GDDR6 memory controllers to support high-bandwidth memory interfaces. Each GDDR6 memory controller can support 512 Gbps of bandwidth, so the up to eight GDDR6 controllers in a Speedster7t device can support an aggregate GDDR6 bandwidth of 4 Tbps, delivering the equivalent memory bandwidth of an HBM-based FPGA at a fraction of the cost.
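The aggregate figure follows directly from the per-controller number. The short sketch below reproduces the arithmetic and converts bits to bytes; the function name is illustrative, and the "4 Tbps" headline is read as a rounded form of 4,096 Gbps.

```python
def aggregate_gddr6(controllers: int = 8, gbps_per_controller: int = 512):
    """Return aggregate GDDR6 bandwidth as (Gbps, GB/s)."""
    total_gbps = controllers * gbps_per_controller
    total_gbytes_per_s = total_gbps / 8  # 8 bits per byte
    return total_gbps, total_gbytes_per_s


gbps, gbytes = aggregate_gddr6()
print(gbps, "Gbps =", gbytes, "GB/s")
```

Eight controllers at 512 Gbps each give 4,096 Gbps, which is 512 GB/s.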

Along with this extraordinary memory bandwidth, Speedster7t devices include the industry’s highest-performance interface ports to support extremely high-bandwidth data streams. Speedster7t devices have up to 72 of the industry’s highest-performing SerDes interfaces, which can operate from 1 to 112 Gbps. They also include hard 400G Ethernet MACs with forward error correction (FEC), supporting 4 × 100G and 8 × 50G configurations, and hard PCI Express Gen5 controllers with 8 or 16 lanes per controller.

To learn more, visit Speedster7t Interfaces.

Security Features for Safety-Critical and Hardware Assurance Applications

Speedster7t FPGAs confront the threat of third-party attacks with the most advanced bitstream security features with multiple layers of defense for protecting bitstream secrecy and integrity. Keys are encrypted based on a tamper-resistant physically unclonable function (PUF); bitstreams are encrypted and authenticated by 256-bit AES-GCM.

To defend against side-channel attacks, bitstreams are segmented, with separately derived keys used for each segment, and the on-chip decryption hardware employs differential power analysis (DPA) countermeasures. Additionally, a 2048-bit RSA public key authentication protocol is used to activate the decryption and authentication hardware. End users can be confident that when they load their secure bitstream, it is the intended configuration because it has been authenticated by an RSA public key, an AES-GCM private key, and a CRC checksum.
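The idea of segmenting a bitstream and using a separately derived key per segment can be sketched in software as follows. This is only a conceptual model: the key-derivation function (HMAC-SHA256 over a segment index), the segment size, and the use of an HMAC tag as a stand-in for the AES-GCM authentication tag are all assumptions chosen so the example runs with the Python standard library. None of them describe the actual Speedster7t scheme.

```python
import hashlib
import hmac


def derive_segment_key(root_key: bytes, index: int) -> bytes:
    """Derive a distinct key for segment `index` (hypothetical KDF)."""
    info = b"bitstream-segment" + index.to_bytes(4, "big")
    return hmac.new(root_key, info, hashlib.sha256).digest()


def split_segments(bitstream: bytes, segment_size: int):
    """Split the bitstream into fixed-size segments (last may be shorter)."""
    return [bitstream[i:i + segment_size]
            for i in range(0, len(bitstream), segment_size)]


def tag_segments(root_key: bytes, bitstream: bytes, segment_size: int = 16):
    """Attach a per-segment tag (HMAC here, standing in for AES-GCM)."""
    tagged = []
    for index, segment in enumerate(split_segments(bitstream, segment_size)):
        key = derive_segment_key(root_key, index)
        tagged.append((segment, hmac.new(key, segment, hashlib.sha256).digest()))
    return tagged


def verify_segments(root_key: bytes, tagged) -> bool:
    """Re-derive each segment key and check every tag in constant time."""
    for index, (segment, tag) in enumerate(tagged):
        key = derive_segment_key(root_key, index)
        expected = hmac.new(key, segment, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return False
    return True


root = bytes(32)  # placeholder root key for the demo only
tagged = tag_segments(root, bytes(range(64)))
print(verify_segments(root, tagged))
```

Because each segment is processed under its own key, an attacker observing the hardware sees only a small amount of data handled under any single key, which is the property that limits side-channel leakage.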

Speedster7t Product Table

| Features | AC7t750 | AC7t1500 | AC7t3000 | AC7t6000 |
| --- | --- | --- | --- | --- |
| 6-input LUTs | 363K | 692K | 1.3M | 2.6M |
| LRAM2k | 336 | 2,560 | 880 | 1,760 |
| BRAM72k | 1,344 | 2,560 | 2,600 | 5,200 |
| MLP blocks | 336 | 2,560 | 880 | 1,760 |
| SerDes 112 Gbps (LR + XSR) | 24 + 16 | 32 + 0 | 40 + 32 | 72 + 0 |
| Dedicated GPIO | 32 | 64 | 50 | 100 |
| Additional GPIO | 150 | 150 | 300 | 600 |
| DDR4/5 channels | 1 | 1 | 2 | 4 |
| GDDR6 | 8 channels | 16 channels † | 16 channels | 16 channels |
| PCIe Gen5 | One ×16 | One ×16 and one ×8 | One ×16 and one ×8 | Two ×16 |
| Ethernet | 8 lanes, 2×400G or 8×100G | 16 lanes, 4×400G or 16×100G | 16 lanes, 4×400G or 16×100G | 32 lanes, 8×400G or 32×100G |

† This option varies by package size.

State-of-the-Art Design Tools

Speedster7t devices are supported by ACE, a state-of-the-art FPGA design suite that is the result of 200 man-years of development. ACE works in conjunction with industry-standard synthesis tools, allowing FPGA designers to easily map their designs into Achronix solutions. ACE includes an Achronix-optimized version of Synplify Pro from Synopsys.

Achronix has also partnered with Mentor, a Siemens company, to provide an optimized high-level synthesis (HLS) flow which can target Speedster7t devices. This integrated development environment enables designers to quickly go from C/C++ to RTL using Mentor’s Catapult HLS and Achronix’s ACE design tools.

To facilitate artificial intelligence and machine learning applications, Achronix will support standard high-level frameworks such as TensorFlow and Caffe2, and will provide base hardware libraries to enable developers’ custom tool flows. These libraries include standard convolution, matrix multiplication, and transcendental functions.

PCIe Acceleration Platform

Users will be able to benchmark their workloads with a Speedster7t PCIe accelerator card that will be available at the same time as Speedster7t samples. The card is designed for PCIe Gen5 ×16, but is also backward compatible with PCIe Gen4/Gen3 ×16. The PCIe card will be a full-height, 3/4-length, dual-width form factor equipped with a passive heat sink.

The accelerator card will feature QSFP-DD (double density) cages that support up to two ports of 400G Ethernet or eight ports of 100G Ethernet using the 56G PAM4-enabled AC7t1500 Speedster7t device. It will also include sixteen channels of GDDR6 graphics DRAM for high-bandwidth memory requirements, providing up to 4 Tbps (512 GB/s) of memory bandwidth, and a 72-bit DDR4 interface for deeper buffering.
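The Ethernet port counts on the card are consistent with the 16 Ethernet lanes listed for the AC7t1500 in the product table. The sketch below checks this, assuming a 56G PAM4 lane carries a nominal 50G of Ethernet payload and a 112G lane carries a nominal 100G; the function name is illustrative.

```python
def ports_supported(lanes: int, lane_rate_g: int, port_rate_g: int) -> int:
    """Number of Ethernet ports a lane budget can carry.

    lane_rate_g is the nominal payload per lane (assumed 50 for 56G PAM4,
    100 for 112G PAM4); each port uses a whole number of lanes.
    """
    lanes_per_port = -(-port_rate_g // lane_rate_g)  # ceiling division
    return lanes // lanes_per_port


# 16 lanes at 56G PAM4, as on the accelerator card.
print(ports_supported(16, 50, 400), "x 400G or",
      ports_supported(16, 50, 100), "x 100G")

# The same 16 lanes at 112G, as listed in the product table.
print(ports_supported(16, 100, 400), "x 400G or",
      ports_supported(16, 100, 100), "x 100G")
```

Halving the lane rate halves the number of ports that the same lane budget can carry, which accounts for the difference between the card's two 400G ports and the device's four.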

For ease of design and a plug-and-play experience, the acceleration platform ships with a complete software development toolkit that includes the ACE design tools, PCIe drivers, libraries, board monitoring utilities, and multiple FPGA example projects. These tools will enable developers to implement their custom solutions and bring differentiated applications to market.
