Deployments of machine learning networks with auto-regressive critical paths, or recurrence, often poorly utilize AI accelerator hardware. Such networks, like those used in automatic speech recognition (ASR), must run with low latency and deterministic tail-latency for at-scale real-time applications. In this paper, the team at Myrtle.ai presents an overlay architecture for an inference engine which is then implemented on a Speedster7t FPGA. The team further highlights the benefits of the AI-optimized Speedster7t architecture for low-latency, real-time applications.
Version
1.0
Released Date
2023-08-24