A 3-D HBI Compliant 1.536 TB/s/mm2 Bandwidth Scalable Attention Accelerator With 22.5-GOPS Throughput High Speed SoftMax for Quantized Transformers in Intel 3 https://sscs.ieee.org/wp-content/themes/movedo/images/empty/thumbnail.jpg 150 150 https://secure.gravatar.com/avatar/8fcdccb598784519a6037b6f80b02dee03caa773fc8d223c13bfce179d70f915?s=96&d=mm&r=g
Abstract:
This letter presents a novel hardware accelerator compatible with <3- $\mu $ m pitch 3-D Cu-Cu hybrid bonding interconnect (HBI) technology, particularly designed to efficiently execute multihead attention (MHA) of encoder transformer models. We present an accelerator that addresses performance losses due to low precision models by incorporating specialized hardware optimizations …