A 3D HBI Compliant 1.536TB/s/mm2 Bandwidth Scalable Attention Accelerator With 22.5GOPS Throughput High Speed SoftMax for Quantized Transformers in Intel 3
Abstract:
This work presents a novel hardware accelerator compatible with <3μm-pitch 3D Cu-Cu hybrid bonding interconnect (HBI) technology, designed specifically to execute the Multi-Head Attention (MHA) of encoder transformer models efficiently. The accelerator addresses performance losses due to low-precision models by incorporating specialized hardware optimizations for …