Abstract:
This work presents a novel hardware accelerator compatible with <3μm pitch 3D Cu-Cu hybrid bonding interconnect (HBI) technology, particularly designed to efficiently execute Multi Head Attention (MHA) of encoder transformer models. We present an accelerator that addresses performance losses due to low precision models by incorporating specialized hardware optimizations for …