Attention mechanisms

A 3D HBI Compliant 1.536TB/s/mm2 Bandwidth Scalable Attention Accelerator With 22.5GOPS Throughput High Speed SoftMax for Quantized Transformers in Intel 3


Abstract:

This work presents a novel hardware accelerator compatible with <3μm pitch 3D Cu-Cu hybrid bonding interconnect (HBI) technology, designed to efficiently execute Multi-Head Attention (MHA) in encoder transformer models. We present an accelerator that addresses performance losses due to low-precision models by incorporating specialized hardware optimizations for …
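For context, the operation such an accelerator targets is standard scaled dot-product attention with a softmax over quantized activations. The sketch below is a generic software reference in NumPy, not the paper's hardware design: the int8 quantization scheme, the float softmax, and all function names are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def quantize_int8(x):
    # Symmetric per-tensor int8 quantization: the max absolute value maps to 127.
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def attention_int8(Q, K, V):
    # Quantize Q and K, compute Q·K^T in integer arithmetic, dequantize and
    # scale by 1/sqrt(d_k), then apply the softmax in floating point.
    d_k = Q.shape[-1]
    qQ, sQ = quantize_int8(Q)
    qK, sK = quantize_int8(K)
    scores = (qQ.astype(np.int32) @ qK.astype(np.int32).T) * (sQ * sK) / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ V

# Toy example: 4 tokens, head dimension 8.
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
print(attention_int8(Q, K, V).shape)  # (4, 8)
```

In a multi-head setting this computation is repeated per head on separate projections of the input; the accelerator described above maps this attention-plus-softmax pipeline into dedicated hardware rather than software.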
