sparsity

ASAP: A 28-nm Transformer Training Accelerator With Alternating Sparsity and Asymmetrical Microscaling Precision

ASAP: A 28-nm Transformer Training Accelerator With Alternating Sparsity and Asymmetrical Microscaling Precision 150 150

Abstract:

This work presents ASAP, a 28-nm transformer-training accelerator that combines N:M structured sparsity with asymmetric microscaling floating-point (MXFP) precision through a unified algorithm–hardware co-design. ASAP introduces a progressive sparsity schedule in which pruned compute resources are reassigned to increase numerical precision for important weights and activations, stabilizing optimization …

View on IEEE Xplore

MITTA: A Multi-Task Transformer Accelerator With Mixed Precision Structured Sparsity and Hierarchical Task-Adaptive Power Management

MITTA: A Multi-Task Transformer Accelerator With Mixed Precision Structured Sparsity and Hierarchical Task-Adaptive Power Management 150 150

Abstract:

This article presents MITTA, the first silicon-proven transformer accelerator optimized for multi-task inference across both natural language processing (NLP) and image processing domains. MITTA accelerates a task-sharing algorithm that minimizes sub-task computation by reusing both activations and weights from a shared base task, requiring only sparse delta computation for sub-tasks. …

View on IEEE Xplore

SparseCol: A 1320 BTOPS/W Precision-Scalable NPU Exploiting Training-Free Structured Bit-Level Sparsity and Dynamic Dataflow

SparseCol: A 1320 BTOPS/W Precision-Scalable NPU Exploiting Training-Free Structured Bit-Level Sparsity and Dynamic Dataflow 150 150

Abstract:

Bit-serial computation enables sequential processing of data at the bit level, providing several advantages, such as scalable computational precision. This approach has gained significant attention, especially for exploiting bit-level sparsity (BLS) in AI workloads. While current bit-serial processors leverage BLS to eliminate the computation associated with zero bits, they face …

View on IEEE Xplore

Opal: A 16-nm Coarse-Grained Reconfigurable Array SoC for Full Sparse Machine Learning Applications

Opal: A 16-nm Coarse-Grained Reconfigurable Array SoC for Full Sparse Machine Learning Applications 150 150

Abstract:

Sparsity has recently attracted increased attention in the machine learning (ML) community due to its potential to improve performance and energy efficiency by eliminating ineffectual computations. As ML models evolve rapidly, reconfigurable architectures, such as coarse-grained reconfigurable arrays (CGRAs), are being explored to adapt to and accelerate emerging models. Previous …

View on IEEE Xplore