Accuracy

A 3 nm FinFET 125 TOPS/W-29 TFLOPS/W, 90 TOPS/mm2-17 TFLOPS/mm2 SRAM-Based INT8, and FP16 Digital-CIM Compiler With Support for Multi-Weight Update/Cycle

Abstract:

This article presents a static random-access memory (SRAM)-based digital compute-in-memory (CIM) compiler implemented with 3 nm high-$\kappa$ metal gate (HKMG) FinFET technology, supporting flexible INT8 and FP16 formats for weight and activation multiply-accumulate (MAC) operations, offering configuration flexibility, high accuracy, and improved area and power efficiency. The FP16 digital …

View on IEEE Xplore

Leveraging a passive MRAM crossbar for hardware-in-the-loop and continual learning

Abstract:

Artificial neural networks have enabled major advances in artificial intelligence, yet their growing computational and energy demands challenge conventional von Neumann architectures due to the costly separation of memory and processing. In-memory computing has emerged as a promising solution, particularly through memristive crossbar arrays capable of performing multiply-and-accumulate operations directly …

View on IEEE Xplore

ASAP: A 28-nm Transformer Training Accelerator With Alternating Sparsity and Asymmetrical Microscaling Precision

Abstract:

This work presents ASAP, a 28-nm transformer-training accelerator that combines N:M structured sparsity with asymmetric microscaling floating-point (MXFP) precision through a unified algorithm–hardware co-design. ASAP introduces a progressive sparsity schedule in which pruned compute resources are reassigned to increase numerical precision for important weights and activations, stabilizing optimization …

View on IEEE Xplore

A BEV Perception Transformer Accelerator With Saliency-Driven Image/Point Cloud Fusion and Phase-Linked Dataflow in 28 nm CMOS

Abstract:

Deploying advanced Transformer-based models for real-time, high-accuracy multimodal bird’s-eye-view (BEV) perception in autonomous driving imposes substantial hardware demands. To address this, we propose a low-cost, low-power image/point-cloud fusion Transformer accelerator that supports two modes: high-performance driving and ultra-low-power sentry operation. We first propose a cross-modal saliency evaluation mechanism …

View on IEEE Xplore

A 16 MHz RC Frequency Reference With ±450 ppm Inaccuracy From –45 °C to 85 °C After Accelerated Aging

Abstract:

This article presents a high-accuracy, low-drift 16 MHz RC frequency reference implemented in a standard 180 nm CMOS process. It consists of a frequency-locked loop (FLL), which locks the output frequency of a digitally controlled oscillator (DCO) to the time constant of a Wien bridge (WB) filter. A PNP-based temperature sensor (TS) …

View on IEEE Xplore

LUT-Based Convolutional Tsetlin Machine Accelerator With Dynamic Clause Scaling for Resources-Constrained FPGAs

Abstract:

The rapid growth of machine learning (ML) workloads, particularly in computer vision applications, has significantly increased computational and energy demands in modern electronic systems, motivating the use of hardware accelerators to offload processing from general-purpose processors. Despite advances in computationally efficient ML models, achieving energy-efficient inference on resource-constrained edge devices …

View on IEEE Xplore

MITTA: A Multi-Task Transformer Accelerator With Mixed Precision Structured Sparsity and Hierarchical Task-Adaptive Power Management

Abstract:

This article presents MITTA, the first silicon-proven transformer accelerator optimized for multi-task inference across both natural language processing (NLP) and image processing domains. MITTA accelerates a task-sharing algorithm that minimizes sub-task computation by reusing both activations and weights from a shared base task, requiring only sparse delta computation for sub-tasks. …

View on IEEE Xplore

A 7.5-μW 35-Keyword End-to-End Keyword Spotting System With Random Augmented On-Chip Training

Abstract:

Fully integrated keyword spotting (KWS) systems designed for low-power operation face two major challenges. First, increasing the number of supported keywords significantly raises system complexity and power consumption. Second, most existing systems are not personalized to individual users, as they are trained on data from native English speakers, leading to …

View on IEEE Xplore

A Folded-Differential Switched-Capacitor SRAM CIM Macro With Scalable MAC Sizes for TinyML Inference

Abstract:

This letter presents a switched-capacitor SRAM compute-in-memory macro optimized for TinyML inference. Key features include: 1) an area-efficient folded-differential multiply-and-accumulate (FD-MAC) scheme to double the signal margin; 2) a closed-loop floating-inverter amplifier (FIA)-based charge accumulation technique for signal-to-noise ratio enhancement and multiply-and-accumulate (MAC) voltage integration; and 3) a sparsity-aware multistep MAC method …

View on IEEE Xplore