Hardware

A 12 b 180-MS/s Pipelined-SAR ADC With a Low-Power Calibration-Free RA and Hardware-Efficient SAR Logic

A 12 b 180-MS/s Pipelined-SAR ADC With a Low-Power Calibration-Free RA and Hardware-Efficient SAR Logic 150 150

Abstract:

This letter presents a 12-bit, 180-MS/s pipelined-SAR ADC in 65-nm CMOS. To eliminate the complex interstage gain-error calibration for a fast-response characteristic, a high-gain residue amplifier (RA) featuring a two-stage gain-boosting architecture is proposed. By removing the tail current, the RA significantly alleviates slew-rate and voltage headroom limitations. The …

View on IEEE Xplore

Leveraging a Passive MRAM Crossbar for Hardware-in-the-Loop and Continual learning

Leveraging a Passive MRAM Crossbar for Hardware-in-the-Loop and Continual learning 150 150

Abstract:

Artificial neural networks (ANNs) have enabled major advances in artificial intelligence, yet their growing computational and energy demands challenge conventional von Neumann architectures due to the costly separation of memory and processing. In-memory computing has emerged as a promising solution, particularly through memristive crossbar arrays capable of performing multiply-and-accumulate operations …

View on IEEE Xplore

STAR-SRAM: 16-bit Floating-Point SRAM-Based Digital Computing-in-Memory Macro in a 28 nm

STAR-SRAM: 16-bit Floating-Point SRAM-Based Digital Computing-in-Memory Macro in a 28 nm 150 150

Abstract:

A digital computing-in-memory (DCIM) macro emerges as a promising building block in a deep neural network (DNN) accelerator. To better support DNN workloads, circuit designers aim to improve three main metrics for macros: energy efficiency, compute density, and weight density. Improvements in those metrics directly translate into reduced energy consumption, …

View on IEEE Xplore

ASAP: A 28-nm Transformer Training Accelerator With Alternating Sparsity and Asymmetrical Microscaling Precision

ASAP: A 28-nm Transformer Training Accelerator With Alternating Sparsity and Asymmetrical Microscaling Precision 150 150

Abstract:

This work presents ASAP, a 28-nm transformer-training accelerator that combines N:M structured sparsity with asymmetric microscaling floating-point (MXFP) precision through a unified algorithm–hardware co-design. ASAP introduces a progressive sparsity schedule in which pruned compute resources are reassigned to increase numerical precision for important weights and activations, stabilizing optimization …

View on IEEE Xplore

A 11.0-TOPS/W Diffusion Accelerator With Temporal Data Reuse for Real-Time Text-to-Motion Generation

A 11.0-TOPS/W Diffusion Accelerator With Temporal Data Reuse for Real-Time Text-to-Motion Generation 150 150

Abstract:

Text-to-motion models are AI systems that generate human motion sequences directly from natural language descriptions, serving as key enablers for immersive virtual avatars and interactive digital humans in AR/VR ecosystems. However, state-of-the-art text-to-motion diffusion models suffer from substantial computational costs due to their iterative nature, making them ill-suited for …

View on IEEE Xplore

Adelia: A 4-nm LLM Processing Unit With Streamlined Dataflow and Dual-Mode Parallelism for Maximizing Hardware Efficiency

Adelia: A 4-nm LLM Processing Unit With Streamlined Dataflow and Dual-Mode Parallelism for Maximizing Hardware Efficiency 150 150

Abstract:

The proliferation of large language models (LLMs) as cross-domain foundation models is fueled by aggressive scaling in both parameter counts and inference-time computation. The emergence of sophisticated reasoning models further accelerates this trend, demanding longer context windows and escalating the computational and memory burdens of inference. A fundamental challenge arises …

View on IEEE Xplore

EMO-CIM: An Input/Stationary-Data Similarity-Aware Computing-In-Memory Design for Variable Vector-Wise Computation in Edge Multioperator AI Acceleration

EMO-CIM: An Input/Stationary-Data Similarity-Aware Computing-In-Memory Design for Variable Vector-Wise Computation in Edge Multioperator AI Acceleration 150 150

Abstract:

We propose an edge multioperator computing-in-memory (EMO-CIM) design that supports variable vector-wise multiply-and-accumulate (MAC) in CNN, Depthwise (DW)-Convolution, and Attention operators. It features: 1) a single EMO-CIM bank (ECB) excels in variable vector-wise MAC (V-MAC) for multioperators; 2) merging local input-shared compute units (LISCUs) with a decode-unit and adder-tree (DUAT) facilitates …

View on IEEE Xplore

A Multiply-and-Accumulate SAR-ADC-Based Hybrid Slepian Beamformer

A Multiply-and-Accumulate SAR-ADC-Based Hybrid Slepian Beamformer 150 150

Abstract:

This article introduces a hybrid Slepian beamforming receiver architecture with low power and area costs. Traditional large-scale true-time-delay (TTD) beamformers for wideband wireless communication suffer from high power consumption and high hardware costs. As an alternative, the Slepian beamforming approach reduces the number of analog-to-digital conversions (ADCs) and delays for …

View on IEEE Xplore

SparseCol: A 1320 BTOPS/W Precision-Scalable NPU Exploiting Training-Free Structured Bit-Level Sparsity and Dynamic Dataflow

SparseCol: A 1320 BTOPS/W Precision-Scalable NPU Exploiting Training-Free Structured Bit-Level Sparsity and Dynamic Dataflow 150 150

Abstract:

Bit-serial computation enables sequential processing of data at the bit level, providing several advantages, such as scalable computational precision. This approach has gained significant attention, especially for exploiting bit-level sparsity (BLS) in AI workloads. While current bit-serial processors leverage BLS to eliminate the computation associated with zero bits, they face …

View on IEEE Xplore