Accuracy

MEGA.mini: An Energy-Efficient NPU Leveraging a Novel Big/Little Core With Hybrid Input Activation for Generative AI Acceleration

Abstract:

This article presents a processor for the acceleration of generative AI (GenAI) based on a novel heterogeneous core architecture called MEGA.mini. The processor introduces three algorithmic features: 1) fixed-point (FXP) and floating-point (FP) hybrid input activation (IA) representation; 2) a delayed-statistics-based normalization (NORM); and 3) conditional polynomial-based nonlinear activation (NLA) approximation. These …

View on IEEE Xplore

DPe-CIM: A 4T-1C Dual-Port eDRAM-Based Compute-in-Memory for Simultaneous Computing and Refresh With Adaptive Refresh and Data Conversion Reduction Scheme

Abstract:

This article presents DPe-CIM, a 4T-1C dual-port embedded dynamic random access memory (eDRAM)-based compute-in-memory (CIM) macro with adaptive refresh and data conversion reduction. DPe-CIM proposes four key features that improve area and energy efficiency: 1) dual-port eDRAM cell (DPC) separates the multiply-and-accumulate (MAC) and refresh ports, enabling simultaneous MAC …

A 560 μW, 6 fA/√Hz, 146 dB-DR Ultrasensitive Current Readout Circuit for PWM-Dimming-Tolerant Under-Display Ambient Light Sensors

Abstract:

This letter presents an ultralow-noise, power-efficient, and pulse-width modulation (PWM)-dimming-tolerant photocurrent readout circuit for under-display ambient light sensor (ALS). A transimpedance amplifier (TIA) with a feedback diode achieves GΩ-level resistance and 6 fA/√Hz input current noise, enabling sub-pA resolution. Instability and noise folding are mitigated at …

A Compact, Highly-Digital Sensor-Fusion-Based Joint $V_{\mathrm{dd}}$-Temperature Sensor for SoC Thermal Management

Abstract:

This article presents a fine-grained thermal sensing network for thermal management in SoCs. Sensor nodes in this network are made up of joint supply voltage ($V_{\mathrm{dd}}$) and temperature ($T$) sensors, which are compact and highly digital. Measurements from these simple but imperfect sensors are jointly processed to extract …

Space-Mate: A 303.5-mW Real-Time Sparse Mixture-of-Experts-Based NeRF-SLAM Processor for Mobile Spatial Computing

Abstract:

Simultaneous localization and mapping (SLAM) provides crucial ego-pose information and 3-D maps of the user environment, which are fundamental to emerging mobile spatial computing devices. Dense 3-D mapping and accurate pose estimation are particularly necessary for applications like augmented reality (AR) and autonomous navigation. However, existing SLAM processors are typically …

A Microscaling Multi-Mode Gain-Cell Computing-in-Memory Macro for Advanced AI Edge Device

Abstract:

The microscaling (MX) format is an emerging data representation that quantizes high-bitwidth floating-point (FP) values into low-bitwidth FP-like values with a shared-scale (SS) exponent. When implemented with computing-in-memory (CIM), MX allows an attractive tradeoff between accuracy and hardware efficiency for specific neural network (NN) workloads. This work presents the first …

1.58-b FeFET-Based Ternary Neural Networks: Achieving Robust Compute-In-Memory With Weight-Input Transformations

Abstract:

Ternary weight neural networks (TWNs), with weights quantized to three states (−1, 0, and 1), have emerged as promising solutions for resource-constrained edge artificial intelligence (AI) platforms due to their high energy efficiency with acceptable inference accuracy. Further energy savings can be achieved with TWN accelerators utilizing techniques such as compute-in-memory (CiM) and …

DPIM: A 2T1C eDRAM Transformer-in-Memory Chip With Sparsity-Aware Quantization and Heterogeneous Dense–Sparse Core

Abstract:

Transformer models have revolutionized artificial intelligence (AI) applications across various domains, but their increasing complexity poses significant challenges in terms of computational and memory demands. While processing-in-memory (PIM) paradigms have been adopted to address these limitations, existing PIM-based transformer accelerators still face hurdles such as: 1) focusing solely on optimizing attention …

A Scalable 1024-Channel Ultra-Low-Power Spike Sorting Chip With Event-Driven Detection and Spatial Clustering

Abstract:

This article presents a 1024-channel ultra-low-power spike sorting chip featuring event-driven spike detection and spatial clustering for large-scale neural recording. To address power and scalability constraints in brain–computer interfaces (BCIs), the design integrates a compressive analog-to-digital converter (ADC) with a two-stage spike detector that significantly reduces memory and processing …
