Hardware

MEGA.mini: An Energy-Efficient NPU Leveraging a Novel Big/Little Core With Hybrid Input Activation for Generative AI Acceleration

MEGA.mini: An Energy-Efficient NPU Leveraging a Novel Big/Little Core With Hybrid Input Activation for Generative AI Acceleration 150 150

Abstract:

This article presents a processor for the acceleration of generative AI (GenAI) based on a novel heterogeneous core architecture called MEGA.mini. The processor introduces three algorithmic features: 1) fixed-point (FXP) and floating-point (FP) hybrid input activation (IA) representation; 2) a delayed-statistics-based normalization (NORM); and 3) conditional polynomial-based nonlinear activation (NLA) approximation. These …

View on IEEE Xplore

A Calibration-Free Pipelined-SAR ADC With Cross-Stage Gain-Mismatch Error Shaping and Inherent Noise Shaping

A Calibration-Free Pipelined-SAR ADC With Cross-Stage Gain-Mismatch Error Shaping and Inherent Noise Shaping 150 150

Abstract:

This article presents a calibration-free pipelined-successive-approximation-register (SAR) analog-to-digital converter (ADC) based on the proposed cross-stage gain-mismatch-error shaping (CS-GMES) mechanism. The CS-GMES is realized by including the entire 2nd stage into MES operation to unify the gain error and the 2nd-stage mismatch error. A feedback capacitor provides cross-stage connection and mismatch …

View on IEEE Xplore

Space-Mate: A 303.5-mW Real-Time Sparse Mixture-of-Experts-Based NeRF-SLAM Processor for Mobile Spatial Computing

Space-Mate: A 303.5-mW Real-Time Sparse Mixture-of-Experts-Based NeRF-SLAM Processor for Mobile Spatial Computing 150 150

Abstract:

Simultaneous localization and mapping (SLAM) provides crucial ego-pose information and 3-D maps of the user environment, which are fundamental to emerging mobile spatial computing devices. Dense 3-D mapping and accurate pose estimation are particularly necessary for applications like augmented reality (AR) and autonomous navigation. However, existing SLAM processors are typically …

View on IEEE Xplore

A Microscaling Multi-Mode Gain-Cell Computing-in-Memory Macro for Advanced AI Edge Device

A Microscaling Multi-Mode Gain-Cell Computing-in-Memory Macro for Advanced AI Edge Device 150 150

Abstract:

The microscaling (MX) format is an emerging data representation that quantizes high-bitwidth floating-point (FP) values into low-bitwidth FP-like values with a shared-scale (SS) exponent. When implemented with computing-in-memory (CIM), MX allows an attractive tradeoff between accuracy and hardware efficiency for specific neural network (NN) workloads. This work presents the first …

View on IEEE Xplore

1.58-b FeFET-Based Ternary Neural Networks: Achieving Robust Compute-In-Memory With Weight-Input Transformations

1.58-b FeFET-Based Ternary Neural Networks: Achieving Robust Compute-In-Memory With Weight-Input Transformations 150 150

Abstract:

Ternary weight neural networks (TWNs), with weights quantized to three states (−1, 0, and 1), have emerged as promising solutions for resource-constrained edge artificial intelligence (AI) platforms due to their high energy efficiency with acceptable inference accuracy. Further energy savings can be achieved with TWN accelerators utilizing techniques such as compute-in-memory (CiM) and …

View on IEEE Xplore

Opal: A 16-nm Coarse-Grained Reconfigurable Array SoC for Full Sparse Machine Learning Applications

Opal: A 16-nm Coarse-Grained Reconfigurable Array SoC for Full Sparse Machine Learning Applications 150 150

Abstract:

Sparsity has recently attracted increased attention in the machine learning (ML) community due to its potential to improve performance and energy efficiency by eliminating ineffectual computations. As ML models evolve rapidly, reconfigurable architectures, such as coarse-grained reconfigurable arrays (CGRAs), are being explored to adapt to and accelerate emerging models. Previous …

View on IEEE Xplore

Energy-Efficient Reconfigurable XGBoost Inference Accelerator With Modular Unit Trees via Selective Node Execution and Data Movement

Energy-Efficient Reconfigurable XGBoost Inference Accelerator With Modular Unit Trees via Selective Node Execution and Data Movement 150 150

Abstract:

The extreme gradient boosting (XGBoost) has emerged as a powerful AI algorithm, achieving high accuracy and winning multiple Kaggle competitions in various tasks including medical diagnosis, recommendation systems, and autonomous driving. It has great potential for running on edge devices due to its binary tree-based simple computing kernel, offering unique …

View on IEEE Xplore

ROZK: An Energy-Efficient DNN Accelerator Based on Reconfigurable NoC and Local Zero-Skipping

ROZK: An Energy-Efficient DNN Accelerator Based on Reconfigurable NoC and Local Zero-Skipping 150 150

Abstract:

Zero-skipping is a famous technique to improve the energy efficiency of deep neural network (DNN) accelerators. When the zero-skipping is realized with encoded data using lossless compression, irregular and unpredictable size of data due to inconsistent compression rate incurs several design issues including: 1) load imbalance from irregularity of data stored …

View on IEEE Xplore

FIMA: A Scalable Ferroelectric Compute-in-Memory Annealer for Accelerating Boolean Satisfiability

FIMA: A Scalable Ferroelectric Compute-in-Memory Annealer for Accelerating Boolean Satisfiability 150 150

Abstract:

In-memory compute kernels present a promising approach for addressing data-centric workloads. However, their scalability—particularly for computationally intensive tasks solving combinatorial optimization problems such as Boolean satisfiability (SAT), which are inherently difficult to decompose—remains a significant challenge. In this work, we propose a ferroelectric nonvolatile memory (NVM)-based compute-in-memory …

View on IEEE Xplore