Training

SparseCol: A 1320 BTOPS/W Precision-Scalable NPU Exploiting Training-Free Structured Bit-Level Sparsity and Dynamic Dataflow

Abstract:

Bit-serial computation enables sequential processing of data at the bit level, providing several advantages, such as scalable computational precision. This approach has gained significant attention, especially for exploiting bit-level sparsity (BLS) in AI workloads. While current bit-serial processors leverage BLS to eliminate the computation associated with zero bits, they face …
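The core idea behind bit-level sparsity can be illustrated with a tiny software model: a bit-serial datapath spends one cycle per weight bit, so zero bits can be skipped outright. This is an illustrative sketch, not the paper's architecture; the function name and encoding are assumptions.

```python
def bit_serial_mac(weight, activation):
    """Multiply by summing shifted activations, one per *nonzero* weight bit.

    A bit-serial processor spends one cycle per weight bit; skipping the
    zero bits (bit-level sparsity) removes those cycles entirely.
    Illustrative software model for nonnegative weights.
    """
    acc, shift = 0, 0
    while weight:
        if weight & 1:                  # only set bits contribute work
            acc += activation << shift
        weight >>= 1
        shift += 1
    return acc
```

For `weight = 0b1010`, only two of the four bit positions do any work, which is the saving a BLS-aware datapath exploits.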

View on IEEE Xplore

Antiferromagnetic Programmable Neuron: Structure, Training, and Pattern Recognition Applications

Abstract:

Artificial neurons based on antiferromagnetic (AFM) spin Hall oscillators (SHOs) are promising elements for creating ultra-fast, energy-efficient neuromorphic computing systems. These structures can generate picosecond spikes in response to dc and ac electric currents, thereby mimicking the reaction of biological neurons to external stimuli. However, conventional AFM neurons have …

View on IEEE Xplore

Space-Mate: A 303.5-mW Real-Time Sparse Mixture-of-Experts-Based NeRF-SLAM Processor for Mobile Spatial Computing

Abstract:

Simultaneous localization and mapping (SLAM) provides crucial ego-pose information and 3-D maps of the user environment, which are fundamental to emerging mobile spatial computing devices. Dense 3-D mapping and accurate pose estimation are particularly necessary for applications like augmented reality (AR) and autonomous navigation. However, existing SLAM processors are typically …

View on IEEE Xplore

Energy-Efficient Reconfigurable XGBoost Inference Accelerator With Modular Unit Trees via Selective Node Execution and Data Movement

Abstract:

Extreme gradient boosting (XGBoost) has emerged as a powerful AI algorithm, achieving high accuracy and winning multiple Kaggle competitions in various tasks including medical diagnosis, recommendation systems, and autonomous driving. It has great potential for running on edge devices due to its binary tree-based simple computing kernel, offering unique …
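The selective-node-execution idea can be sketched in plain Python: inference on a boosted tree walks a single root-to-leaf path, so only O(depth) of the tree's nodes are ever evaluated. The node encoding below (tuples for internal nodes, negative indices for leaves) is illustrative, not the paper's hardware format.

```python
def tree_score(nodes, leaves, x):
    """Score one sample with one binary decision tree.

    nodes[i] = (feature, threshold, left_child, right_child); a negative
    child index encodes a leaf. Only nodes on the decision path execute.
    """
    idx = 0
    while idx >= 0:
        feat, thr, left, right = nodes[idx]
        idx = left if x[feat] < thr else right
    return leaves[~idx]          # ~idx maps -1 -> 0, -2 -> 1, ...

# Depth-2 toy tree: root splits on feature 0, its left child on feature 1.
nodes = [(0, 0.5, 1, -1), (1, 2.0, -2, -3)]
leaves = [0.5, -0.25, 1.5]
```

Skipping the untaken subtree at every split is what makes a node-serial hardware walker cheap relative to evaluating the full tree.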

View on IEEE Xplore

An 8.62-μW 75-dB-DR Fully Integrated SoC for Spoken Language Understanding

Abstract:

We present a sub-10-μW fully integrated SoC for on-device spoken language understanding (SLU). Its analog feature extractor (FEx) applies global and per-channel automatic gain control (AGC) to extend the system's dynamic range (DR), a critical requirement for real-world scenarios, including far-field operation. The on-chip streaming-mode recurrent …

View on IEEE Xplore

LIF Neuron Based on a Charge-Powered Ring Oscillator in Weak Inversion Achieving 201 fJ/SOP

Abstract:

This letter presents experimental results of a leaky-integrate-and-fire (LIF) neuron based on time-domain analog circuitry. This kind of neuron is the core of spiking neural networks (SNNs) used in edge applications. Edge applications require power-efficient neuron designs whose power consumption is extremely low when idle, and low when …
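As a behavioral reference, a discrete-time LIF update is tiny; the chip realizes the same integrate/leak/fire behavior in a charge-powered ring oscillator. Parameter names and values here are illustrative, not taken from the letter.

```python
def lif_step(v, i_in, v_th=1.0, leak=0.9):
    """One leaky-integrate-and-fire update: leak the membrane potential,
    add the input charge, and spike (with reset) on a threshold crossing."""
    v = leak * v + i_in
    if v >= v_th:
        return 0.0, True     # spike emitted, membrane reset
    return v, False
```

Note that when `i_in` is zero the state only decays, which is why an idle-efficient circuit implementation matters for edge SNNs.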

View on IEEE Xplore

A 160-Gb/s D-Band Bi-Directional CMOS Mixer Covering 112–170 GHz for 6G Transceivers

Abstract:

This work presents a D-band bi-directional CMOS double-balanced mixer (DBM) supporting data rates over 160 Gb/s with a 58-GHz RF bandwidth (112–170 GHz). The mixer employs four identical NMOS passive switches (12 μm/60 nm) in a DBM topology, providing isolation between the RF, LO, and IF ports. Both IF and RF …

View on IEEE Xplore

A 0-to-10-μF Off-Chip Output Capacitor-Scalable Boost Converter Achieving 96.68% Peak Efficiency

Abstract:

This letter presents an off-chip output capacitor (CO)-scalable (OCS) boost converter. The proposed OCS boost converter can operate both with and without the off-chip CO. In addition, it operates over the whole conversion ratio (CR) range above 1 while maintaining a small inductor current ripple, resulting …
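For context on why the CR range sits above 1: the textbook conversion ratio of an ideal boost converter in continuous conduction mode, as a function of duty cycle $D$, is (a standard relation, not specific to this design)

```latex
\mathrm{CR} \;=\; \frac{V_{\mathrm{OUT}}}{V_{\mathrm{IN}}} \;=\; \frac{1}{1-D}, \qquad 0 \le D < 1,
```

which is always at least 1 and grows without bound as $D \to 1$.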

View on IEEE Xplore

MIX-ACIM: A 28-nm Mixed-Precision Analog Compute-in-Memory With Digital Feature Restoration for Vector-Matrix Multiplication

Abstract:

A mixed-precision analog compute-in-memory (Mix-ACIM) is presented for mixed-precision vector-matrix multiplication (VMM). The design features an all-analog current-domain fixed-point (FxP) VMM with floating-point conversion and feature restoration. A 28-nm CMOS test chip shows 41 TOPS/W and 24 TOPS/mm² for FxP (8-bit input/weight and 12-bit output) and 24.18 TFLOPS/W and 3.3 …
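A digital reference for the fixed-point mode is straightforward, even though the chip performs it in the analog current domain. The saturating 12-bit output below mirrors the quoted 8-bit input/weight, 12-bit output FxP format; the function name and use of NumPy are illustrative assumptions.

```python
import numpy as np

def fxp_vmm(x, W, out_bits=12):
    """Fixed-point vector-matrix multiply with a saturating output.

    x and W hold 8-bit integer inputs/weights; the result is clipped to
    the signed 12-bit range, as in the FxP mode described above.
    """
    lo, hi = -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    y = x.astype(np.int32) @ W.astype(np.int32)   # exact integer dot products
    return np.clip(y, lo, hi)                     # saturate to 12-bit range
```

Clipping rather than wrapping on overflow is the usual choice for inference accelerators, since saturation degrades accuracy gracefully.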

View on IEEE Xplore