IEEE Journal of Solid-State Circuits

An Energy-Efficient CNN Processor Supporting Bi-Directional FPN for Small-Object Detection on High-Resolution Videos in 16-nm FinFET

Abstract:

The capability to detect small objects precisely in real time is essential for intelligent systems, particularly in advanced driver assistance systems (ADASs), as it ensures continuous awareness of distant obstacles for enhanced safety. However, achieving high detection precision for small objects requires high-resolution input inference on deep convolutional neural network (…

MEGA.mini: An Energy-Efficient NPU Leveraging a Novel Big/Little Core With Hybrid Input Activation for Generative AI Acceleration

Abstract:

This article presents a processor for the acceleration of generative AI (GenAI) based on a novel heterogeneous core architecture called MEGA.mini. The processor introduces three algorithmic features: 1) fixed-point (FXP) and floating-point (FP) hybrid input activation (IA) representation; 2) a delayed-statistics-based normalization (NORM); and 3) conditional polynomial-based nonlinear activation (NLA) approximation. These …

A Multicore Programmable Variable-Precision Near-Memory Accelerator for CNN and Transformer Models

Abstract:

Convolutional neural networks (CNNs) and transformers are the most popular neural network models in computer vision (CV) and natural language processing (NLP). It is quite common to use both models in multimodal scenarios, such as text-to-image generation. However, these two models have very different memory mappings, dataflows and …

A Calibration-Free Pipelined-SAR ADC With Cross-Stage Gain-Mismatch Error Shaping and Inherent Noise Shaping

Abstract:

This article presents a calibration-free pipelined-successive-approximation-register (SAR) analog-to-digital converter (ADC) based on the proposed cross-stage gain-mismatch-error shaping (CS-GMES) mechanism. The CS-GMES is realized by including the entire 2nd stage in the MES operation to unify the gain error and the 2nd-stage mismatch error. A feedback capacitor provides cross-stage connection and mismatch …

A 40-nm 209-TOPS/W Reinforcement Learning Processor With Full Speculation Exploitation and Inference-Training Parallel Processing

Abstract:

Reinforcement learning (RL) has found widespread applications across diverse domains, making energy-efficient implementations imperative. This article presents an energy-efficient RL processor featuring full speculation exploitation and parallel processing for inference and training. Binary direct feedback alignment (DFA) is applied to perform error propagation in parallel, reducing the computational complexity by 23%. …

An Efficient Power Management Unit With Continuous MPPT and Energy Recycling for Wireless Millimetric Biomedical Implants

Abstract:

Biomedical implants offer transformative tools to improve medical outcomes. To realize minimally invasive implants with miniaturized volume and weight, wireless power transfer (WPT) has been extensively studied to replace bulky batteries that dominate the volume of traditional implants and require surgical replacements. Ultrasonic (US) and magnetoelectric (ME) WPT modalities, which …

AFP-CIM: All-Inclusive Floating-Point With Segmented Compute-in-Memory Macro

Abstract:

This article reports an all-inclusive floating-point (AFP) with the segmented 8T-static random access memory (SRAM) compute-in-memory (CIM) macro. It features: 1) a segmented read-wordline (SRWL) to efficiently support the AFP formats including the AFP4/6/8/16 with all of the exponent–mantissa ratios (EMRs); 2) a bit-wise accumulation first (BWAF) circuit structure with a …

A Hybrid SCVR With 4 × $C_{F}$ Continuously Scalable-Conversion Ratio SC Stage

Abstract:

This work presents a hybrid switched-capacitor (SC) voltage regulator (VR) with a continuously scalable-conversion (CSC)-ratio stage implemented with off-chip flying capacitors ($\boldsymbol{C}_{F}$s). By eliminating the need for high-density on-chip capacitors, this approach offers broader accessibility to designers. The use of off-chip $\boldsymbol{C}_{F}$s mandates …

A Real-Time Deep Reinforcement Learning Processor for Mapless Autonomous Navigation With Unified Actor-Critic Network and Inference-on-Request Scheduling

Abstract:

This article presents a real-time deep reinforcement learning (DRL) processor for mapless autonomous navigation targeting resource- and energy-constrained mobile robots. The unified actor-critic network architecture combined with feature map caching enables parameter sharing and eliminates redundant computations. This approach reduces the total parameter count, external memory access (EMA), and overall …
