Energy efficiency

An Energy-Efficient CNN Processor Supporting Bi-Directional FPN for Small-Object Detection on High-Resolution Videos in 16-nm FinFET

An Energy-Efficient CNN Processor Supporting Bi-Directional FPN for Small-Object Detection on High-Resolution Videos in 16-nm FinFET 150 150

Abstract:

The capability to detect small objects precisely in real time is essential for intelligent systems, particularly in advanced driver assistance systems (ADASs), as it ensures continuous awareness of distant obstacles for enhanced safety. However, achieving high detection precision for small objects requires high-resolution input inference on deep convolutional neural network (…

View on IEEE Xplore

MEGA.mini: An Energy-Efficient NPU Leveraging a Novel Big/Little Core With Hybrid Input Activation for Generative AI Acceleration

MEGA.mini: An Energy-Efficient NPU Leveraging a Novel Big/Little Core With Hybrid Input Activation for Generative AI Acceleration 150 150

Abstract:

This article presents a processor for the acceleration of generative AI (GenAI) based on a novel heterogeneous core architecture called MEGA.mini. The processor introduces three algorithmic features: 1) fixed-point (FXP) and floating-point (FP) hybrid input activation (IA) representation; 2) a delayed-statistics-based normalization (NORM); and 3) conditional polynomial-based nonlinear activation (NLA) approximation. These …

View on IEEE Xplore

Energy-efficient Logic-in-memory and Neuromorphic Computing in RSD MOSFETs

Energy-efficient Logic-in-memory and Neuromorphic Computing in RSD MOSFETs 150 150

Abstract:

This work highlights the potential application of raised source and drain (RSD) MOSFETs-based charge trapping memory (CTM) for next-generation computing applications. This simulation study presents a double gate (DG)-RSD MOSFET technology with a short gate length (50 nm) to significantly improve the performance of logic-in-memory (LIM) and neuromorphic computing (NC) …

View on IEEE Xplore

DPe-CIM: A 4T-1C Dual-Port eDRAM-Based Compute-in-Memory for Simultaneous Computing and Refresh With Adaptive Refresh and Data Conversion Reduction Scheme

DPe-CIM: A 4T-1C Dual-Port eDRAM-Based Compute-in-Memory for Simultaneous Computing and Refresh With Adaptive Refresh and Data Conversion Reduction Scheme 150 150

Abstract:

This article presents DPe-CIM, a 4T-1C dual-port embedded dynamic random access memory (eDRAM)-based compute-in-memory (CIM) macro with adaptive refresh and data conversion reduction. DPe-CIM proposes four key features that improve area and energy efficiency: 1) dual-port eDRAM cell (DPC) separates the multiply-and-accumulate (MAC) and refresh ports, enabling simultaneous MAC …

View on IEEE Xplore

PANNA: A 558 TOPS/W Pipelined All-Analog Neural Network Accelerator in 22 nm FD-SOI

PANNA: A 558 TOPS/W Pipelined All-Analog Neural Network Accelerator in 22 nm FD-SOI 150 150

Abstract:

Analog computing offers intrinsic energy and latency benefits that makes it attractive for real-time and edge applications. Conventional analog accelerators suffer from repeated conversions between analog and digital domain, which degrades efficiency and throughput. We propose an all-analog pipelined neural network accelerator architecture in 22 nm fully-depleted silicon-on-insulator (FD-SOI) complementary metal-oxide-semiconductor (…

View on IEEE Xplore

AACIM: A 2785-TOPS/W, 161-TOP/mm2, <1.17%-RMSE, Analog-In Analog-Out Computing-In-Memory Macro in 28 nm

AACIM: A 2785-TOPS/W, 161-TOP/mm2, <1.17%-RMSE, Analog-In Analog-Out Computing-In-Memory Macro in 28 nm 150 150

Abstract:

This article presents an analog-in analog-out CIM macro (AACIM) for use in analog deep neural network (DNN) processors. Our macro receives analog inputs, performs a 64-by-32 vector–matrix multiplication (VMM) with a current-discharging computation mechanism, and produces analog outputs. It stores a 4-bit weight as an analog voltage in the …

View on IEEE Xplore

A 14-b Energy-Efficient BW/Power Scalable CTDSM With a Frequency-Controlled Current Source

A 14-b Energy-Efficient BW/Power Scalable CTDSM With a Frequency-Controlled Current Source 150 150

Abstract:

This work presents a 14-bit energy-efficient bandwidth (BW)/power scalable continuous-time delta–sigma modulator (CTDSM) for sensor interfaces in IoT applications. To ensure low noise for small input signals and achieve BW/power scalability, it is built around Gm-C integrators biased via a linear frequency-controlled current source (FCCS). The FCCS …

View on IEEE Xplore

A Machine Learning-Inspired PAM-4 Transceiver for Medium-Reach Wireline Links

A Machine Learning-Inspired PAM-4 Transceiver for Medium-Reach Wireline Links 150 150

Abstract:

This article presents an energy-efficient machine learning-inspired PAM-4 wireline transceiver that leverages data encoding at the transmitter (Tx) and feature extraction with classification at the receiver (Rx) to compensate for channel loss ranging from 13 to 26 dB, while maintaining the bit error rate (BER)<10-11. A new consecutive symbol-to-center (CSC) encoding …

View on IEEE Xplore

A RISC-V SoC With Reconfigurable Custom Instructions on a Synthesized eFPGA Fabric in 22nm FinFET

A RISC-V SoC With Reconfigurable Custom Instructions on a Synthesized eFPGA Fabric in 22nm FinFET 150 150

Abstract:

This letter presents a flexible and energy-efficient RISC-V system-on-chip (SoC) in 22nm FinFET technology, achieving state-of-the-art performance by tightly integrating the CPU with a synthesized embedded FPGA (embedded field programmable gate array (eFPGA)), enabling the implementation of reconfigurable custom instructions. The tight integration of the eFPGA with SoC scratchpad memory …

View on IEEE Xplore