Artificial neural networks

A 0.8μm 32Mpixel Always-On CMOS Image Sensor With Windmill-Pattern Edge Extraction and On-Chip DNN


Abstract:

This paper presents a CMOS image sensor (CIS) that integrates two operation modes: a high-resolution viewing mode with 32 Mpixels at a 0.8 μm pixel pitch and a low-power always-on object recognition mode consuming 2.67 mW at 10 fps. The CIS features a unique windmill-pattern analog edge extraction circuit that is resilient to illumination variations. An on-chip deep …
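The abstract does not spell out how the windmill-pattern circuit achieves illumination resilience, but normalized (ratio-based) edge measures are one standard way to get it: a uniform change in scene brightness scales numerator and denominator alike and cancels. The NumPy sketch below is only that generic idea applied to a pinwheel-style 2x2 neighborhood; it is not the authors' analog circuit.

```python
# Sketch (assumption): an illumination-normalized edge score over a
# pinwheel/windmill-style 2x2 neighborhood. This is NOT the paper's circuit,
# only an illustration of why normalized differences resist illumination change.
import numpy as np

def windmill_edge_score(img):
    """Edge score per 2x2 block: sum of cyclic neighbor differences,
    normalized by the block mean so a global gain cancels out."""
    a = img[0::2, 0::2].astype(float)   # top-left of each 2x2 block
    b = img[0::2, 1::2].astype(float)   # top-right
    c = img[1::2, 1::2].astype(float)   # bottom-right
    d = img[1::2, 0::2].astype(float)   # bottom-left
    cyclic_diff = np.abs(a - b) + np.abs(b - c) + np.abs(c - d) + np.abs(d - a)
    mean = (a + b + c + d) / 4.0 + 1e-6
    return cyclic_diff / mean           # dimensionless, gain-invariant

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(8, 8))
# A 2x brighter scene gives (nearly) the same edge map:
print(np.allclose(windmill_edge_score(img), windmill_edge_score(2 * img), atol=1e-3))
```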

View on IEEE Xplore

PANNA: A 558 TOPS/W Pipelined All-Analog Neural Network Accelerator in 22 nm FD-SOI


Abstract:

Analog computing offers intrinsic energy and latency benefits that make it attractive for real-time and edge applications. Conventional analog accelerators, however, suffer from repeated conversions between the analog and digital domains, which degrade efficiency and throughput. We propose an all-analog pipelined neural network accelerator architecture in 22 nm FD-SOI CMOS. Measurements of a …

View on IEEE Xplore

AACIM: A 2785-TOPS/W, 161-TOP/mm2, <1.17%-RMSE, Analog-In Analog-Out Computing-In-Memory Macro in 28 nm


Abstract:

This article presents an analog-in analog-out CIM macro (AACIM) for use in analog deep neural network (DNN) processors. Our macro receives analog inputs, performs a 64-by-32 vector–matrix multiplication (VMM) with a current-discharging computation mechanism, and produces analog outputs. It stores a 4-bit weight as an analog voltage in the …
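As a behavioral reference for the dataflow described above, the sketch below models the ideal arithmetic of a 64-by-32 vector–matrix multiplication with 4-bit weights in NumPy. The code-to-weight mapping, the noise model, and the full-scale RMSE normalization are assumptions chosen for illustration; the real macro computes this with analog current discharge.

```python
# Behavioral sketch (assumption): the ideal math of a 64x32 VMM with 4-bit
# weights, as the abstract describes. The macro itself is analog-in analog-out.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=64)              # analog input vector (normalized)
w_codes = rng.integers(0, 16, size=(64, 32))    # 4-bit weight codes, one per cell
w = (w_codes - 7.5) / 7.5                       # assumed mapping of codes 0..15 to ~[-1, 1]

y_ideal = x @ w                                 # the 64-by-32 VMM: 32 analog outputs

# Analog non-idealities can be folded in as output error; a full-scale-normalized
# RMSE (one common definition) is one way to read figures like "<1.17% RMSE".
y_analog = y_ideal + rng.normal(0.0, 0.02, size=y_ideal.shape)   # toy noise model
rmse_pct = 100 * np.sqrt(np.mean((y_analog - y_ideal) ** 2)) / (np.abs(y_ideal).max() + 1e-12)
print(f"outputs: {y_ideal.shape}, RMSE: {rmse_pct:.2f}% of full scale")
```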

View on IEEE Xplore

A Microscaling Multi-Mode Gain-Cell Computing-in-Memory Macro for Advanced AI Edge Device


Abstract:

The microscaling (MX) format is an emerging data representation that quantizes high-bitwidth floating-point (FP) values into low-bitwidth FP-like values with a shared-scale (SS) exponent. When implemented with computing-in-memory (CIM), MX allows an attractive tradeoff between accuracy and hardware efficiency for specific neural network (NN) workloads. This work presents the first …
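The shared-scale idea is easy to see in code. The sketch below quantizes blocks of FP32 values into a per-block power-of-two scale plus low-bitwidth element codes; the OCP MX formats use FP4/FP6/FP8 element types, whereas signed 4-bit integer codes are used here purely to keep the illustration short.

```python
# Sketch of microscaling (MX)-style block quantization: each block of values
# shares one power-of-two scale; elements are stored at low bitwidth.
import numpy as np

def mx_quantize(x, block=32, elem_bits=4):
    """Split x into blocks; each block shares one power-of-two scale and stores
    low-bitwidth element codes (signed integers here, for brevity)."""
    qmax = 2 ** (elem_bits - 1) - 1                      # 7 for 4-bit signed codes
    x = x.reshape(-1, block)
    amax = np.max(np.abs(x), axis=1, keepdims=True) + 1e-30
    scale = 2.0 ** np.ceil(np.log2(amax / qmax))         # round scale up: no saturation
    codes = np.round(x / scale).astype(np.int8)
    return scale, codes

def mx_dequantize(scale, codes):
    return (scale * codes).ravel()

x = np.random.default_rng(2).normal(size=256).astype(np.float32)
scale, codes = mx_quantize(x)
print("max |code|:", np.abs(codes).max(),
      "mean abs error:", np.abs(mx_dequantize(scale, codes) - x).mean().round(4))
```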

View on IEEE Xplore

A 28-nm Computing-in-Memory Processor With Zig-Zag Backbone-Systolic CIM and Block-/Self-Gating CAM for NN/Recommendation Applications


Abstract:

Computing-in-memory (CIM) chips have demonstrated promising energy efficiency for artificial intelligence (AI) applications such as neural networks (NNs), Transformers, and recommendation systems (RecSys). However, several challenges remain. First, a large gap is observed between macro-level and system-level CIM energy efficiency. Second, several memory-dominated operations, such as embedding in …

View on IEEE Xplore

ROZK: An Energy-Efficient DNN Accelerator Based on Reconfigurable NoC and Local Zero-Skipping


Abstract:

Zero-skipping is a well-known technique for improving the energy efficiency of deep neural network (DNN) accelerators. When zero-skipping is realized with data encoded by lossless compression, the irregular and unpredictable data size caused by inconsistent compression rates incurs several design issues, including: 1) load imbalance from irregularity of data stored …
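The sketch below shows the basic zero-skipping idea and why zero-skipped data is irregular: only nonzero activations are stored as (index, value) pairs and reach the multiply-accumulate, so the storage and work per row vary with sparsity. The coordinate-list compression used here is a generic choice for illustration, not the encoding used in ROZK.

```python
# Sketch of zero-skipping: store only nonzero activations (index, value)
# and multiply-accumulate over those. Note how the compressed size differs per
# row -- the load-imbalance issue the abstract refers to.
import numpy as np

def compress_nonzeros(row):
    idx = np.flatnonzero(row)
    return list(zip(idx.tolist(), row[idx].tolist()))   # (index, value) pairs

def zero_skip_dot(compressed_row, weights):
    acc = 0.0
    for i, v in compressed_row:          # only nonzero activations reach the MAC
        acc += v * weights[i]
    return acc

rng = np.random.default_rng(3)
acts = rng.integers(0, 4, size=(4, 16)) * (rng.random((4, 16)) < 0.3)  # ~70% zeros
w = rng.normal(size=16)

for r, row in enumerate(acts):
    c = compress_nonzeros(row)
    print(f"row {r}: {len(c):2d} nonzeros, dot = {zero_skip_dot(c, w):+.3f}")
```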

View on IEEE Xplore

Binarized Neural-Network Parallel-Processing Accelerator Macro Designed for an Energy Efficiency Higher Than 100 TOPS/W


Abstract:

A binarized neural-network (BNN) accelerator macro is developed based on a processing-in-memory (PIM) architecture capable of eight-parallel multiply-accumulate (MAC) processing. The parallel-processing PIM macro, referred to as a PPIM macro, is designed to perform this parallel processing without multiport SRAM cells and to achieve the …
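For reference, the arithmetic behind a binarized MAC reduces to XNOR and popcount once weights and activations are constrained to {-1, +1}. The sketch below shows that equivalence in NumPy; the PPIM macro evaluates eight such MACs in parallel inside the memory array, which this software model does not capture.

```python
# Sketch of a binarized MAC: with weights and activations in {-1, +1},
# a dot product reduces to XNOR plus popcount on the {0, 1} bit encodings.
import numpy as np

def bnn_dot(a_bits, w_bits):
    """a_bits, w_bits: 0/1 arrays encoding -1/+1. Returns the +/-1 dot product."""
    n = a_bits.size
    xnor = ~(a_bits ^ w_bits) & 1          # 1 where the +/-1 values agree
    popcount = int(xnor.sum())
    return 2 * popcount - n                # equals sum over (+/-1)*(+/-1)

rng = np.random.default_rng(4)
a_bits = rng.integers(0, 2, size=64)
w_bits = rng.integers(0, 2, size=64)

# Reference computed directly in +/-1 arithmetic:
ref = int(((2 * a_bits - 1) * (2 * w_bits - 1)).sum())
print(bnn_dot(a_bits, w_bits) == ref)      # True
```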

View on IEEE Xplore