Arrays

MixCIM: A Hybrid Computing-in-Memory Macro With Less Data-Movement and Better Memory-Reuse for Depthwise Separable Neural Networks

Abstract:

Computing-in-memory (CIM) architectures have demonstrated strong potential for edge artificial intelligence (AI) devices due to their enhanced parallelism and energy efficiency. With the growing complexity of AI tasks and the rapid increase in model size, computation and deployment costs have surged. Depthwise separable neural networks (DSNNs) have attracted interest for …

View on IEEE Xplore

A 29-Gb/mm² 1-Tb 3-b/Cell 3-D Flash Memory With CMOS Direct Bonded Array (CBA) Technology

Abstract:

This article reports a 1-Tb 3-b/cell 3-D flash memory fabricated with CMOS direct bonded array (CBA) technology. Compaction of circuits and wires achieves the highest bit density in the world over 29 Gb/mm² with 332-word line (WL) layers. The bit density is improved by 71% from a previous generation despite …

View on IEEE Xplore

A 28-nm Computing-in-Memory Processor With Zig-Zag Backbone-Systolic CIM and Block-/Self-Gating CAM for NN/Recommendation Applications

Abstract:

Computing-in-memory (CIM) chips have demonstrated promising energy efficiency for artificial intelligence (AI) applications such as neural networks (NNs), Transformer, and recommendation system (RecSys). However, several challenges still exist. First, a large gap between the macro and system-level CIM energy efficiency is observed. Second, several memory-dominate operations, such as embedding in …

View on IEEE Xplore

ROZK: An Energy-Efficient DNN Accelerator Based on Reconfigurable NoC and Local Zero-Skipping

Abstract:

Zero-skipping is a famous technique to improve the energy efficiency of deep neural network (DNN) accelerators. When the zero-skipping is realized with encoded data using lossless compression, irregular and unpredictable size of data due to inconsistent compression rate incurs several design issues including: 1) load imbalance from irregularity of data stored …

View on IEEE Xplore

A High-Density Low-Leakage and Low-Power Fully Voltage-Stacked SRAM for IoT Application

Abstract:

The general approach to suppress leakage in static random access memory (SRAM) is to use a low voltage ($V_{\text{L}}$), generated by a low-dropout regulator (LDO), as the cell supply voltage (CVDD) of SRAM array in the standby mode. However, the effectiveness of lowering CVDD is constrained by the …

View on IEEE Xplore

Cryogenic Hyperdimensional In-Memory Computing Using Ferroelectric TCAM

Abstract:

Cryogenic operations of electronics present a significant step forward to achieve huge demand of in-memory computing (IMC) for high-performance computing, quantum computing, and military applications. Ferroelectric (FE) is a promising candidate to develop the complementary metal oxide semiconductor (CMOS)-compatible nonvolatile memories. Hence, in this work, we investigate the effectiveness …

View on IEEE Xplore

Binarized Neural-Network Parallel-Processing Accelerator Macro Designed for an Energy Efficiency Higher Than 100 TOPS/W

Abstract:

A binarized neural-network (BNN) accelerator macro is developed based on a processing-in-memory (PIM) architecture having the ability of eight-parallel multiply-accumulate (MAC) processing. The parallel-processing PIM macro, referred to as a PPIM macro, is designed to perform the parallel processing with no use of multiport SRAM cells and to achieve the …

View on IEEE Xplore