In-memory computing

EMO-CIM: An Input/Stationary-Data Similarity-Aware Computing-In-Memory Design for Variable Vector-Wise Computation in Edge Multioperator AI Acceleration

EMO-CIM: An Input/Stationary-Data Similarity-Aware Computing-In-Memory Design for Variable Vector-Wise Computation in Edge Multioperator AI Acceleration 150 150

Abstract:

We propose an edge multioperator computing-in-memory (EMO-CIM) design that supports variable vector-wise multiply-and-accumulate (MAC) in CNN, Depthwise (DW)-Convolution, and Attention operators. It features: 1) a single EMO-CIM bank (ECB) excels in variable vector-wise MAC (V-MAC) for multioperators; 2) merging local input-shared compute units (LISCUs) with a decode-unit and adder-tree (DUAT) facilitates …

View on IEEE Xplore

A Folded-Differential Switched-Capacitor SRAM CIM Macro With Scalable MAC Sizes for TinyML Inference

A Folded-Differential Switched-Capacitor SRAM CIM Macro With Scalable MAC Sizes for TinyML Inference 150 150

Abstract:

This letter presents a switched-capacitor SRAM compute-in-memory macro optimized for TinyML inference. Key features include: 1) an area-efficient folded-differential multiply-and-accumulate (FD-MAC) scheme to double the signal margin; 2) a closed-loop floating-inverter amplifier (FIA)-based charge accumulation technique for signal-to-noise ratio enhancement and multiply-and-accumulate (MAC) voltage integration; and 3) a sparsity-aware multistep MAC method …

View on IEEE Xplore

A 14-nm Nonvolatile-Volatile-Fused Compute-In-Memory Macro Based on Logic-Compatible Flash for Plastic Neural Networks

A 14-nm Nonvolatile-Volatile-Fused Compute-In-Memory Macro Based on Logic-Compatible Flash for Plastic Neural Networks 150 150

Abstract:

Designing computing-in-memory (CIM) chips with synaptic plasticity can potentially support energy-efficient on-chip learning in edge devices for rapid local task adaptation. Its silicon implementation is challenging as it requires hybridizing nonvolatile and volatile memory (VM) and customized computational operations. In this work, we propose a plastic CIM (P-CIM) macro featuring: 1) …

View on IEEE Xplore

MixCIM: A Hybrid Computing-in-Memory Macro With Less Data-Movement and Better Memory-Reuse for Depthwise Separable Neural Networks

MixCIM: A Hybrid Computing-in-Memory Macro With Less Data-Movement and Better Memory-Reuse for Depthwise Separable Neural Networks 150 150

Abstract:

Computing-in-memory (CIM) architectures have demonstrated strong potential for edge artificial intelligence (AI) devices due to their enhanced parallelism and energy efficiency. With the growing complexity of AI tasks and the rapid increase in model size, computation and deployment costs have surged. Depthwise separable neural networks (DSNNs) have attracted interest for …

View on IEEE Xplore

An Approximate Digital CIM Macro With Low-Power Multiply-Add Units and Dynamic Sparse-Adaptive Configuring for Edge AI Inference

An Approximate Digital CIM Macro With Low-Power Multiply-Add Units and Dynamic Sparse-Adaptive Configuring for Edge AI Inference 150 150

Abstract:

This letter presents an approximate digital compute-in-memory (CIM) macro for low-power edge AI inference. It introduces three hierarchical innovations: 1) novel fused approximate multiply-add units (FAMUs) that reduces power and area consumption; 2) a bit-critical weight allocation architecture that optimally balances accuracy and hardware cost; and 3) a dynamic sparsity-adaptive configuration method to …

View on IEEE Xplore

Coupled Simulation Methodology for In-Memory Computing Systems

Coupled Simulation Methodology for In-Memory Computing Systems 150 150

Abstract:

Simulations for the development and optimization of future in-memory computing (IMC) systems often face the problem that the modeling of the large system is desired, but at the same time, the effects at the device level should also be taken into account. Such effects could be due to the material …

View on IEEE Xplore

A 3 nm FinFET 125 TOPS/W-29 TFLOPS/W, 90 TOPS/mm2-17 TFLOPS/mm2 SRAM-Based INT8, and FP16 Digital-CIM Compiler With Support for Multi-Weight Update/Cycle

A 3 nm FinFET 125 TOPS/W-29 TFLOPS/W, 90 TOPS/mm2-17 TFLOPS/mm2 SRAM-Based INT8, and FP16 Digital-CIM Compiler With Support for Multi-Weight Update/Cycle 150 150

Abstract:

This article presents an static random-access memory (SRAM)-based digital compute-in-memory (CIM) compiler implemented with 3 nm high- $\kappa $ metal gate (HKMG) FinFET technology, supporting flexible INT8 and FP16 formats for weight and activation multiply-accumulate (MAC) operations, offering configuration flexibility, high accuracy, and improved area and power efficiency. The FP16 digital …

View on IEEE Xplore

A 28-nm FeFET Compute-in-Memory Macro With 64×64 Array Size and On-Chip 4-Bit Flash ADC

A 28-nm FeFET Compute-in-Memory Macro With 64×64 Array Size and On-Chip 4-Bit Flash ADC 150 150

Abstract:

Compute-in-memory (CIM) using emerging nonvolatile memory devices is a promising candidate for energy-efficient deep neural network (DNN) inference at the edge. Ferroelectric field-effect transistors (FeFETs) have recently gained attention as nonvolatile, CMOS-compatible devices with a higher on/off ratio and lower read and write energy compared to resistive random-access memory (…

View on IEEE Xplore

A Multicore Programmable Variable-Precision Near-Memory Accelerator for CNN and Transformer Models

A Multicore Programmable Variable-Precision Near-Memory Accelerator for CNN and Transformer Models 150 150

Abstract:

Convolutional neural network (CNN) and transformer are the most popular neural network models in computer vision (CV) and natural language processing (NLP). It is quite common to use both these two models in multimodal scenarios, such as text-to-image generation. However, these two models have very different memory mappings, dataflows and …

View on IEEE Xplore