In-memory computing

A 194.6-TOPS/W Pipelined All Current-Domain Mixed-Signal Compute in Memory in 28-nm CMOS

Abstract:

Mixed-signal CIM (MS-CIM) faces bit-cell nonlinearity, poor linearity at high frequency, and throughput limits. We present a hybrid pipelined current-domain MS-CIM macro featuring a bit-cell matched linearization interface (BMLI) and a loop-unrolled successive approximation register (SAR) ADC, fabricated in 28-nm CMOS. A $256 \times 256$ SRAM array with 8-bit inputs and 8-bit weights achieves 10.16-TOPS …

View on IEEE Xplore

A 3 nm FinFET 125 TOPS/W-29 TFLOPS/W, 90 TOPS/mm2-17 TFLOPS/mm2 SRAM-Based INT8, and FP16 Digital-CIM Compiler With Support for Multi-Weight Update/Cycle

Abstract:

This article presents a static random-access memory (SRAM)-based digital compute-in-memory (CIM) compiler implemented with 3 nm high-$\kappa$ metal gate (HKMG) FinFET technology, supporting flexible INT8 and FP16 formats for weight and activation multiply-accumulate (MAC) operations, offering configuration flexibility, high accuracy, and improved area and power efficiency. The FP16 digital …

View on IEEE Xplore

Leveraging a passive MRAM crossbar for hardware-in-the-loop and continual learning

Abstract:

Artificial neural networks have enabled major advances in artificial intelligence, yet their growing computational and energy demands challenge conventional von Neumann architectures due to the costly separation of memory and processing. In-memory computing has emerged as a promising solution, particularly through memristive crossbar arrays capable of performing multiply-and-accumulate operations directly …

View on IEEE Xplore

STAR-SRAM: 16-bit Floating-Point SRAM-Based Digital Computing-in-Memory Macro in a 28 nm

Abstract:

A digital computing-in-memory (DCIM) macro emerges as a promising building block in a deep neural network (DNN) accelerator. To better support DNN workloads, circuit designers aim to improve three main metrics for macros: energy efficiency, compute density, and weight density. Improvements in those metrics directly translate into reduced energy consumption, …

View on IEEE Xplore

A 28-nm FD-SOI CMOS Analog-IMC Core Based on PCM Featuring 8 512 × 512-Weight Layers and 28M Weights×TOPs/W/mm2

Abstract:

In-memory computing (IMC) hardware accelerators for deep neural networks (DNNs) require storing a massive number of coefficients within a single computing macro to avoid performance degradation in multicore clusters. This aspect, often overlooked by common figures of merit (FoMs), can be effectively addressed by phase-change memory (PCM) technology, thanks to …

View on IEEE Xplore

A 28-nm PVT Inner-Tracking Time-Domain Compute-In-Memory Macro for Edge-AI Devices

Abstract:

This article presents an energy-efficient and process-, voltage-, and temperature (PVT)-robust time-domain (TD) compute-in-memory (CIM) macro for edge artificial intelligence (AI) devices. It features: 1) a PVT inner-tracking (PIT) technique that aligns the PVT responses of TD computation and TD quantization, delivering inherent robustness without incurring extra power or circuit …

View on IEEE Xplore

EMO-CIM: An Input/Stationary-Data Similarity-Aware Computing-In-Memory Design for Variable Vector-Wise Computation in Edge Multioperator AI Acceleration

Abstract:

We propose an edge multioperator computing-in-memory (EMO-CIM) design that supports variable vector-wise multiply-and-accumulate (MAC) in CNN, Depthwise (DW)-Convolution, and Attention operators. It features: 1) a single EMO-CIM bank (ECB) excels in variable vector-wise MAC (V-MAC) for multioperators; 2) merging local input-shared compute units (LISCUs) with a decode-unit and adder-tree (DUAT) facilitates …

View on IEEE Xplore

A Folded-Differential Switched-Capacitor SRAM CIM Macro With Scalable MAC Sizes for TinyML Inference

Abstract:

This letter presents a switched-capacitor SRAM compute-in-memory macro optimized for TinyML inference. Key features include: 1) an area-efficient folded-differential multiply-and-accumulate (FD-MAC) scheme to double the signal margin; 2) a closed-loop floating-inverter amplifier (FIA)-based charge accumulation technique for signal-to-noise ratio enhancement and multiply-and-accumulate (MAC) voltage integration; and 3) a sparsity-aware multistep MAC method …

View on IEEE Xplore

A 14-nm Nonvolatile-Volatile-Fused Compute-In-Memory Macro Based on Logic-Compatible Flash for Plastic Neural Networks

Abstract:

Designing computing-in-memory (CIM) chips with synaptic plasticity can potentially support energy-efficient on-chip learning in edge devices for rapid local task adaptation. Its silicon implementation is challenging as it requires hybridizing nonvolatile and volatile memory (VM) and customized computational operations. In this work, we propose a plastic CIM (P-CIM) macro featuring: 1) …

View on IEEE Xplore