Adders

EMO-CIM: An Input/Stationary-Data Similarity-Aware Computing-In-Memory Design for Variable Vector-Wise Computation in Edge Multi-Operator AI Acceleration


Abstract:

We propose an edge multi-operator computing-in-memory design (EMO-CIM) that supports variable vector-wise multiply-and-accumulate (MAC) in CNN, depthwise-convolution (DW), and attention operators. It features: (1) a single EMO-CIM bank that excels at variable vector-wise MAC across multiple operators; (2) merging local input-shared compute units with a decode unit and adder tree, which facilitates input/stationary-data similarity-aware computing to improve …

View on IEEE Xplore
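The truncated abstract only hints at how input/stationary-data similarity is exploited, so the sketch below shows one generic similarity-reuse idea: when consecutive input vectors differ in only a few positions, the previous dot-product result can be updated with the changed terms instead of recomputing the full MAC. The function name and delta-update scheme are illustrative assumptions, not the EMO-CIM microarchitecture.

```python
# Hypothetical sketch of delta-based MAC reuse when successive input vectors are similar.
# Illustrates the general "similarity-aware" idea only; this is not the EMO-CIM scheme.
import numpy as np

def similarity_aware_mac(weights, inputs_prev, inputs_curr, prev_result):
    """Update a dot product by touching only the positions where the input changed."""
    changed = np.nonzero(inputs_curr != inputs_prev)[0]              # differing positions
    delta = inputs_curr[changed].astype(np.int32) - inputs_prev[changed].astype(np.int32)
    # Only len(changed) multiply-adds instead of a full vector MAC.
    return prev_result + int(np.dot(weights[changed].astype(np.int32), delta))

rng = np.random.default_rng(0)
w  = rng.integers(-128, 128, size=64, dtype=np.int8)
x0 = rng.integers(-128, 128, size=64, dtype=np.int8)
x1 = x0.copy()
x1[:4] = rng.integers(-128, 128, size=4, dtype=np.int8)             # mostly similar input

full0 = int(np.dot(w.astype(np.int32), x0.astype(np.int32)))
full1 = int(np.dot(w.astype(np.int32), x1.astype(np.int32)))
assert similarity_aware_mac(w, x0, x1, full0) == full1
```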

Side-Channel Attack-Resistant HMAC-SHA256 Accelerator With Boolean and Arithmetic Masking in Intel 4 CMOS


Abstract:

This work describes a side-channel attack (SCA)-resistant hash-based message authentication code (HMAC) accelerator with secure hash algorithm 2 (SHA-2) using Boolean and arithmetic masking, along with the first-reported ASIC implementation in Intel 4 CMOS with 10 M measured traces. Previously reported masked datapath designs suffer from high area/performance overheads (>100%) due to …

View on IEEE Xplore
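SHA-256 interleaves Boolean operations (XOR, rotations) with modulo-2^32 additions, which is why masked implementations typically keep both Boolean and arithmetic shares and convert between them. The snippet below is a minimal sketch of the two first-order sharing styles on a 32-bit word; the helper names are hypothetical, and nothing here reflects the accelerator's actual conversion circuits.

```python
# Minimal sketch of first-order Boolean vs. arithmetic masking of a 32-bit word.
# Illustrative only; not the masking/conversion scheme used in the accelerator.
import secrets

MASK32 = 0xFFFFFFFF

def boolean_share(x):
    """Split x into two Boolean shares: x = s0 XOR s1."""
    r = secrets.randbits(32)
    return x ^ r, r

def arithmetic_share(x):
    """Split x into two arithmetic shares: x = (s0 + s1) mod 2^32."""
    r = secrets.randbits(32)
    return (x - r) & MASK32, r

secret = 0xDEADBEEF
b0, b1 = boolean_share(secret)      # XOR/rotate steps operate on shares like these
a0, a1 = arithmetic_share(secret)   # mod-2^32 additions operate on shares like these

assert b0 ^ b1 == secret
assert (a0 + a1) & MASK32 == secret

# XOR-linear operations can be applied share-wise without recombining the secret:
k = 0x12345678
assert (b0 ^ k) ^ b1 == secret ^ k
# Likewise, a public constant can be added to one arithmetic share only:
assert (((a0 + k) & MASK32) + a1) & MASK32 == (secret + k) & MASK32
```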

An Approximate Digital CIM Macro With Low-Power Multiply-Add Units and Dynamic Sparse-Adaptive Configuring for Edge AI Inference


Abstract:

This letter presents an approximate digital compute-in-memory (CIM) macro for low-power edge AI inference. It introduces three hierarchical innovations: 1) novel fused approximate multiply-add units (FAMUs) that reduce power and area consumption; 2) a bit-critical weight allocation architecture that optimally balances accuracy and hardware cost; and 3) a dynamic sparsity-adaptive configuration method to …

View on IEEE Xplore
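The abstract does not spell out the FAMU approximation, so here is a hedged sketch of one classic approximate-multiply technique, truncating low-order operand bits before multiplying, which trades a bounded numeric error for a smaller multiplier. The function name and truncation width are assumptions for illustration only, not the FAMU design.

```python
# Illustrative approximate multiply-add via low-bit operand truncation (not the FAMU design).
import numpy as np

def approx_mac(weights, acts, trunc_bits=2):
    """Drop the lowest trunc_bits of each operand before multiplying, then accumulate."""
    w = (weights.astype(np.int32) >> trunc_bits) << trunc_bits
    a = (acts.astype(np.int32) >> trunc_bits) << trunc_bits
    return int(np.dot(w, a))

rng = np.random.default_rng(1)
w = rng.integers(-128, 128, size=256, dtype=np.int8)
a = rng.integers(-128, 128, size=256, dtype=np.int8)

exact  = int(np.dot(w.astype(np.int32), a.astype(np.int32)))
approx = approx_mac(w, a)
rel_err = abs(approx - exact) / max(1, abs(exact))
print(f"exact={exact}  approx={approx}  rel_err={rel_err:.4%}")
```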

A 3 nm FinFET 125 TOPS/W-29 TFLOPS/W, 90 TOPS/mm²-17 TFLOPS/mm² SRAM-Based INT8 and FP16 Digital-CIM Compiler With Support for Multi-Weight Update/Cycle


Abstract:

This article presents a static random-access memory (SRAM)-based digital compute-in-memory (CIM) compiler implemented in 3 nm high-κ metal-gate (HKMG) FinFET technology, supporting flexible INT8 and FP16 formats for weight and activation multiply-accumulate (MAC) operations and offering configuration flexibility, high accuracy, and improved area and power efficiency. The FP16 digital …

View on IEEE Xplore
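To make the INT8 versus FP16 MAC distinction concrete, the sketch below quantizes floating-point data to INT8 with a symmetric per-tensor scale and compares the scaled integer dot product against an FP16 one accumulated in FP32. The quantization recipe is a generic illustration and is not taken from the paper.

```python
# Generic illustration of INT8 (scaled integer) vs. FP16 MAC; not the compiler's datapath.
import numpy as np

rng = np.random.default_rng(2)
w = rng.normal(size=128).astype(np.float32)
x = rng.normal(size=128).astype(np.float32)

# Symmetric per-tensor INT8 quantization: value ~= scale * int8_code.
w_scale = np.abs(w).max() / 127.0
x_scale = np.abs(x).max() / 127.0
w_q = np.clip(np.round(w / w_scale), -127, 127).astype(np.int8)
x_q = np.clip(np.round(x / x_scale), -127, 127).astype(np.int8)

int8_mac = np.dot(w_q.astype(np.int32), x_q.astype(np.int32)) * (w_scale * x_scale)
fp16_mac = np.dot(w.astype(np.float16).astype(np.float32),
                  x.astype(np.float16).astype(np.float32))   # FP16 inputs, FP32 accumulate
ref      = np.dot(w, x)

print(f"fp32 ref={ref:.4f}  int8={int8_mac:.4f}  fp16={fp16_mac:.4f}")
```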

A Microscaling Multi-Mode Gain-Cell Computing-in-Memory Macro for Advanced AI Edge Device


Abstract:

The microscaling (MX) format is an emerging data representation that quantizes high-bitwidth floating-point (FP) values into low-bitwidth FP-like values with a shared-scale (SS) exponent. When implemented with computing-in-memory (CIM), MX allows an attractive tradeoff between accuracy and hardware efficiency for specific neural network (NN) workloads. This work presents the first …

View on IEEE Xplore
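As a way to see what a shared-scale MX block looks like numerically, the sketch below quantizes one block of values to a power-of-two shared scale plus coarse low-bitwidth elements. The block size, element range, and rounding rule are assumptions chosen for illustration; the OCP MX specification and the macro described in the paper may differ in the details.

```python
# Illustrative microscaling (MX)-style block quantization: one shared power-of-two
# scale per block, low-bitwidth elements. Parameters are assumptions, not the paper's.
import numpy as np

def mx_quantize(block, elem_max=6.0, mant_steps=8):
    """Return (shared_scale, quantized elements) for one block of FP values."""
    max_abs = np.abs(block).max()
    if max_abs == 0.0:
        return 1.0, np.zeros_like(block)
    # Shared scale: power of two that brings the largest magnitude into the element range.
    shared_scale = 2.0 ** np.ceil(np.log2(max_abs / elem_max))
    scaled = block / shared_scale
    # Crude low-bit element quantization: round to a coarse grid within [-elem_max, elem_max].
    step = elem_max / mant_steps
    elems = np.clip(np.round(scaled / step) * step, -elem_max, elem_max)
    return shared_scale, elems

rng = np.random.default_rng(3)
block = rng.normal(scale=0.05, size=32).astype(np.float32)   # one 32-element MX block
scale, q = mx_quantize(block)
recon = scale * q
print(f"shared scale = {scale:.6g}, max abs error = {np.abs(recon - block).max():.3e}")
```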