Artificial intelligence

EMO-CIM: An Input/Stationary-Data Similarity-Aware Computing-In-Memory Design for Variable Vector-Wise Computation in Edge Multioperator AI Acceleration

EMO-CIM: An Input/Stationary-Data Similarity-Aware Computing-In-Memory Design for Variable Vector-Wise Computation in Edge Multioperator AI Acceleration 150 150

Abstract:

We propose an edge multioperator computing-in-memory (EMO-CIM) design that supports variable vector-wise multiply-and-accumulate (MAC) in CNN, Depthwise (DW)-Convolution, and Attention operators. It features: 1) a single EMO-CIM bank (ECB) excels in variable vector-wise MAC (V-MAC) for multioperators; 2) merging local input-shared compute units (LISCUs) with a decode-unit and adder-tree (DUAT) facilitates …

View on IEEE Xplore

A 56-Gb/s Hybrid Silicon Photonic and 5-nm CMOS 3-D-Integrated Transceiver for Optical Compute I/O

A 56-Gb/s Hybrid Silicon Photonic and 5-nm CMOS 3-D-Integrated Transceiver for Optical Compute I/O 150 150

Abstract:

This work presents a hybrid 3-D-integrated silicon photonic (SiPh) transceiver suitable for realizing chiplet-based optical I/O in future AI/ML ASIC packages. The optical transceiver die stack is composed of two ICs: a SiPh IC (PIC) with micrometer-scale, thermally robust electro-absorption modulators (EAMs), and a 5-nm CMOS electronic IC (…

View on IEEE Xplore

A 3-D HBI Compliant 1.536 TB/s/mm2 Bandwidth Scalable Attention Accelerator With 22.5-GOPS Throughput High Speed SoftMax for Quantized Transformers in Intel 3

A 3-D HBI Compliant 1.536 TB/s/mm2 Bandwidth Scalable Attention Accelerator With 22.5-GOPS Throughput High Speed SoftMax for Quantized Transformers in Intel 3 150 150

Abstract:

This letter presents a novel hardware accelerator compatible with <3- $\mu $ m pitch 3-D Cu-Cu hybrid bonding interconnect (HBI) technology, particularly designed to efficiently execute multihead attention (MHA) of encoder transformer models. We present an accelerator that addresses performance losses due to low precision models by incorporating specialized hardware optimizations …

View on IEEE Xplore

SparseCol: A 1320 BTOPS/W Precision-Scalable NPU Exploiting Training-Free Structured Bit-Level Sparsity and Dynamic Dataflow

SparseCol: A 1320 BTOPS/W Precision-Scalable NPU Exploiting Training-Free Structured Bit-Level Sparsity and Dynamic Dataflow 150 150

Abstract:

Bit-serial computation enables sequential processing of data at the bit level, providing several advantages, such as scalable computational precision. This approach has gained significant attention, especially for exploiting bit-level sparsity (BLS) in AI workloads. While current bit-serial processors leverage BLS to eliminate the computation associated with zero bits, they face …

View on IEEE Xplore

A 112-Gb/s PAM4 Receiver With a Phase Equalization AFE in 7-nm FinFET

A 112-Gb/s PAM4 Receiver With a Phase Equalization AFE in 7-nm FinFET 150 150

Abstract:

To reduce the bit-error-rate (BER), equalizers are implemented in high-speed SerDes receivers (RX) to compensate for channel insertion loss and mitigate intersymbol interference (ISI). Conventional analog front-end (AFE) designs primarily focus on amplitude gain while neglecting the influence of phase shift. This brief presents a phase equalization (PEQ) AFE design …

View on IEEE Xplore

MEGA.mini: An Energy-Efficient NPU Leveraging a Novel Big/Little Core With Hybrid Input Activation for Generative AI Acceleration

MEGA.mini: An Energy-Efficient NPU Leveraging a Novel Big/Little Core With Hybrid Input Activation for Generative AI Acceleration 150 150

Abstract:

This article presents a processor for the acceleration of generative AI (GenAI) based on a novel heterogeneous core architecture called MEGA.mini. The processor introduces three algorithmic features: 1) fixed-point (FXP) and floating-point (FP) hybrid input activation (IA) representation; 2) a delayed-statistics-based normalization (NORM); and 3) conditional polynomial-based nonlinear activation (NLA) approximation. These …

View on IEEE Xplore

A 0.8-μm 32-Mpixel Always-On CMOS Image Sensor With Windmill-Pattern Edge Extraction and On-Chip DNN

A 0.8-μm 32-Mpixel Always-On CMOS Image Sensor With Windmill-Pattern Edge Extraction and On-Chip DNN 150 150

Abstract:

This letter presents a CMOS image sensor (CIS) that integrates two operation modes: 1) a high-resolution viewing mode with $0.8~\mu $ m 32 Mpixels and 2) a low-power always-on object recognition mode consuming 2.67 mW at 10 frames/s. The CIS features a unique windmill-pattern analog edge extraction circuit that is resilient to illumination variations. An …

View on IEEE Xplore

DPIM: A 2T1C eDRAM Transformer-in-Memory Chip With Sparsity-Aware Quantization and Heterogeneous Dense–Sparse Core

DPIM: A 2T1C eDRAM Transformer-in-Memory Chip With Sparsity-Aware Quantization and Heterogeneous Dense–Sparse Core 150 150

Abstract:

Transformer models have revolutionized artificial intelligence (AI) applications across various domains, but their increasing complexity poses significant challenges in terms of computational and memory demands. While processing-in-memory (PIM) paradigms have been adopted to address these limitations, existing PIM-based transformer accelerators still face hurdles such as: 1) focusing solely on optimizing attention …

View on IEEE Xplore

A 28-nm Computing-in-Memory Processor With Zig-Zag Backbone-Systolic CIM and Block-/Self-Gating CAM for NN/Recommendation Applications

A 28-nm Computing-in-Memory Processor With Zig-Zag Backbone-Systolic CIM and Block-/Self-Gating CAM for NN/Recommendation Applications 150 150

Abstract:

Computing-in-memory (CIM) chips have demonstrated promising energy efficiency for artificial intelligence (AI) applications such as neural networks (NNs), Transformer, and recommendation system (RecSys). However, several challenges still exist. First, a large gap between the macro and system-level CIM energy efficiency is observed. Second, several memory-dominate operations, such as embedding in …

View on IEEE Xplore