Artificial intelligence

A 3D HBI Compliant 1.536TB/s/mm2 Bandwidth Scalable Attention Accelerator With 22.5GOPS Throughput High Speed SoftMax for Quantized Transformers in Intel 3

Abstract:

This work presents a novel hardware accelerator compatible with <3μm pitch 3D Cu-Cu hybrid bonding interconnect (HBI) technology, particularly designed to efficiently execute Multi Head Attention (MHA) of encoder transformer models. We present an accelerator that addresses performance losses due to low precision models by incorporating specialized hardware optimizations for …

View on IEEE Xplore

SparseCol: A 1320 BTOPS/W Precision-Scalable NPU Exploiting Training-Free Structured Bit-Level Sparsity and Dynamic Dataflow

Abstract:

Bit-serial computation enables sequential processing of data at the bit level, providing several advantages, such as scalable computational precision. This approach has gained significant attention, especially for exploiting bit-level sparsity (BLS) in AI workloads. While current bit-serial processors leverage BLS to eliminate the computation associated with zero bits, they face …

A 112-Gb/s PAM4 Receiver With a Phase Equalization AFE in 7-nm FinFET

Abstract:

To reduce the bit-error-rate (BER), equalizers are implemented in high-speed SerDes receivers (RX) to compensate for channel insertion loss and mitigate intersymbol interference (ISI). Conventional analog front-end (AFE) designs primarily focus on amplitude gain while neglecting the influence of phase shift. This brief presents a phase equalization (PEQ) AFE design …

MEGA.mini: An Energy-Efficient NPU Leveraging a Novel Big/Little Core With Hybrid Input Activation for Generative AI Acceleration

Abstract:

This article presents a processor for the acceleration of generative AI (GenAI) based on a novel heterogeneous core architecture called MEGA.mini. The processor introduces three algorithmic features: 1) fixed-point (FXP) and floating-point (FP) hybrid input activation (IA) representation; 2) a delayed-statistics-based normalization (NORM); and 3) conditional polynomial-based nonlinear activation (NLA) approximation. These …

A 0.8-μm 32-Mpixel Always-On CMOS Image Sensor With Windmill-Pattern Edge Extraction and On-Chip DNN

Abstract:

This letter presents a CMOS image sensor (CIS) that integrates two operation modes: 1) a high-resolution viewing mode with 0.8-μm 32 Mpixels and 2) a low-power always-on object recognition mode consuming 2.67 mW at 10 frames/s. The CIS features a unique windmill-pattern analog edge extraction circuit that is resilient to illumination variations. An …

DPIM: A 2T1C eDRAM Transformer-in-Memory Chip With Sparsity-Aware Quantization and Heterogeneous Dense–Sparse Core

Abstract:

Transformer models have revolutionized artificial intelligence (AI) applications across various domains, but their increasing complexity poses significant challenges in terms of computational and memory demands. While processing-in-memory (PIM) paradigms have been adopted to address these limitations, existing PIM-based transformer accelerators still face hurdles such as: 1) focusing solely on optimizing attention …

A 28-nm Computing-in-Memory Processor With Zig-Zag Backbone-Systolic CIM and Block-/Self-Gating CAM for NN/Recommendation Applications

Abstract:

Computing-in-memory (CIM) chips have demonstrated promising energy efficiency for artificial intelligence (AI) applications such as neural networks (NNs), Transformers, and recommendation systems (RecSys). However, several challenges still exist. First, a large gap between the macro and system-level CIM energy efficiency is observed. Second, several memory-dominated operations, such as embedding in …
