Throughput

3-D Stacked HBM and Compute Accelerators for LLM: Optimizing Thermal Management and Power Delivery Efficiency

3-D Stacked HBM and Compute Accelerators for LLM: Optimizing Thermal Management and Power Delivery Efficiency 150 150

Abstract:

Advanced packaging is becoming essential for designing hardware accelerators for large language models (LLMs). Different architectures, such as 2.5-D integration of memory with logic, have been proposed; however, the bandwidth limits the throughput of the complete system. Recent works have proposed memory on logic systems, where high bandwidth memory (HBM) …

View on IEEE Xplore

DPIM: A 2T1C eDRAM Transformer-in-Memory Chip With Sparsity-Aware Quantization and Heterogeneous Dense–Sparse Core

DPIM: A 2T1C eDRAM Transformer-in-Memory Chip With Sparsity-Aware Quantization and Heterogeneous Dense–Sparse Core 150 150

Abstract:

Transformer models have revolutionized artificial intelligence (AI) applications across various domains, but their increasing complexity poses significant challenges in terms of computational and memory demands. While processing-in-memory (PIM) paradigms have been adopted to address these limitations, existing PIM-based transformer accelerators still face hurdles such as: 1) focusing solely on optimizing attention …

View on IEEE Xplore

Energy-Efficient Reconfigurable XGBoost Inference Accelerator With Modular Unit Trees via Selective Node Execution and Data Movement

Energy-Efficient Reconfigurable XGBoost Inference Accelerator With Modular Unit Trees via Selective Node Execution and Data Movement 150 150

Abstract:

The extreme gradient boosting (XGBoost) has emerged as a powerful AI algorithm, achieving high accuracy and winning multiple Kaggle competitions in various tasks including medical diagnosis, recommendation systems, and autonomous driving. It has great potential for running on edge devices due to its binary tree-based simple computing kernel, offering unique …

View on IEEE Xplore

24.86 Gb/s Full-Digital Chaos Random Number 1.53 GSamples/s Noise Generator in 40nm CMOS

24.86 Gb/s Full-Digital Chaos Random Number 1.53 GSamples/s Noise Generator in 40nm CMOS 150 150

Abstract:

This letter presents a fully digital true random number generator (TRNG) and noise generator (NG) based on a chaos system. We design the chaos random number generator (CRNG) using the proposed Euler-based modified Lorenz system with periodic perturbation and modified modulo unit. The chaos NG (CNG) processor integrates the CRNG …

View on IEEE Xplore

MIX-ACIM: A 28-nm Mixed-Precision Analog Compute-in-Memory With Digital Feature Restoration for Vector-Matrix Multiplication

MIX-ACIM: A 28-nm Mixed-Precision Analog Compute-in-Memory With Digital Feature Restoration for Vector-Matrix Multiplication 150 150

Abstract:

A mixed-precision analog compute-in-memory (Mix-ACIM) is presented for mixed-precision vector-matrix multiplication (VMM). The design features an all-analog current-domain fixed-point (FxP) VMM with floating-point conversion and feature restoration. A 28 nm CMOS test chip shows 41 TOPS/W and 24 TOPS/mm2 for FxP (8-bit input/weight and 12-bit output) and 24.18 TFLOPS/W and 3.3 …

View on IEEE Xplore