Computer architecture

A Multiply-and-Accumulate SAR-ADC-Based Hybrid Slepian Beamformer

Abstract:

This article introduces a hybrid Slepian beamforming receiver architecture with low power and area costs. Traditional large-scale true-time-delay (TTD) beamformers for wideband wireless communication suffer from high power consumption and high hardware costs. As an alternative, the Slepian beamforming approach reduces the number of analog-to-digital conversions (ADCs) and delays for …

View on IEEE Xplore

An Approximate Digital CIM Macro With Low-Power Multiply-Add Units and Dynamic Sparse-Adaptive Configuring for Edge AI Inference

Abstract:

This letter presents an approximate digital compute-in-memory (CIM) macro for low-power edge AI inference. It introduces three hierarchical innovations: 1) novel fused approximate multiply-add units (FAMUs) that reduce power and area consumption; 2) a bit-critical weight allocation architecture that optimally balances accuracy and hardware cost; and 3) a dynamic sparsity-adaptive configuration method to …

View on IEEE Xplore

Coupled Simulation Methodology for In-Memory Computing Systems

Abstract:

Simulations for the development and optimization of future in-memory computing (IMC) systems often face a conflict: modeling the full system is desired, yet device-level effects must also be taken into account. Such effects could be due to the material …

View on IEEE Xplore

A 3 nm FinFET 125 TOPS/W-29 TFLOPS/W, 90 TOPS/mm2-17 TFLOPS/mm2 SRAM-Based INT8, and FP16 Digital-CIM Compiler With Support for Multi-Weight Update/Cycle

Abstract:

This article presents a static random-access memory (SRAM)-based digital compute-in-memory (CIM) compiler implemented with 3 nm high-$\kappa$ metal gate (HKMG) FinFET technology, supporting flexible INT8 and FP16 formats for weight and activation multiply-accumulate (MAC) operations, offering configuration flexibility, high accuracy, and improved area and power efficiency. The FP16 digital …

View on IEEE Xplore

A 3-nm FinFET 563-kbit 35.5-Mbit/mm2 Dual-Rail SRAM With 3.89-pJ/Access High Energy Efficient and 27.5-μW/Mbit One-Cycle Latency Low-Leakage Mode

Abstract:

This article presents a high-density (HD) 6T SRAM macro designed in 3-nm FinFET technology with an extended dual-rail (XDR) architecture, addressing active energy and leakage for mobile applications. Two key innovations are introduced: the delayed-wordline in write operation (DEWL) technique and a one-cycle latency low-leakage access mode (1-CLM). The XDR …

View on IEEE Xplore

A 28-nm FeFET Compute-in-Memory Macro With 64×64 Array Size and On-Chip 4-Bit Flash ADC

Abstract:

Compute-in-memory (CIM) using emerging nonvolatile memory devices is a promising candidate for energy-efficient deep neural network (DNN) inference at the edge. Ferroelectric field-effect transistors (FeFETs) have recently gained attention as nonvolatile, CMOS-compatible devices with a higher on/off ratio and lower read and write energy compared to resistive random-access memory (…

View on IEEE Xplore

A 57.3-fps 12.8 TFLOPS/W Text-to-Motion Processor With Inter-Iteration Output Sparsity and Inter-Frame Joint Similarity

Abstract:

Recently, 3-D human motion generation has become essential in media applications such as film production and augmented reality (AR)/virtual reality (VR) devices, requiring the generation of human joint movements and detailed 3-D meshes for each joint. Traditionally, joint creation required hours or even days, making it impractical for real-time …

View on IEEE Xplore

A 28-nm Digital Compute-in-Memory Ising Annealer With Asynchronous Random Number Generator for Traveling Salesman Problem

Abstract:

This work presents a compact digital compute-in-memory (DCIM) Ising annealer targeting large-scale combinatorial optimization. A centroid-based weight mapping method combined with hierarchical clustering reduces the memory capacity required for traveling salesman problem (TSP) weights, enabling efficient mapping with limited on-chip storage. An asynchronous random number generator (ARNG) based on dual …

View on IEEE Xplore

Advancing On-Cell Near-Field Monitoring for Thermal Runaway Detection in EV Batteries

Abstract:

A cell monitoring system for performance and safety enhancement is presented. It is the first commercially available single-chip-on-cell near-field contactless solution for automotive battery management, simplifying pack interconnect and reducing points of failure. This letter is a companion paper to the earlier ISSCC paper. It provides further details on the …

View on IEEE Xplore