Computer architecture

A Low-Reference-Spur Injection-Locked Clock Multiplier Using Sub-Sampling Frequency Tracking Loop and Injection Pulse Timing Calibrator

A Low-Reference-Spur Injection-Locked Clock Multiplier Using Sub-Sampling Frequency Tracking Loop and Injection Pulse Timing Calibrator 150 150

Abstract:

This article presents an injection-locked clock multiplier (ILCM) achieving the low-reference spur (spur ${}_{\mathrm {REF}}$ ) with minimal overhead of a calibrator. To remove the dominant sources of frequency error, which are frequency drift ( $f_{\mathrm {DF}}$ ), phase offset ( $\varPhi _{\mathrm {OS}}$ ), and injection-induced phase error ( $\varPhi _{\mathrm {INJ}}$ ), the ILCM …

View on IEEE Xplore

A 3D HBI Compliant 1.536TB/s/mm2 Bandwidth Scalable Attention Accelerator With 22.5GOPS Throughput High Speed SoftMax for Quantized Transformers in Intel 3

A 3D HBI Compliant 1.536TB/s/mm2 Bandwidth Scalable Attention Accelerator With 22.5GOPS Throughput High Speed SoftMax for Quantized Transformers in Intel 3 150 150

Abstract:

This work presents a novel hardware accelerator compatible with <3μm pitch 3D Cu-Cu hybrid bonding interconnect (HBI) technology, particularly designed to efficiently execute Multi Head Attention (MHA) of encoder transformer models. We present an accelerator that addresses performance losses due to low precision models by incorporating specialized hardware optimizations for …

View on IEEE Xplore

SiWB: A 28-nm 800-MHz 4.2-to-14.2-Gb/s/W Configurable Multi-Core Architecture for White-Box Block Cipher With Area-Efficient Random Linear Transformation and Load-Aware Inter-Core Scheduling

SiWB: A 28-nm 800-MHz 4.2-to-14.2-Gb/s/W Configurable Multi-Core Architecture for White-Box Block Cipher With Area-Efficient Random Linear Transformation and Load-Aware Inter-Core Scheduling 150 150

Abstract:

White-box cryptography (WBC) seeks to protect secret keys (SKs) even under the white-box security model that features adversaries having full control of the execution environment. Due to the ever-growing demand for content protection under security-critical scenarios, the recent progress on WBC has been nothing short of spectacular. However, the security-prioritized …

View on IEEE Xplore

A Multiply-and-Accumulate SAR-ADC-Based Hybrid Slepian Beamformer

A Multiply-and-Accumulate SAR-ADC-Based Hybrid Slepian Beamformer 150 150

Abstract:

This article introduces a hybrid Slepian beamforming receiver architecture with low power and area costs. Traditional large-scale true-time-delay (TTD) beamformers for wideband wireless communication suffer from high power consumption and high hardware costs. As an alternative, the Slepian beamforming approach reduces the number of analog-to-digital conversions (ADCs) and delays for …

View on IEEE Xplore

An Approximate Digital CIM Macro With Low-Power Multiply-Add Units and Dynamic Sparse-Adaptive Configuring for Edge AI Inference

An Approximate Digital CIM Macro With Low-Power Multiply-Add Units and Dynamic Sparse-Adaptive Configuring for Edge AI Inference 150 150

Abstract:

This letter presents an approximate digital compute-in-memory (CIM) macro for low-power edge AI inference. It introduces three hierarchical innovations: 1) novel fused approximate multiply-add units (FAMUs) that reduces power and area consumption; 2) a bit-critical weight allocation architecture that optimally balances accuracy and hardware cost; and 3) a dynamic sparsity-adaptive configuration method to …

View on IEEE Xplore

Coupled Simulation Methodology for In-Memory Computing Systems

Coupled Simulation Methodology for In-Memory Computing Systems 150 150

Abstract:

Simulations for the development and optimization of future in-memory computing (IMC) systems often face the problem that the modeling of the large system is desired, but at the same time, the effects at the device level should also be taken into account. Such effects could be due to the material …

View on IEEE Xplore

A 3 nm FinFET 125 TOPS/W-29 TFLOPS/W, 90 TOPS/mm2-17 TFLOPS/mm2 SRAM-Based INT8, and FP16 Digital-CIM Compiler With Support for Multi-Weight Update/Cycle

A 3 nm FinFET 125 TOPS/W-29 TFLOPS/W, 90 TOPS/mm2-17 TFLOPS/mm2 SRAM-Based INT8, and FP16 Digital-CIM Compiler With Support for Multi-Weight Update/Cycle 150 150

Abstract:

This article presents an static random-access memory (SRAM)-based digital compute-in-memory (CIM) compiler implemented with 3 nm high- $\kappa $ metal gate (HKMG) FinFET technology, supporting flexible INT8 and FP16 formats for weight and activation multiply-accumulate (MAC) operations, offering configuration flexibility, high accuracy, and improved area and power efficiency. The FP16 digital …

View on IEEE Xplore

A 3-nm FinFET 563-kbit 35.5-Mbit/mm2 Dual-Rail SRAM With 3.89-pJ/Access High Energy Efficient and 27.5-μW/Mbit One-Cycle Latency Low-Leakage Mode

A 3-nm FinFET 563-kbit 35.5-Mbit/mm2 Dual-Rail SRAM With 3.89-pJ/Access High Energy Efficient and 27.5-μW/Mbit One-Cycle Latency Low-Leakage Mode 150 150

Abstract:

This article presents a high-density (HD) 6T SRAM macro designed in 3-nm FinFET technology with an extended dual-rail (XDR) architecture, addressing active energy and leakage for mobile applications. Two key innovations are introduced: the delayed-wordline in write operation (DEWL) technique and a one-cycle latency low-leakage access mode (1-CLM). The XDR …

View on IEEE Xplore

A 28-nm FeFET Compute-in-Memory Macro With 64×64 Array Size and On-Chip 4-Bit Flash ADC

A 28-nm FeFET Compute-in-Memory Macro With 64×64 Array Size and On-Chip 4-Bit Flash ADC 150 150

Abstract:

Compute-in-memory (CIM) using emerging nonvolatile memory devices is a promising candidate for energy-efficient deep neural network (DNN) inference at the edge. Ferroelectric field-effect transistors (FeFETs) have recently gained attention as nonvolatile, CMOS-compatible devices with a higher on/off ratio and lower read and write energy compared to resistive random-access memory (…

View on IEEE Xplore