Computer architecture

LUT-Based Convolutional Tsetlin Machine Accelerator With Dynamic Clause Scaling for Resources-Constrained FPGAs

Abstract:

The rapid growth of machine learning (ML) workloads, particularly in computer vision applications, has significantly increased computational and energy demands in modern electronic systems, motivating the use of hardware accelerators to offload processing from general-purpose processors. Despite advances in computationally efficient ML models, achieving energy-efficient inference on resource-constrained edge devices …

View on IEEE Xplore

A Logic-Compatible 2-Transistor Embedded Bipolar RRAM MACRO: A 28-nm Multiple-Time Programmable (MTP) Memory Without Extra Masks

Abstract:

This letter presents a 2-transistor (2T) bipolar embedded resistive RAM (eRRAM) MACRO fabricated in a 28-nm high-k metal gate (HKMG) process for multitime programmable (MTP) applications. To overcome the scaling bottlenecks of traditional embedded Flash, this work utilizes an extra-mask-free, pure front-end-of-line (FEOL) integration, offering a robust solution for automotive …

MITTA: A Multi-Task Transformer Accelerator With Mixed Precision Structured Sparsity and Hierarchical Task-Adaptive Power Management

Abstract:

This article presents MITTA, the first silicon-proven transformer accelerator optimized for multi-task inference across both natural language processing (NLP) and image processing domains. MITTA accelerates a task-sharing algorithm that minimizes sub-task computation by reusing both activations and weights from a shared base task, requiring only sparse delta computation for sub-tasks. …
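
The task-sharing idea can be illustrated for a single linear layer. The sketch below is a hypothetical software model, not MITTA's dataflow, and it rests on these assumptions:

- A sub-task's weights equal the base-task weights plus a sparse delta, stored as a dict of nonzero (row, col) entries.
- The sub-task receives the same input as the base task, so the base output can be reused and only the delta product needs computing.

All function names are illustrative.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
SparseDelta = Dict[Tuple[int, int], float]  # (row, col) -> nonzero weight delta (assumed format)


def dense_matvec(w: Matrix, x: List[float]) -> List[float]:
    """Full base-task product y = W x, computed once and shared across tasks."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def sparse_delta_matvec(delta: SparseDelta, x: List[float], n_rows: int) -> List[float]:
    """Touches only the stored nonzero delta entries: one MAC per entry."""
    out = [0.0] * n_rows
    for (r, c), v in delta.items():
        out[r] += v * x[c]
    return out


def subtask_output(base_out: List[float], delta: SparseDelta, x: List[float]) -> List[float]:
    """Sub-task result = reused base output + sparse correction (shared-input assumption)."""
    corr = sparse_delta_matvec(delta, x, len(base_out))
    return [b + d for b, d in zip(base_out, corr)]


if __name__ == "__main__":
    w_base = [[1.0, 2.0], [3.0, 4.0]]
    x = [1.0, 1.0]
    base_out = dense_matvec(w_base, x)         # [3.0, 7.0]
    delta = {(1, 0): 2.0}                       # a single nonzero weight change
    print(subtask_output(base_out, delta, x))   # [3.0, 9.0]
```

Suppose a layer has N weights and k nonzero deltas. In this model the sub-task then costs k multiply-accumulates instead of N. The saving is exact only when the input really is shared; the truncated abstract does not say how MITTA handles the general case.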

Design and Analysis of a Three-Stream STT-MTJ TRNG With XOR and Majority Voter Logic as Postprocessing Architectures

Abstract:

True random number generators (TRNGs) are critical for hardware security, providing unpredictable entropy for cryptographic applications. Spin-transfer torque magnetic tunnel junction (STT-MTJ) devices offer a promising entropy source due to their low-power consumption, nonvolatility, and stochastic switching behavior. This work presents an MTJ-based TRNG that produces three independent bit streams. …
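
The title names two postprocessing options: XOR and majority-voter logic. The rough software model below is not the paper's circuit. It assumes the three streams are statistically independent and that each emits a 1 with a fixed probability p. Under those assumptions, XOR shrinks bias multiplicatively, as the piling-up lemma states: P(1) = 1/2 + 4·e1·e2·e3, where ei = pi − 1/2. A 2-out-of-3 majority vote instead pushes the output toward the more likely value. The function names are hypothetical.

```python
from typing import List, Sequence


def _check(a: Sequence[int], b: Sequence[int], c: Sequence[int]) -> None:
    if not (len(a) == len(b) == len(c)):
        raise ValueError("streams must have equal length")


def xor_postprocess(a: Sequence[int], b: Sequence[int], c: Sequence[int]) -> List[int]:
    """Bitwise XOR (parity) of three raw bit streams."""
    _check(a, b, c)
    return [x ^ y ^ z for x, y, z in zip(a, b, c)]


def majority_postprocess(a: Sequence[int], b: Sequence[int], c: Sequence[int]) -> List[int]:
    """Bitwise 2-out-of-3 majority vote of three raw bit streams."""
    _check(a, b, c)
    return [(x & y) | (y & z) | (x & z) for x, y, z in zip(a, b, c)]


def xor_p1(p1: float, p2: float, p3: float) -> float:
    """P(output = 1) for XOR of independent bits (piling-up lemma)."""
    e1, e2, e3 = p1 - 0.5, p2 - 0.5, p3 - 0.5
    return 0.5 + 4 * e1 * e2 * e3


def majority_p1(p1: float, p2: float, p3: float) -> float:
    """P(output = 1) for a majority vote of independent bits."""
    return p1 * p2 + p1 * p3 + p2 * p3 - 2 * p1 * p2 * p3


if __name__ == "__main__":
    a, b, c = [1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 0]
    print(xor_postprocess(a, b, c))       # [0, 1, 1, 1]
    print(majority_postprocess(a, b, c))  # [1, 0, 1, 0]
    print(round(xor_p1(0.6, 0.6, 0.6), 6), round(majority_p1(0.6, 0.6, 0.6), 6))  # 0.504 0.648
```

With three streams each biased to p = 0.6, XOR brings the output to 0.504, while the majority vote moves it to 0.648. This illustrates the general trade-off only; the truncated abstract does not give the paper's own comparison.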

A 2–18 GHz High-Efficiency CMOS Nonuniform Distributed Power Amplifier With a Novel Reconfigurable Inductive Termination

Abstract:

This article presents a 2–18 GHz high-efficiency CMOS nonuniform distributed power amplifier (NDPA) with a novel reconfigurable inductive termination technique for ultra-broadband efficiency enhancement. First, the inherent drawback of the degrading efficiency with growing frequency in a conventional non-reconfigurable NDPA architecture with multi-octave bandwidth is studied. A simple and effective reconfigurable …

An Eye-Opening Arbiter PUF With Auto-Error Detection and PVT-Robust Masking Achieving a BER of 2e-8

Abstract:

A hybrid ring oscillator (RO)/arbiter physical unclonable function (PUF) is implemented in 28-nm CMOS, where two competing ROs accumulate a sufficiently large phase difference exceeding a predefined deadzone (DZ). The resulting eye-opening arbiter (EOA) architecture enables a prediction of PUF bit stability over temperature change (from −40 °C to 125 °C …
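
The deadzone idea can be modeled in a few lines. This is a hypothetical behavioral sketch, not the EOA circuit, and it makes three assumptions:

- The response bit is the sign of the accumulated phase difference between the two ROs.
- A bit is predicted stable only if the magnitude of that difference exceeds the deadzone.
- Predicted-unstable bits are masked out.

The units, values, and function names are illustrative. The title's auto-error detection and PVT-robust masking are not described in the truncated abstract.

```python
from typing import List, Tuple


def eoa_bit(phase_diff: float, deadzone: float) -> Tuple[int, bool]:
    """Return (bit, predicted_stable) for one challenge.

    bit: 1 if the first oscillator leads (positive phase difference), else 0.
    predicted_stable: True if |phase_diff| exceeds the deadzone, i.e. the
    margin is assumed large enough that drift will not flip the bit.
    """
    if deadzone < 0:
        raise ValueError("deadzone must be non-negative")
    bit = 1 if phase_diff > 0 else 0
    return bit, abs(phase_diff) > deadzone


def mask_unstable(phase_diffs: List[float], deadzone: float) -> List[Tuple[int, int]]:
    """Keep (challenge index, bit) only for bits predicted stable."""
    kept = []
    for i, d in enumerate(phase_diffs):
        bit, stable = eoa_bit(d, deadzone)
        if stable:
            kept.append((i, bit))
    return kept


if __name__ == "__main__":
    diffs = [0.30, -0.05, -0.40, 0.02]  # arbitrary units, made-up values
    print(mask_unstable(diffs, deadzone=0.10))  # [(0, 1), (2, 0)]
```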

A Low-Reference-Spur Injection-Locked Clock Multiplier Using Sub-Sampling Frequency Tracking Loop and Injection Pulse Timing Calibrator

Abstract:

This article presents an injection-locked clock multiplier (ILCM) that achieves a low reference spur (spur_REF) with minimal calibrator overhead. To remove the dominant sources of frequency error, which are frequency drift (f_DF), phase offset (Φ_OS), and injection-induced phase error (Φ_INJ), the ILCM …

A 3-D HBI Compliant 1.536 TB/s/mm² Bandwidth Scalable Attention Accelerator With 22.5-GOPS Throughput High Speed SoftMax for Quantized Transformers in Intel 3

Abstract:

This letter presents a novel hardware accelerator compatible with <3-μm pitch 3-D Cu-Cu hybrid bonding interconnect (HBI) technology, particularly designed to efficiently execute multihead attention (MHA) of encoder transformer models. We present an accelerator that addresses performance losses due to low-precision models by incorporating specialized hardware optimizations …

SiWB: A 28-nm 800-MHz 4.2-to-14.2-Gb/s/W Configurable Multi-Core Architecture for White-Box Block Cipher With Area-Efficient Random Linear Transformation and Load-Aware Inter-Core Scheduling

Abstract:

White-box cryptography (WBC) seeks to protect secret keys (SKs) even under the white-box security model that features adversaries having full control of the execution environment. Due to the ever-growing demand for content protection under security-critical scenarios, the recent progress on WBC has been nothing short of spectacular. However, the security-prioritized …