Throughput

SiWB: A 28-nm 800-MHz 4.2-to-14.2-Gb/s/W Configurable Multi-Core Architecture for White-Box Block Cipher With Area-Efficient Random Linear Transformation and Load-Aware Inter-Core Scheduling

Abstract:

White-box cryptography (WBC) seeks to protect secret keys (SKs) even under the white-box security model, in which the adversary has full control of the execution environment. Driven by the ever-growing demand for content protection in security-critical scenarios, WBC has advanced rapidly in recent years. However, the security-prioritized …
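
The threat model can be made concrete with the classic table-based construction behind many white-box designs: the round key is never stored, but fused with the S-box into a lookup table whose input and output are wrapped in random bijective encodings. Below is a toy 8-bit Python sketch of that general idea (Chow-style encoded T-boxes), not the paper's architecture; the S-box, key byte, and encodings are all made up for illustration.

```python
# Toy white-box building block: the key is baked into a lookup table and
# hidden behind random input/output encodings, so dumping the table does
# not directly reveal the key. Illustrative only, not SiWB's design.
import random

SBOX = list(range(256))
random.Random(1).shuffle(SBOX)             # stand-in 8-bit S-box

def make_tbox(key_byte, rng):
    f = list(range(256)); rng.shuffle(f)   # random input encoding
    g = list(range(256)); rng.shuffle(g)   # random output encoding
    f_inv = [0] * 256
    for x, y in enumerate(f):
        f_inv[y] = x
    # T[x] = g(S(f^{-1}(x) XOR k)): the key is fused into the table
    return [g[SBOX[f_inv[x] ^ key_byte]] for x in range(256)], f, g

tbox, f, g = make_tbox(key_byte=0x5A, rng=random.Random(42))
x = 0x3C
# the encoded lookup agrees with the keyed S-box once g is undone
assert g.index(tbox[f[x]]) == SBOX[x ^ 0x5A]
print(f"encoded table output for 0x{x:02x}: 0x{tbox[f[x]]:02x}")
```

In a full design the encodings cancel between consecutive tables, so the device only ever evaluates lookups.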

View on IEEE Xplore

MixCIM: A Hybrid Computing-in-Memory Macro With Less Data-Movement and Better Memory-Reuse for Depthwise Separable Neural Networks

Abstract:

Computing-in-memory (CIM) architectures have demonstrated strong potential for edge artificial intelligence (AI) devices due to their enhanced parallelism and energy efficiency. With the growing complexity of AI tasks and the rapid increase in model size, computation and deployment costs have surged. Depthwise separable neural networks (DSNNs) have attracted interest for …
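
The appeal of DSNNs comes down to simple MAC arithmetic: splitting a standard convolution into a per-channel (depthwise) filter plus a 1x1 (pointwise) mixing step slashes the operation count. A quick sketch with illustrative layer dimensions (not MixCIM's workloads):

```python
# MAC counts for a standard vs. depthwise separable convolution layer.
def conv_macs(h, w, cin, cout, k):
    return h * w * cin * cout * k * k            # standard convolution

def dsconv_macs(h, w, cin, cout, k):
    depthwise = h * w * cin * k * k              # one kxk filter per channel
    pointwise = h * w * cin * cout               # 1x1 channel mixing
    return depthwise + pointwise

h = w = 56; cin = cout = 128; k = 3              # assumed layer shape
std, ds = conv_macs(h, w, cin, cout, k), dsconv_macs(h, w, cin, cout, k)
print(f"standard: {std:,} MACs, separable: {ds:,} MACs ({std / ds:.1f}x fewer)")
```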

View on IEEE Xplore

DPe-CIM: A 4T-1C Dual-Port eDRAM-Based Compute-in-Memory for Simultaneous Computing and Refresh With Adaptive Refresh and Data Conversion Reduction Scheme

Abstract:

This article presents DPe-CIM, a 4T-1C dual-port embedded dynamic random access memory (eDRAM)-based compute-in-memory (CIM) macro with adaptive refresh and data conversion reduction. DPe-CIM introduces four key features that improve area and energy efficiency: 1) a dual-port eDRAM cell (DPC) separates the multiply-and-accumulate (MAC) and refresh ports, enabling simultaneous MAC …
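
The payoff of the dual-port cell is that refresh no longer steals MAC cycles. A back-of-the-envelope utilization model, with all timing numbers assumed for illustration (they are not the paper's measurements):

```python
# Single-port eDRAM must stall MACs to refresh; a dual-port cell hides
# refresh behind compute. Shorter retention (e.g., at high temperature)
# makes the single-port penalty worse, motivating adaptive refresh.
def mac_utilization(t_retention_us, t_refresh_row_ns, rows, dual_port):
    refresh_ns = rows * t_refresh_row_ns     # refresh all rows per window
    window_ns = t_retention_us * 1000.0
    if dual_port:
        return 1.0                           # refresh hidden behind MACs
    return max(0.0, 1.0 - refresh_ns / window_ns)

for ret_us in (100, 50, 10):                 # hotter chip -> shorter retention
    u = mac_utilization(ret_us, t_refresh_row_ns=20, rows=64, dual_port=False)
    print(f"retention {ret_us:>3} us: single-port {u:.1%}, dual-port 100.0%")
```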

View on IEEE Xplore

PANNA: A 558 TOPS/W Pipelined All-Analog Neural Network Accelerator in 22 nm FD-SOI

Abstract:

Analog computing offers intrinsic energy and latency benefits that make it attractive for real-time and edge applications. Conventional analog accelerators suffer from repeated conversions between the analog and digital domains, which degrade efficiency and throughput. We propose an all-analog pipelined neural network accelerator architecture in 22 nm fully depleted silicon-on-insulator (FD-SOI) complementary metal-oxide-semiconductor (…
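
The conversion overhead being removed is easy to quantify. A sketch with generic, assumed energy numbers (they are not PANNA's measurements): if every layer boundary digitizes and regenerates each activation, the converters can dominate, whereas an all-analog pipeline pays for conversion only once at the output.

```python
# Energy of per-layer ADC/DAC conversion vs. an all-analog pipeline.
E_MAC_fJ = 1.0          # assumed energy per analog MAC
E_ADC_fJ = 500.0        # assumed per-sample converter energies
E_DAC_fJ = 100.0

def layer_energy_fj(macs, outputs, convert):
    e = macs * E_MAC_fJ
    if convert:                              # digitize + regenerate outputs
        e += outputs * (E_ADC_fJ + E_DAC_fJ)
    return e

macs, outs, layers = 64 * 64, 64, 8          # assumed network shape
per_layer = layers * layer_energy_fj(macs, outs, convert=True)
pipelined = (layers - 1) * layer_energy_fj(macs, outs, convert=False) \
            + layer_energy_fj(macs, outs, convert=True)  # convert once
print(f"per-layer conversion: {per_layer / 1e3:.0f} pJ, "
      f"all-analog pipeline: {pipelined / 1e3:.0f} pJ "
      f"({per_layer / pipelined:.1f}x)")
```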

View on IEEE Xplore

A 168 nW to 44.3 Mb/s Adaptable TRNG With 400 mV Attack-Resilient Hybrid RO Core

Abstract:

This letter presents an adaptable ring-oscillator (RO)-based true random number generator (TRNG) that removes the fixed power–throughput tradeoff by selecting the delay-cell physics at run time. A hybrid core uses a current-starved inverter in low-power (LP) mode to amplify slew-limited jitter for high bit efficiency at low frequency, and a …
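
The power-throughput tradeoff being broken here is the usual one for jitter-harvesting TRNGs: a longer sampling window accumulates more oscillator jitter per bit (better entropy, lower rate), while a shorter one does the opposite. A toy model of a jitter-sampled RO, with assumed period and jitter values rather than the letter's hybrid cell:

```python
# Parity of the RO edge count in each sampling window is the raw bit;
# more accumulated jitter per window means less bias per raw bit.
import random

def raw_bits(n, t_sample_ns, t_ro_ns=1.0, jitter_ps=30.0, seed=0):
    rng = random.Random(seed)
    bits = []
    for _ in range(n):
        elapsed, edges = 0.0, 0
        while elapsed < t_sample_ns:         # count RO edges in the window
            elapsed += rng.gauss(t_ro_ns, jitter_ps / 1000.0)
            edges += 1
        bits.append(edges & 1)               # LSB of the edge count
    return bits

for t in (10, 100, 1000):                    # fast/high-power -> slow/LP
    b = raw_bits(2000, t_sample_ns=t)
    bias = abs(sum(b) / len(b) - 0.5)
    print(f"window {t:>4} ns: raw-bit bias {bias:.3f}")
```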

View on IEEE Xplore

A RISC-V SoC With Reconfigurable Custom Instructions on a Synthesized eFPGA Fabric in 22nm FinFET

Abstract:

This letter presents a flexible and energy-efficient RISC-V system-on-chip (SoC) in 22-nm FinFET technology, achieving state-of-the-art performance by tightly integrating the CPU with a synthesized embedded field-programmable gate array (eFPGA), enabling the implementation of reconfigurable custom instructions. The tight integration of the eFPGA with the SoC scratchpad memory …
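
RISC-V makes this kind of extension clean because the ISA reserves "custom" opcode spaces for exactly such accelerator-backed instructions. A sketch that encodes an R-type instruction word in the custom-0 space; the funct fields and register choices are hypothetical, not the SoC's actual extension:

```python
# Encode a hypothetical R-type custom instruction (custom-0 opcode 0001011).
def encode_rtype(opcode, rd, funct3, rs1, rs2, funct7):
    assert 0 <= opcode < 128 and 0 <= funct3 < 8 and 0 <= funct7 < 128
    assert all(0 <= r < 32 for r in (rd, rs1, rs2))
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) \
        | (funct3 << 12) | (rd << 7) | opcode

CUSTOM0 = 0b0001011                          # opcode space reserved by RISC-V
word = encode_rtype(CUSTOM0, rd=10, funct3=0, rs1=11, rs2=12, funct7=0)
print(f"custom-0 R-type word: 0x{word:08x}") # emit via .insn/.word in asm
```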

View on IEEE Xplore

3-D Stacked HBM and Compute Accelerators for LLM: Optimizing Thermal Management and Power Delivery Efficiency

Abstract:

Advanced packaging is becoming essential for designing hardware accelerators for large language models (LLMs). Different architectures, such as the 2.5-D integration of memory with logic, have been proposed; however, memory bandwidth limits the throughput of the complete system. Recent works have proposed memory-on-logic systems, where high-bandwidth memory (HBM) …
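
The bandwidth ceiling is easy to see with roofline-style arithmetic: during autoregressive decode, essentially all model weights stream from memory for every generated token, so tokens/s is bounded by bandwidth divided by bytes per token. The model size and bandwidth figures below are illustrative assumptions, not numbers from the article:

```python
# Bandwidth-bound upper limit on LLM decode throughput.
params = 7e9                          # assumed 7B-parameter model
bytes_per_token = params * 2          # FP16: read all weights once per token

for label, bw_gbs in [("2.5-D HBM interface", 800), ("3-D stacked HBM", 4000)]:
    tokens_per_s = bw_gbs * 1e9 / bytes_per_token
    print(f"{label:>20}: <= {tokens_per_s:.0f} tokens/s")
```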

View on IEEE Xplore

DPIM: A 2T1C eDRAM Transformer-in-Memory Chip With Sparsity-Aware Quantization and Heterogeneous Dense–Sparse Core

Abstract:

Transformer models have revolutionized artificial intelligence (AI) applications across various domains, but their increasing complexity poses significant challenges in terms of computational and memory demands. While processing-in-memory (PIM) paradigms have been adopted to address these limitations, existing PIM-based transformer accelerators still face hurdles: 1) focusing solely on optimizing attention …
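
The heterogeneous dense-sparse idea reduces to a dispatch decision per tile or row: high-sparsity work takes a skip-zero path that touches only nonzeros, while the rest takes a dense MAC sweep. A toy matvec kernel showing the general technique (not DPIM's circuits; the threshold is an assumption):

```python
# Route each row to a sparse (skip-zero) or dense path; results match.
import random

def matvec_hetero(matrix, vec, sparse_threshold=0.5):
    out = []
    for row in matrix:
        zero_frac = sum(1 for w in row if w == 0) / len(row)
        if zero_frac >= sparse_threshold:    # sparse path: nonzeros only
            out.append(sum(w * vec[j] for j, w in enumerate(row) if w))
        else:                                # dense path: full MAC sweep
            out.append(sum(w * x for w, x in zip(row, vec)))
    return out

rng = random.Random(0)
mat = [[rng.choice((0, 0, 0, rng.randint(-3, 3))) for _ in range(8)]
       for _ in range(4)]
vec = [rng.randint(-2, 2) for _ in range(8)]
print(matvec_hetero(mat, vec))
```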

View on IEEE Xplore

Energy-Efficient Reconfigurable XGBoost Inference Accelerator With Modular Unit Trees via Selective Node Execution and Data Movement

Abstract:

Extreme gradient boosting (XGBoost) has emerged as a powerful AI algorithm, achieving high accuracy and winning multiple Kaggle competitions across tasks including medical diagnosis, recommendation systems, and autonomous driving. It has great potential for running on edge devices due to its simple binary-tree-based computing kernel, offering unique …
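
That simple binary-tree kernel is worth seeing concretely: per input, each tree executes only the O(depth) nodes on one root-to-leaf path, which is the property selective node execution exploits. A generic flattened-tree inference sketch (the node layout is illustrative, not the accelerator's memory organization):

```python
# One boosted tree stored as flat per-node arrays; -1 in `feat` marks a leaf.
feat  = [0,   1,   -1,  -1,  -1]     # feature index tested at each node
thr   = [0.5, 2.0, 0.0, 0.0, 0.0]    # split thresholds
left  = [1,   3,   0,   0,   0]      # child indices
right = [2,   4,   0,   0,   0]
value = [0.0, 0.0, -1.2, 0.4, 0.9]   # leaf scores

def predict_tree(x):
    n = 0
    while feat[n] != -1:             # only one path's nodes ever execute
        n = left[n] if x[feat[n]] < thr[n] else right[n]
    return value[n]

print(predict_tree([0.3, 1.0]))      # root -> left -> leaf: 0.4
print(predict_tree([0.9, 1.0]))      # root -> right leaf: -1.2
```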

View on IEEE Xplore