Throughput

LUT-Based Convolutional Tsetlin Machine Accelerator With Dynamic Clause Scaling for Resources-Constrained FPGAs

Abstract:

The rapid growth of machine learning (ML) workloads, particularly in computer vision applications, has significantly increased computational and energy demands in modern electronic systems, motivating the use of hardware accelerators to offload processing from general-purpose processors. Despite advances in computationally efficient ML models, achieving energy-efficient inference on resource-constrained edge devices …

View on IEEE Xplore

A 7-Level 18-Wire-State Trio-Signaling Transmitter for MIPI C-PHY 3.0 Interfaces

Abstract:

This letter presents a MIPI C-PHY v3.0 TX, which adopts trio-signaling using three wires per lane. Each line supports seven-level signaling, enabling 18 wire states to map 32-bit data into nine symbols, achieving 3.56 bits/symbol efficiency. Balanced coding maintains constant driver current, enhancing SSO noise immunity, and embedded clocking is achieved …
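
The efficiency figure quoted in this abstract follows directly from its own numbers. Mapping a 32-bit word onto nine symbols carries 32/9 bits per symbol, which rounds to the stated 3.56 bits/symbol. This is only a check of the quoted arithmetic, not a description of the paper's encoder; the symbol \eta is introduced here for illustration.

```latex
\eta = \frac{32\ \text{bits}}{9\ \text{symbols}} = 3.\overline{5}\ \text{bits/symbol} \approx 3.56\ \text{bits/symbol}
```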

View on IEEE Xplore

Adelia: A 4-nm LLM Processing Unit With Streamlined Dataflow and Dual-Mode Parallelism for Maximizing Hardware Efficiency

Abstract:

The proliferation of large language models (LLMs) as cross-domain foundation models is fueled by aggressive scaling in both parameter counts and inference-time computation. The emergence of sophisticated reasoning models further accelerates this trend, demanding longer context windows and escalating the computational and memory burdens of inference. A fundamental challenge arises …

View on IEEE Xplore

SiWB: A 28-nm 800-MHz 4.2-to-14.2-Gb/s/W Configurable Multi-Core Architecture for White-Box Block Cipher With Area-Efficient Random Linear Transformation and Load-Aware Inter-Core Scheduling

Abstract:

White-box cryptography (WBC) seeks to protect secret keys (SKs) even under the white-box security model that features adversaries having full control of the execution environment. Due to the ever-growing demand for content protection under security-critical scenarios, the recent progress on WBC has been nothing short of spectacular. However, the security-prioritized …

View on IEEE Xplore

MixCIM: A Hybrid Computing-in-Memory Macro With Less Data-Movement and Better Memory-Reuse for Depthwise Separable Neural Networks

Abstract:

Computing-in-memory (CIM) architectures have demonstrated strong potential for edge artificial intelligence (AI) devices due to their enhanced parallelism and energy efficiency. With the growing complexity of AI tasks and the rapid increase in model size, computation and deployment costs have surged. Depthwise separable neural networks (DSNNs) have attracted interest for …

View on IEEE Xplore

DPe-CIM: A 4T-1C Dual-Port eDRAM-Based Compute-in-Memory for Simultaneous Computing and Refresh With Adaptive Refresh and Data Conversion Reduction Scheme

Abstract:

This article presents DPe-CIM, a 4T-1C dual-port embedded dynamic random access memory (eDRAM)-based compute-in-memory (CIM) macro with adaptive refresh and data conversion reduction. DPe-CIM proposes four key features that improve area and energy efficiency: 1) dual-port eDRAM cell (DPC) separates the multiply-and-accumulate (MAC) and refresh ports, enabling simultaneous MAC …

View on IEEE Xplore

PANNA: A 558 TOPS/W Pipelined All-Analog Neural Network Accelerator in 22 nm FD-SOI

Abstract:

Analog computing offers intrinsic energy and latency benefits that make it attractive for real-time and edge applications. Conventional analog accelerators suffer from repeated conversions between the analog and digital domains, which degrade efficiency and throughput. We propose an all-analog pipelined neural network accelerator architecture in 22 nm fully-depleted silicon-on-insulator (FD-SOI) complementary metal-oxide-semiconductor (…

View on IEEE Xplore

A 168 nW to 44.3 Mb/s Adaptable TRNG With 400 mV Attack-Resilient Hybrid RO Core

Abstract:

This letter presents an adaptable ring oscillator (RO)-true random number generator (TRNG) that removes the fixed power–throughput tradeoff by selecting delay-cell physics at run time. A hybrid core uses a current-starved inverter in low-power (LP) mode to amplify slew-limited jitter for high bit-efficiency at low frequency, and a …

View on IEEE Xplore

A RISC-V SoC With Reconfigurable Custom Instructions on a Synthesized eFPGA Fabric in 22nm FinFET

Abstract:

This letter presents a flexible and energy-efficient RISC-V system-on-chip (SoC) in 22nm FinFET technology, achieving state-of-the-art performance by tightly integrating the CPU with a synthesized embedded field-programmable gate array (eFPGA), enabling the implementation of reconfigurable custom instructions. The tight integration of the eFPGA with SoC scratchpad memory …

View on IEEE Xplore