Field programmable gate arrays

Denim: Heterogeneous Compute-in-Memory Accelerator Exploiting Denoising–Similarity for Diffusion Models

Abstract:

Diffusion models have recently revolutionized the field of image synthesis due to their ability to generate photorealistic images. However, one of the main drawbacks of diffusion models is that the image generation process is expensive. Large image-to-image networks have to be applied multiple times in order to iteratively optimize the …

View on IEEE Xplore

HyFPCiM: A 65-nm 417-μW Error-Sensitivity-Aware FP8 Compute-in-Memory Macro

Abstract:

This letter presents HyFPCiM, a 65-nm FP8 compute-in-memory (CiM) macro that enables sub-mW floating-point (FP) inference using error-sensitivity-aware FP partitioning (EAP). EAP maps exponent processing to a digital CiM (DCiM) path and mantissa accumulation to an analog CiM (ACiM), avoiding the power- and area-intensive adder-tree-based accumulation used in prior FP-CiM …

View on IEEE Xplore

LUT-Based Convolutional Tsetlin Machine Accelerator With Dynamic Clause Scaling for Resources-Constrained FPGAs

Abstract:

The rapid growth of machine learning (ML) workloads, particularly in computer vision applications, has significantly increased computational and energy demands in modern electronic systems, motivating the use of hardware accelerators to offload processing from general-purpose processors. Despite advances in computationally efficient ML models, achieving energy-efficient inference on resource-constrained edge devices …

View on IEEE Xplore

A 27.5–28.5 mJ/Frame 3-D Gaussian Rendering Processor With Spherical Beta Illumination and Mixed-Precision Computation Path

Abstract:

This letter presents a 3-D Gaussian rendering processor that integrates a spherical beta (SB) illumination module with a mixed-precision rendering engine to enable energy-efficient novel-view synthesis on edge devices. SB replaces spherical harmonics (SH) with a hardware-efficient kernel implemented using a pipelined fixed-point piecewise linear (PWL) power unit. The pipeline …

View on IEEE Xplore

On-Chip Charge-Trap-Transistor-Based Mismatch Calibration of an 8-Bit Thermometer Current-Source DAC

Abstract:

This letter presents an on-chip mismatch calibration technique for current-source digital-to-analog converters (DACs) using charge-trap transistors (CTTs) in 22-nm FDSOI technology. The proposed method exploits programmable threshold voltage (VTH) shifts in CTTs to locally tune the current of near-minimum-sized devices without external trimming. A compact 8-bit thermometer DAC is implemented …

View on IEEE Xplore

A RISC-V SoC With Reconfigurable Custom Instructions on a Synthesized eFPGA Fabric in 22nm FinFET

Abstract:

This letter presents a flexible and energy-efficient RISC-V system-on-chip (SoC) in 22nm FinFET technology, achieving state-of-the-art performance by tightly integrating the CPU with a synthesized embedded field programmable gate array (eFPGA), enabling the implementation of reconfigurable custom instructions. The tight integration of the eFPGA with SoC scratchpad memory …

View on IEEE Xplore