Pipelines

A 27.5–28.5 mJ/Frame 3-D Gaussian Rendering Processor With Spherical Beta Illumination and Mixed-Precision Computation Path

Abstract:

This letter presents a 3-D Gaussian rendering processor that integrates a spherical beta (SB) illumination module with a mixed-precision rendering engine to enable energy-efficient novel-view synthesis on edge devices. SB replaces spherical harmonics (SH) with a hardware-efficient kernel implemented using a pipelined fixed-point piecewise linear (PWL) power unit. The pipeline …

View on IEEE Xplore

Birch: A Real-Time Multi-Domain Multi-Task Extended Reality Perception Accelerator

Abstract:

Birch is a system-on-chip (SoC) that efficiently and accurately accelerates the multi-task multi-domain extended reality (XR) perception pipeline, with workloads such as visual inertial odometry (VIO), eye gaze tracking, and scene understanding. Birch features vision modules with cascaded line buffers, in-step feature sorting, and double-buffered optical flow to extract and …

View on IEEE Xplore

A 39.4-mW 300 MHz-BW 70.9 dB-SNDR Hybrid ADC With Resistive Input and 200 fs$_{\mathrm{rms}}$-Jitter Tolerance

Abstract:

This letter presents a power-efficient hybrid ADC architecture: a low-resolution continuous-time (CT) delta-sigma modulator (DSM) followed by a time-interleaved pipeline stage that further quantizes the quantization noise of the DSM. In the front-end CT DSM, the resistive input makes the ADC easy to drive, and the direct-charge-dump feedback (DCD FB) provides a …

View on IEEE Xplore

A 3 nm FinFET 125 TOPS/W-29 TFLOPS/W, 90 TOPS/mm²-17 TFLOPS/mm² SRAM-Based INT8, and FP16 Digital-CIM Compiler With Support for Multi-Weight Update/Cycle

Abstract:

This article presents a static random-access memory (SRAM)-based digital compute-in-memory (CIM) compiler implemented with 3 nm high-$\kappa$ metal gate (HKMG) FinFET technology, supporting flexible INT8 and FP16 formats for weight and activation multiply-accumulate (MAC) operations, offering configuration flexibility, high accuracy, and improved area and power efficiency. The FP16 digital …

View on IEEE Xplore

Opal: A 16-nm Coarse-Grained Reconfigurable Array SoC for Full Sparse Machine Learning Applications

Abstract:

Sparsity has recently attracted increased attention in the machine learning (ML) community due to its potential to improve performance and energy efficiency by eliminating ineffectual computations. As ML models evolve rapidly, reconfigurable architectures, such as coarse-grained reconfigurable arrays (CGRAs), are being explored to adapt to and accelerate emerging models. Previous …

View on IEEE Xplore