Transformers

A BEV Perception Transformer Accelerator With Saliency-Driven Image/Point Cloud Fusion and Phase-Linked Dataflow in 28 nm CMOS

Abstract:

Deploying advanced Transformer-based models for real-time, high-accuracy multimodal bird’s-eye-view (BEV) perception in autonomous driving imposes substantial hardware demands. To address this, we propose a low-cost, low-power image/point-cloud fusion Transformer accelerator that supports two modes: high-performance driving and ultra-low-power sentry operation. We first propose a cross-modal saliency evaluation mechanism …

View on IEEE Xplore

MITTA: A Multi-Task Transformer Accelerator With Mixed Precision Structured Sparsity and Hierarchical Task-Adaptive Power Management

Abstract:

This article presents MITTA, the first silicon-proven transformer accelerator optimized for multi-task inference across both natural language processing (NLP) and image processing domains. MITTA accelerates a task-sharing algorithm that minimizes sub-task computation by reusing both activations and weights from a shared base task, requiring only sparse delta computation for sub-tasks. …

View on IEEE Xplore
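
The task-sharing idea in the MITTA abstract can be illustrated for a single linear layer. This minimal sketch assumes, for illustration only, that a sub-task's weights equal the base task's weights plus a sparse difference ΔW and that both tasks feed the same input x to the layer. Then W_sub·x = W_base·x + ΔW·x, so the base output can be reused and only the nonzero entries of ΔW cost multiplications. The helpers `matvec`, `sparse_delta_matvec`, and `subtask_output` are hypothetical names and do not describe MITTA's actual datapath.

```python
def matvec(w, x):
    """Dense matrix-vector product: one multiply-accumulate per weight."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def sparse_delta_matvec(delta, x):
    """Apply a sparse weight difference stored as {(row, col): value}."""
    partial = {}
    for (i, j), d in delta.items():
        partial[i] = partial.get(i, 0) + d * x[j]
    return partial


def subtask_output(base_out, delta, x):
    """Reuse the base task's output and add only the sparse delta contribution."""
    y = list(base_out)
    for i, v in sparse_delta_matvec(delta, x).items():
        y[i] += v
    return y


w_base = [[1, 2], [3, 4]]
x = [5, 6]
base_out = matvec(w_base, x)        # computed once for the base task
delta = {(0, 1): -1, (1, 0): 2}     # hypothetical sparse weight difference
y_sub = subtask_output(base_out, delta, x)
```

Here the sub-task needs 2 multiply-accumulates instead of the 4 a dense recomputation would take; under these assumptions the saving grows with the sparsity of ΔW.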

Design and Analysis of a 13.7–41 GHz Ultra-Wideband Frequency Doubler With Cross-Coupled Push-Push Structure

Abstract:

This article presents a 13.7–41 GHz ultra-wideband frequency doubler with high efficiency and conversion gain (CG). The proposed cross-coupled push-push structure, in conjunction with the fourth-order transformer-based resonator and the series gate inductor, collaboratively shapes the input signal amplitude such that three distinct peaks emerge at different frequencies, thereby significantly improving …

View on IEEE Xplore
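
A push-push doubler relies on a general principle that a short numerical sketch can show: two devices are driven in antiphase and their output currents are summed, so the odd-order terms of each device's nonlinearity cancel while the even-order terms add, leaving the second harmonic at the output. The cubic device model `device_current` and its coefficients are hypothetical, and the sketch does not model the paper's cross-coupled structure, fourth-order transformer-based resonator, or series gate inductor.

```python
import math


def device_current(v, a1=1.0, a2=0.5, a3=0.1):
    """Hypothetical polynomial model of one transistor's output current."""
    return a1 * v + a2 * v**2 + a3 * v**3


def harmonic_amplitude(samples, k):
    """Amplitude of the k-th harmonic from a single-bin DFT over one period."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n


N = 256   # samples per input period
A = 0.4   # input amplitude (arbitrary units)
v_in = [A * math.cos(2 * math.pi * i / N) for i in range(N)]

# Push-push: the two devices see antiphase inputs and their currents are summed.
i_out = [device_current(v) + device_current(-v) for v in v_in]
fund = harmonic_amplitude(i_out, 1)
second = harmonic_amplitude(i_out, 2)
```

With these coefficients the summed current contains only a DC term and a second harmonic of amplitude a2·A² = 0.08; the fundamental cancels to within floating-point rounding, whereas a single device alone has a fundamental of about 0.40.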

A K/Ka-Band Transmit/Receive Front-End With Triple-Coupled Transformer Technique in 65-nm Bulk CMOS

Abstract:

This article presents a K/Ka-band transmit/receive (T/R) front-end for joint sensing and communication (JSAC) applications. A reconfigurable matching network for both signal reception and transmission is realized using the proposed triple-coupled transformer (TCT) technique, achieving low power loss and a compact footprint. The T/R switch at …

View on IEEE Xplore

A Compact Reconfigurable Dual-Path Dual-Band LNA for 5G NR FR2 Applications

Abstract:

This article presents a reconfigurable dual-path dual-band low noise amplifier (LNA) for fifth generation (5G) millimeter-wave (mmW) communications. A novel band-switching input matching architecture based on cross-connected transistors is proposed to achieve optimal dual-band input matching and $g_{m}$-boosting. This architecture allows the dual-band input transistors to share …

View on IEEE Xplore

Analysis and Design of Power Amplifier Using Parallel-Combined Multisegment Transformer

Abstract:

This letter presents a highly efficient power amplifier (PA) using a parallel-combined vertical multisegment transformer for 5G new radio (NR) applications operating in bands n257 and n258, in a 65-nm bulk CMOS process. A multisegment transformer provides a lower input impedance than a conventional transformer, enabling the PA to …

View on IEEE Xplore

A 3-D HBI Compliant 1.536 TB/s/mm² Bandwidth Scalable Attention Accelerator With 22.5-GOPS Throughput High Speed SoftMax for Quantized Transformers in Intel 3

Abstract:

This letter presents a novel hardware accelerator compatible with <3-μm pitch 3-D Cu-Cu hybrid bonding interconnect (HBI) technology, particularly designed to efficiently execute multihead attention (MHA) of encoder transformer models. We present an accelerator that addresses performance losses due to low-precision models by incorporating specialized hardware optimizations …

View on IEEE Xplore
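
SoftMax over quantized attention scores is commonly made hardware-friendly by subtracting the row maximum and evaluating the exponential in base 2, so that the integer part of the exponent becomes a shift. The sketch below shows that generic technique with a linear approximation 2^v ≈ 1 + v for the fractional part. It is an assumption for illustration, not the letter's SoftMax unit; `exp_shift_approx`, `softmax_quantized`, and the per-tensor `scale` parameter are hypothetical.

```python
import math

LOG2E = 1.4426950408889634  # log2(e)


def exp_shift_approx(x):
    """Approximate e**x as (1 + v) * 2**u, where u + v = x * log2(e) and 0 <= v < 1."""
    y = x * LOG2E
    u = math.floor(y)              # integer part: a shift in hardware
    v = y - u                      # fractional part in [0, 1)
    return math.ldexp(1.0 + v, u)  # (1 + v) * 2**u


def softmax_quantized(q_scores, scale, exp_fn=math.exp):
    """Softmax over integer attention scores with a per-tensor dequantization scale."""
    m = max(q_scores)
    # Subtracting the maximum keeps every exponent <= 0, so nothing overflows.
    exps = [exp_fn((q - m) * scale) for q in q_scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [12, -40, 127, 0]  # hypothetical int8 scores for one query row
exact = softmax_quantized(scores, scale=0.05)
approx = softmax_quantized(scores, scale=0.05, exp_fn=exp_shift_approx)
```

Because 2^v is convex, 1 + v never underestimates it on [0, 1), and the worst-case overestimate is about 6%; after normalization each approximate probability stays within roughly 6% (relative) of the exact value.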

SparseCol: A 1320 BTOPS/W Precision-Scalable NPU Exploiting Training-Free Structured Bit-Level Sparsity and Dynamic Dataflow

Abstract:

Bit-serial computation enables sequential processing of data at the bit level, providing several advantages, such as scalable computational precision. This approach has gained significant attention, especially for exploiting bit-level sparsity (BLS) in AI workloads. While current bit-serial processors leverage BLS to eliminate the computation associated with zero bits, they face …

View on IEEE Xplore
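
The zero-bit skipping that the SparseCol abstract attributes to current bit-serial processors can be sketched in a few lines: each weight is processed one set bit at a time with a shift-and-add, and zero bits cost nothing. This is a generic, unstructured illustration that assumes sign-magnitude weights; it does not reproduce SparseCol's structured bit-level sparsity or dynamic dataflow, and `bit_serial_dot` is a hypothetical name.

```python
def bit_serial_dot(activations, weights):
    """Dot product that walks only the set bits of each (sign-magnitude) weight.

    Returns (result, bit_cycles); bit_cycles counts one shift-and-add per set bit,
    modelling hardware that skips zero bits entirely.
    """
    acc = 0
    bit_cycles = 0
    for a, w in zip(activations, weights):
        sign = -1 if w < 0 else 1
        mag = abs(w)
        while mag:
            b = (mag & -mag).bit_length() - 1  # position of the lowest set bit
            acc += sign * (a << b)             # shift-and-add for that bit
            bit_cycles += 1
            mag &= mag - 1                     # clear the bit just processed
    return acc, bit_cycles


acts = [3, -2, 5, 7]
wts = [5, 0, -12, 64]
result, cycles = bit_serial_dot(acts, wts)
```

For this example a dense 8-bit bit-serial pass would take 4 × 8 = 32 bit-cycles, while only 5 weight bits are set, so zero-bit skipping needs 5.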

A 57.3-fps 12.8 TFLOPS/W Text-to-Motion Processor With Inter-Iteration Output Sparsity and Inter-Frame Joint Similarity

Abstract:

Recently, 3-D human motion generation has become essential in media applications such as film production and augmented reality (AR)/virtual reality (VR) devices, requiring the generation of human joint movements and detailed 3-D meshes for each joint. Traditionally, joint creation required hours or even days, making it impractical for real-time …

View on IEEE Xplore