Transformers

SparseCol: A 1320 BTOPS/W Precision-Scalable NPU Exploiting Training-Free Structured Bit-Level Sparsity and Dynamic Dataflow

SparseCol: A 1320 BTOPS/W Precision-Scalable NPU Exploiting Training-Free Structured Bit-Level Sparsity and Dynamic Dataflow 150 150

Abstract:

Bit-serial computation enables sequential processing of data at the bit level, providing several advantages, such as scalable computational precision. This approach has gained significant attention, especially for exploiting bit-level sparsity (BLS) in AI workloads. While current bit-serial processors leverage BLS to eliminate the computation associated with zero bits, they face …

View on IEEE Xplore

A 57.3-fps 12.8 TFLOPS/W Text-to-Motion Processor With Inter-Iteration Output Sparsity and Inter-Frame Joint Similarity

A 57.3-fps 12.8 TFLOPS/W Text-to-Motion Processor With Inter-Iteration Output Sparsity and Inter-Frame Joint Similarity 150 150

Abstract:

Recently, 3-D human motion generation has become essential in media applications such as film production and augmented reality (AR)/virtual reality (VR) devices, requiring the generation of human joint movements and detailed 3-D meshes for each joint. Traditionally, joint creation required hours or even days, making it impractical for real-time …

View on IEEE Xplore

A Multicore Programmable Variable-Precision Near-Memory Accelerator for CNN and Transformer Models

A Multicore Programmable Variable-Precision Near-Memory Accelerator for CNN and Transformer Models 150 150

Abstract:

Convolutional neural network (CNN) and transformer are the most popular neural network models in computer vision (CV) and natural language processing (NLP). It is quite common to use both these two models in multimodal scenarios, such as text-to-image generation. However, these two models have very different memory mappings, dataflows and …

View on IEEE Xplore

DPIM: A 2T1C eDRAM Transformer-in-Memory Chip With Sparsity-Aware Quantization and Heterogeneous Dense–Sparse Core

DPIM: A 2T1C eDRAM Transformer-in-Memory Chip With Sparsity-Aware Quantization and Heterogeneous Dense–Sparse Core 150 150

Abstract:

Transformer models have revolutionized artificial intelligence (AI) applications across various domains, but their increasing complexity poses significant challenges in terms of computational and memory demands. While processing-in-memory (PIM) paradigms have been adopted to address these limitations, existing PIM-based transformer accelerators still face hurdles such as: 1) focusing solely on optimizing attention …

View on IEEE Xplore

A Fully Integrated Galvanic Isolator for Gate Drivers With Asynchronous 100/167 Mb/s ASK/FSK Full-Duplex Communication

A Fully Integrated Galvanic Isolator for Gate Drivers With Asynchronous 100/167 Mb/s ASK/FSK Full-Duplex Communication 150 150

Abstract:

A fully integrated galvanic isolator for gate drivers that supports high-speed, asynchronous, full-duplex communication is presented. Data transmission from the microcontroller to the power device is achieved using amplitude-shift keying (ASK) at 100 Mb/s, while simultaneous communication in the opposite direction is implemented using frequency-shift keying (FSK) at 167 Mb/s. …

View on IEEE Xplore

SLiMDO: A Single-Link Multi-Domain-Output Isolated DC–DC Converter With Passive Magnetic Flux Sharing for Local Energy Distribution and Rx Behavior Sensing-Based Global Power Modulation

SLiMDO: A Single-Link Multi-Domain-Output Isolated DC–DC Converter With Passive Magnetic Flux Sharing for Local Energy Distribution and Rx Behavior Sensing-Based Global Power Modulation 150 150

Abstract:

This article introduces a small form-factor single-link multi-domain-output (SLiMDO) isolated dc–dc converter design. The proposed design provides two regulated outputs in separate domains in the receiver (Rx) side with a single micro-transformer. These isolated Rxes achieve local voltage regulation and automatic energy distribution across domains through passive magnetic flux …

View on IEEE Xplore

A Millimeter-Wave Standing-Wave Oscillator With Frequency-Specific Wave-Velocity Control Demonstrating Class-F Effects

A Millimeter-Wave Standing-Wave Oscillator With Frequency-Specific Wave-Velocity Control Demonstrating Class-F Effects 150 150

Abstract:

Implementing dual-resonance class-F oscillators with transformer feedback beyond 60 GHz poses significant challenges due to the limited third-harmonic tank impedance when using small coils with low coupling factors. To address these limitations and leverage the phase noise advantages of class-F operation, this letter introduces a standing-wave oscillator (SWO) topology featuring an …

View on IEEE Xplore

A Simultaneous Dual-Carrier Transformer-Coupled Passive Mixer-First Receiver Front-End Supporting Blocker Suppression

A Simultaneous Dual-Carrier Transformer-Coupled Passive Mixer-First Receiver Front-End Supporting Blocker Suppression 150 150

Abstract:

A high dynamic range N-path passive mixer-first receiver architecture capable of simultaneously down-converting two arbitrary bands through a single RF port is presented. The architecture consists of two passive mixers arranged in a series configuration with a transformer front-end to minimize cross-loading between mixers while still providing impedance transparency and …

View on IEEE Xplore

MINOTAUR: A Posit-Based 0.42–0.50-TOPS/W Edge Transformer Inference and Training Accelerator

MINOTAUR: A Posit-Based 0.42–0.50-TOPS/W Edge Transformer Inference and Training Accelerator 150 150

Abstract:

Transformer models have revolutionized natural language processing (NLP) and enabled many new applications, but are challenging to deploy on resource-constrained edge devices due to their high computation and memory demands. We present MINOTAUR, an edge system-on-chip (SoC) for inference and fine-tuning of Transformer models with all memory on the chip. …

View on IEEE Xplore