Vectors

Antiferromagnetic Programmable Neuron: Structure, Training, and Pattern Recognition Applications

Abstract:

Artificial neurons based on antiferromagnetic (AFM) spin Hall oscillators (SHOs) are promising elements for creating ultrafast, energy-efficient neuromorphic computing systems. These structures can generate picosecond spikes in response to dc and ac electric currents, thereby mimicking the reaction of biological neurons to an external stimulus. However, conventional AFM neurons have …
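
The spiking response described here is commonly modeled in the AFM-oscillator literature by a damped, current-driven pendulum equation for the Néel-vector angle. The Python sketch below integrates that generic model to show threshold spiking under a current pulse; the parameter values are illustrative assumptions, not the authors' device model.

```python
# Minimal numerical sketch of the pendulum-like model commonly used for
# AFM spin Hall oscillator neurons: the Neel-vector angle phi is driven
# by spin current and emits one short spike per half-turn. All parameter
# values are illustrative assumptions, not taken from the paper.
import numpy as np

omega_ex = 2 * np.pi * 27.5e12   # exchange frequency (rad/s), assumed
omega_e  = 2 * np.pi * 1.75e9    # easy-plane anisotropy freq (rad/s), assumed
alpha    = 0.01                  # effective damping, assumed

def drive(t):                    # 100-ps supra-threshold current pulse
    return 0.9 * omega_e if 50e-12 < t < 150e-12 else 0.0

dt, T = 1e-15, 400e-12           # 1-fs Euler step over a 400-ps window
phi, dphi, spikes = 0.0, 0.0, []
for n in range(int(T / dt)):
    t = n * dt
    # (1/w_ex)*phi'' + alpha*phi' + (w_e/2)*sin(2*phi) = sigma*j(t)
    ddphi = omega_ex * (drive(t) - alpha * dphi - 0.5 * omega_e * np.sin(2 * phi))
    dphi += ddphi * dt
    phi += dphi * dt
    if phi >= np.pi:             # each half-turn radiates one spike
        phi -= np.pi
        spikes.append(t)

print(f"{len(spikes)} picosecond-scale spikes during the current pulse")
```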

View on IEEE Xplore

Birch: A Real-Time Multi-Domain Multi-Task Extended Reality Perception Accelerator

Abstract:

Birch is a system-on-chip (SoC) that efficiently and accurately accelerates the multi-task multi-domain extended reality (XR) perception pipeline, with workloads such as visual inertial odometry (VIO), eye gaze tracking, and scene understanding. Birch features vision modules with cascaded line buffers, in-step feature sorting, and double-buffered optical flow to extract and …
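
Cascaded line buffers are a standard streaming-vision technique: only K-1 image rows are kept on chip to form a KxK pixel window each cycle. The sketch below is a generic behavioral model of that technique, not Birch's implementation; the 3x3 window and frame size are assumptions.

```python
# Illustrative line-buffer model: stream pixels row-major and keep only
# K-1 previous rows on chip to form a KxK window per pixel. This is a
# generic sketch of the technique, not Birch's RTL.
def stream_windows(frame, K=3):
    """Yield (row, col, KxK window) using K-1 line buffers."""
    H, W = len(frame), len(frame[0])
    lines = [[0] * W for _ in range(K - 1)]   # K-1 on-chip line buffers
    window = [[0] * K for _ in range(K)]      # KxK shift-register window
    for r in range(H):
        for c in range(W):
            # column of K pixels: K-1 from line buffers, 1 fresh pixel
            col = [lines[i][c] for i in range(K - 1)] + [frame[r][c]]
            for i in range(K - 1):            # rotate buffers downward
                lines[i][c] = col[i + 1]
            for row_w in window:              # shift window left by one
                row_w.pop(0)
            for i in range(K):
                window[i].append(col[i])
            if r >= K - 1 and c >= K - 1:
                yield r, c, [row[:] for row in window]

frame = [[r * 8 + c for c in range(8)] for r in range(8)]
r, c, win = next(stream_windows(frame))
print(r, c, win[0])  # first full window, anchored at (2, 2)
```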

View on IEEE Xplore

EMO-CIM: An Input/Stationary-Data Similarity-Aware Computing-In-Memory Design for Variable Vector-Wise Computation in Edge Multioperator AI Acceleration

Abstract:

We propose an edge multioperator computing-in-memory (EMO-CIM) design that supports variable vector-wise multiply-and-accumulate (MAC) in CNN, Depthwise (DW)-Convolution, and Attention operators. It features: 1) a single EMO-CIM bank (ECB) that excels in variable vector-wise MAC (V-MAC) for multioperators; 2) merging local input-shared compute units (LISCUs) with a decode-unit and adder-tree (DUAT) facilitates …
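
Variable vector-wise MAC can be read as regrouping one bank of multiply lanes so the dot-product length matches each operator (long vectors for convolution, short ones for depthwise convolution or attention heads). The sketch below is a behavioral illustration of that regrouping; the lane count and vector lengths are assumptions, not the ECB circuit.

```python
# Behavioral sketch of variable vector-wise MAC (V-MAC): one bank of
# lanes is partitioned into vectors whose length matches the operator.
# Lane count and vector lengths are illustrative assumptions.
import numpy as np

LANES = 64  # assumed number of multiply lanes in one bank

def vmac(bank_inputs, bank_weights, vec_len):
    """Group LANES lanes into LANES//vec_len independent dot products."""
    groups = LANES // vec_len
    x = bank_inputs[: groups * vec_len].reshape(groups, vec_len)
    w = bank_weights[: groups * vec_len].reshape(groups, vec_len)
    return (x * w).sum(axis=1)  # one partial sum per group

rng = np.random.default_rng(0)
x = rng.integers(-8, 8, LANES)
w = rng.integers(-8, 8, LANES)

print(vmac(x, w, 64))  # CNN: one long dot product, assumed length 64
print(vmac(x, w, 8))   # depthwise conv: eight dot products of length 8
print(vmac(x, w, 16))  # attention: four query-key dot products of length 16
```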

View on IEEE Xplore

An Electrophysiology-Optogenetics Closed-Loop Bi-Directional Neural Interface for Sleep Regulation With 0.2-μJ/class Multiplexer-Based Neural Network

Abstract:

This work proposes a multiplexer-based neural network (MUXnet), a multiplier-free neural network (NN) structure applicable to the implementation of all inner product-based NN layers. An on-chip MUXnet-based neural signal processing unit (NSPU) was designed, achieving a state-of-the-art accuracy of 82.4% on a public human sleep staging dataset, with the lowest …
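
A multiplexer can stand in for a multiplier when weights come from a small codebook: the weight code selects among cheap transforms of the input (zero, pass, negate, shift). The sketch below shows a multiplier-free inner product built this way; the five-entry codebook is an illustrative assumption, not the MUXnet encoding.

```python
# Multiplier-free inner product sketch: each weight is a small code that
# a multiplexer maps to a cheap transform of the input (zero, pass,
# negate, shift). The 5-entry codebook is an illustrative assumption.
import numpy as np

def mux_select(x, code):
    """Multiplexer: pick a transform of x instead of multiplying."""
    if code == 0:   return 0          # zero
    if code == 1:   return x          # pass-through
    if code == -1:  return -x         # negate
    if code == 2:   return x << 1     # shift left (x * 2)
    if code == -2:  return -(x << 1)  # negate + shift
    raise ValueError("code outside assumed codebook")

def mux_dot(xs, codes):
    """Inner product built only from mux selections and an adder tree."""
    return sum(mux_select(int(x), int(c)) for x, c in zip(xs, codes))

rng = np.random.default_rng(1)
xs = rng.integers(-16, 16, 8)
codes = rng.choice([-2, -1, 0, 1, 2], 8)
assert mux_dot(xs, codes) == int(np.dot(xs, codes))  # matches true multiply
print(mux_dot(xs, codes))
```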

View on IEEE Xplore

Side-Channel Attack-Resistant HMAC-SHA256 Accelerator With Boolean and Arithmetic Masking in Intel 4 CMOS

Abstract:

This work describes a side-channel attack (SCA)-resistant hash-based message authentication code (HMAC) accelerator with secure hash algorithm 2 (SHA-2) using Boolean and arithmetic masking along with the first-reported ASIC implementation in Intel 4 CMOS with 10 M measured traces. Previously reported masked datapath designs suffer from high area/performance overheads (>100%) due to …
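
SHA-256 interleaves Boolean operations (XOR, rotation) with 32-bit modular additions, so a masked datapath must carry secrets as both Boolean shares (x = x' ⊕ r) and arithmetic shares (x = (A + r) mod 2^32) and convert between them. The sketch below uses Goubin's classic Boolean-to-arithmetic conversion as a stand-in; the paper's circuit-level masking scheme is not reproduced here.

```python
# Sketch of Boolean vs. arithmetic masking for a 32-bit word, with
# Goubin's Boolean-to-arithmetic conversion as a stand-in for the
# paper's scheme. Secrets never appear unmasked in the datapath.
import secrets

MASK32 = 0xFFFFFFFF

def boolean_to_arithmetic(x_bool, r):
    """Goubin B2A: from x = x_bool ^ r, return A with x = (A + r) mod 2^32."""
    gamma = secrets.randbits(32)
    T = x_bool ^ gamma
    T = (T - gamma) & MASK32
    T ^= x_bool
    gamma ^= r
    A = x_bool ^ gamma
    A = (A - gamma) & MASK32
    A ^= T
    return A

x = 0xDEADBEEF                      # the secret word
r = secrets.randbits(32)            # shared random mask
x_bool = x ^ r                      # Boolean share: x = x_bool ^ r
A = boolean_to_arithmetic(x_bool, r)
assert (A + r) & MASK32 == x        # arithmetic share: x = (A + r) mod 2^32

# Masked modular add (the SHA-256 '+' on arithmetic shares): adding
# the shares adds the masked values while the masks add too.
y, s = 0x12345678, secrets.randbits(32)
B = (y - s) & MASK32                # arithmetic share of y under mask s
sum_share = (A + B) & MASK32        # share of x + y under mask r + s
assert (sum_share + r + s) & MASK32 == (x + y) & MASK32
print("B2A conversion and masked addition verified")
```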

View on IEEE Xplore

Chameleon: A Multiplier-Free Temporal Convolutional Network Accelerator for End-to-End Few-Shot and Continual Learning from Sequential Data

Abstract:

On-device learning at the edge enables low-latency, private personalization with improved long-term robustness and reduced maintenance costs. Yet, achieving scalable, low-power (LP) end-to-end on-chip learning, especially from real-world sequential data with a limited number of examples, is an open challenge. Indeed, accelerators supporting error backpropagation optimize for learning performance at …
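
A temporal convolutional network stacks dilated causal convolutions, so each output depends only on past samples; one common route to a multiplier-free datapath is restricting weights to signed powers of two so every tap becomes a shift. The sketch below combines both ideas under those assumptions; it is not Chameleon's datapath.

```python
# Sketch of a multiplier-free dilated causal convolution: weights are
# constrained to signed powers of two, so each tap is a shift plus a
# sign. Kernel size, dilation, and weight encoding are assumptions.
def po2_tap(x, sign, shift):
    """One multiplier-free tap: (+/-) x * 2**shift via shifting."""
    return sign * (x << shift)

def dilated_causal_conv(seq, taps, dilation):
    """y[t] = sum_k tap_k(x[t - k*dilation]), zero-padded on the left."""
    out = []
    for t in range(len(seq)):
        acc = 0
        for k, (sign, shift) in enumerate(taps):
            idx = t - k * dilation
            if idx >= 0:
                acc += po2_tap(seq[idx], sign, shift)
        out.append(acc)
    return out

seq = list(range(1, 11))                  # toy input sequence
taps = [(1, 0), (-1, 1), (1, 2)]          # weights +1, -2, +4 as sign/shift
w = [1, -2, 4]                            # same weights, multiply-based
ref = [sum(w[k] * (seq[t - 2 * k] if t - 2 * k >= 0 else 0) for k in range(3))
       for t in range(len(seq))]
assert dilated_causal_conv(seq, taps, dilation=2) == ref
print(dilated_causal_conv(seq, taps, dilation=2))
```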

View on IEEE Xplore

A 28-nm FeFET Compute-in-Memory Macro With 64×64 Array Size and On-Chip 4-Bit Flash ADC

Abstract:

Compute-in-memory (CIM) using emerging nonvolatile memory devices is a promising candidate for energy-efficient deep neural network (DNN) inference at the edge. Ferroelectric field-effect transistors (FeFETs) have recently gained attention as nonvolatile, CMOS-compatible devices with a higher on/off ratio and lower read and write energy compared to resistive random-access memory (…
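
In a macro like this, each column's analog MAC result is digitized by a flash ADC: 2^N − 1 parallel comparators against a reference ladder produce a thermometer code that is encoded to N bits in a single step. The sketch below is a behavioral model of a 4-bit flash ADC digitizing a toy column sum; reference levels and array statistics are assumptions.

```python
# Behavioral model of a 4-bit flash ADC digitizing a column MAC value:
# 15 comparators against a uniform reference ladder give a thermometer
# code, encoded to 4 bits by counting ones. Ranges are assumptions.
import numpy as np

def flash_adc_4b(v, v_min=0.0, v_max=1.0):
    """Return the 4-bit code for voltage v using 2**4 - 1 comparators."""
    refs = v_min + (np.arange(1, 16) / 16) * (v_max - v_min)  # 15 thresholds
    thermometer = v > refs          # all comparators fire in parallel
    return int(thermometer.sum())   # thermometer-to-binary encoding

# A toy in-memory MAC: 64 binary inputs x 64 analog FeFET conductances,
# normalized into the assumed ADC input range.
rng = np.random.default_rng(2)
x = rng.integers(0, 2, 64)          # binary input vector
g = rng.uniform(0.0, 1.0, 64)       # cell conductances (normalized)
v_col = (x * g).sum() / 64          # column voltage in [0, 1]
print(f"column value {v_col:.3f} -> code {flash_adc_4b(v_col)}")
```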

View on IEEE Xplore

A Multicore Programmable Variable-Precision Near-Memory Accelerator for CNN and Transformer Models

Abstract:

Convolutional neural networks (CNNs) and transformers are the most popular neural network models in computer vision (CV) and natural language processing (NLP). It is common to use both models in multimodal scenarios, such as text-to-image generation. However, the two models have very different memory mappings, dataflows and …
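
The mapping mismatch can be made concrete: a convolution becomes a tall, skinny matrix multiply (via im2col) with heavy activation reuse, while attention is a wide matrix multiply whose operands see little reuse per element. The sketch below contrasts the two shapes; all dimensions are illustrative assumptions.

```python
# Illustration of the dataflow gap between CNN and transformer layers:
# both reduce to matmul, but with very different shapes and data reuse.
# All shapes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)

# CNN layer: im2col turns a 3x3 conv over a 16x16, 32-channel map into
# a (196, 288) x (288, 64) matmul; each input pixel is reused up to
# 9 times across overlapping windows.
act = rng.standard_normal((16, 16, 32))
patches = np.stack([
    act[r:r + 3, c:c + 3, :].ravel()
    for r in range(14) for c in range(14)
])                                  # (196, 288) im2col matrix
w_conv = rng.standard_normal((288, 64))
conv_out = patches @ w_conv         # (196, 64)

# Attention: Q @ K^T over a 128-token sequence, head dim 64 is a wide
# (128, 64) x (64, 128) matmul whose inputs are each read only once.
Q = rng.standard_normal((128, 64))
K = rng.standard_normal((128, 64))
scores = Q @ K.T / np.sqrt(64)      # (128, 128)

print(conv_out.shape, scores.shape)
```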

View on IEEE Xplore

A 28-nm Computing-in-Memory Processor With Zig-Zag Backbone-Systolic CIM and Block-/Self-Gating CAM for NN/Recommendation Applications

Abstract:

Computing-in-memory (CIM) chips have demonstrated promising energy efficiency for artificial intelligence (AI) applications such as neural networks (NNs), Transformer, and recommendation system (RecSys). However, several challenges still exist. First, a large gap between the macro and system-level CIM energy efficiency is observed. Second, several memory-dominated operations, such as embedding in …
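
Embedding is memory-dominated because each inference gathers a few rows from very large tables and then performs only a cheap reduction, leaving almost no arithmetic to hide the memory traffic behind. The sketch below makes that arithmetic-intensity gap explicit with assumed table sizes.

```python
# Why embedding is memory-dominated: per inference, a RecSys gathers a
# few rows from huge tables and reduces them, so bytes moved dwarfs
# FLOPs. Table sizes and lookup counts are illustrative assumptions.
import numpy as np

ROWS, DIM, LOOKUPS = 100_000, 64, 32          # assumed table shape
table = np.zeros((ROWS, DIM), dtype=np.float32)

ids = np.random.default_rng(4).integers(0, ROWS, LOOKUPS)
pooled = table[ids].sum(axis=0)               # gather + sum-pool reduction

bytes_moved = LOOKUPS * DIM * 4               # 32 rows x 64 float32 values
flops = LOOKUPS * DIM                         # one add per gathered element
print(f"{bytes_moved} bytes moved for {flops} FLOPs "
      f"-> {flops / bytes_moved:.2f} FLOP/byte (heavily memory-bound)")
```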

View on IEEE Xplore