Computational modeling

Chameleon: A Multiplier-Free Temporal Convolutional Network Accelerator for End-to-End Few-Shot and Continual Learning from Sequential Data

Abstract:

On-device learning at the edge enables low-latency, private personalization with improved long-term robustness and reduced maintenance costs. Yet, achieving scalable, low-power (LP) end-to-end on-chip learning, especially from real-world sequential data with a limited number of examples, is an open challenge. Indeed, accelerators supporting error backpropagation optimize for learning performance at …

View on IEEE Xplore

Coupled Simulation Methodology for In-Memory Computing Systems

Abstract:

Simulations for the development and optimisation of future in-memory computing systems often face a dilemma: the large system must be modelled as a whole, yet effects at the device level should also be taken into account. Such effects could be due to the material properties …

View on IEEE Xplore

A 57.3-fps 12.8 TFLOPS/W Text-to-Motion Processor With Inter-Iteration Output Sparsity and Inter-Frame Joint Similarity

Abstract:

Recently, 3-D human motion generation has become essential in media applications such as film production and augmented reality (AR)/virtual reality (VR) devices, requiring the generation of human joint movements and detailed 3-D meshes for each joint. Traditionally, joint creation required hours or even days, making it impractical for real-time …

View on IEEE Xplore

SparseCol: A 1320 BTOPS/W Precision-Scalable NPU Exploiting Training-Free Structured Bit-Level Sparsity and Dynamic Dataflow

Abstract:

Bit-serial computation enables sequential processing of data at the bit level, providing several advantages, such as scalable computational precision. This approach has gained significant attention, especially for exploiting bit-level sparsity (BLS) in AI workloads. While current bit-serial processors leverage BLS to eliminate the computation associated with zero bits, they face …

View on IEEE Xplore
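
The bit-serial idea mentioned in the SparseCol abstract can be illustrated with a minimal Python sketch. It is not the SparseCol design: the function name `bit_serial_multiply`, the unsigned-weight assumption, and the `bits` parameter are all hypothetical choices made for illustration. The sketch processes one weight bit per step, skips zero bits, and counts how many shift-and-add operations were actually needed.

```python
from typing import Tuple


def bit_serial_multiply(activation: int, weight: int, bits: int = 8) -> Tuple[int, int]:
    """Multiply activation by an unsigned weight, one weight bit per step.

    Returns (product, adds), where adds is the number of shift-and-add
    steps performed. Zero bits are skipped, which is the bit-level
    sparsity (BLS) saving in its simplest form.
    Assumption: weight is unsigned and fits in `bits` bits.
    """
    if not 0 <= weight < (1 << bits):
        raise ValueError("weight must be an unsigned value that fits in `bits` bits")
    acc = 0
    adds = 0
    for i in range(bits):          # one bit position per step
        if (weight >> i) & 1:      # zero bits contribute nothing, so skip them
            acc += activation << i
            adds += 1
    return acc, adds


# A weight with two set bits needs two adds instead of eight steps of work.
product, adds = bit_serial_multiply(13, 0b00010010)
print(product, adds)

# Precision scales with `bits`: a 4-bit weight only visits four positions.
product4, adds4 = bit_serial_multiply(13, 0b0101, bits=4)
print(product4, adds4)
```

Changing `bits` changes the number of steps, which is how a bit-serial datapath can trade precision for latency; the count of set bits in the weight determines how much work is actually done.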

A Multicore Programmable Variable-Precision Near-Memory Accelerator for CNN and Transformer Models

Abstract:

Convolutional neural networks (CNNs) and transformers are the most popular neural network models in computer vision (CV) and natural language processing (NLP). It is quite common to use both models in multimodal scenarios, such as text-to-image generation. However, these two models have very different memory mappings, dataflows and …

View on IEEE Xplore

MEGA.mini: An Energy-Efficient NPU Leveraging a Novel Big/Little Core With Hybrid Input Activation for Generative AI Acceleration

Abstract:

This article presents a processor for the acceleration of generative AI (GenAI) based on a novel heterogeneous core architecture called MEGA.mini. The processor introduces three algorithmic features: 1) fixed-point (FXP) and floating-point (FP) hybrid input activation (IA) representation; 2) a delayed-statistics-based normalization (NORM); and 3) conditional polynomial-based nonlinear activation (NLA) approximation. These …

View on IEEE Xplore

Space-Mate: A 303.5-mW Real-Time Sparse Mixture-of-Experts-Based NeRF-SLAM Processor for Mobile Spatial Computing

Abstract:

Simultaneous localization and mapping (SLAM) provides crucial ego-pose information and 3-D maps of the user environment, which are fundamental to emerging mobile spatial computing devices. Dense 3-D mapping and accurate pose estimation are particularly necessary for applications like augmented reality (AR) and autonomous navigation. However, existing SLAM processors are typically …

View on IEEE Xplore

A SPICE-Compatible Compact Model of Ferroelectric Diode

Abstract:

In this work, for the first time, we present a SPICE-compatible compact model of ferroelectric (FE) diodes to enable their design exploration for diverse applications, including memory and unconventional computing paradigms. We propose modified Schottky barrier and hopping models for capturing the on- and off-mode operations of the FE diode, …

View on IEEE Xplore

Integrating Atomistic Insights With Circuit Simulations via Transformer-Driven Symbolic Regression

Abstract:

This article introduces a framework that establishes a cohesive link between first-principles-based simulations and circuit-level analyses using a machine learning-based compact modeling platform. Starting with atomistic simulations, the framework examines the microscopic details of material behavior, forming the foundation for later stages. The generated datasets, with molecular insights, …

View on IEEE Xplore