Convolutional neural networks

LUT-Based Convolutional Tsetlin Machine Accelerator With Dynamic Clause Scaling for Resources-Constrained FPGAs

Abstract:

The rapid growth of machine learning (ML) workloads, particularly in computer vision applications, has significantly increased computational and energy demands in modern electronic systems, motivating the use of hardware accelerators to offload processing from general-purpose processors. Despite advances in computationally efficient ML models, achieving energy-efficient inference on resource-constrained edge devices …

A 7.5-μW 35-Keyword End-to-End Keyword Spotting System With Random Augmented On-Chip Training

Abstract:

Fully integrated keyword spotting (KWS) systems designed for low-power operation face two major challenges. First, increasing the number of supported keywords significantly raises system complexity and power consumption. Second, most existing systems are not personalized to individual users, as they are trained on data from native English speakers, leading to …

SparseCol: A 1320 BTOPS/W Precision-Scalable NPU Exploiting Training-Free Structured Bit-Level Sparsity and Dynamic Dataflow

Abstract:

Bit-serial computation enables sequential processing of data at the bit level, providing several advantages, such as scalable computational precision. This approach has gained significant attention, especially for exploiting bit-level sparsity (BLS) in AI workloads. While current bit-serial processors leverage BLS to eliminate the computation associated with zero bits, they face …
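As a rough illustration of the bit-serial idea described in this abstract, the sketch below multiplies an activation by an unsigned weight one weight bit at a time and skips the zero bits. It is a software analogy only, not the SparseCol design: the function name `bit_serial_multiply`, the `bits` parameter, and the unsigned-weight restriction are all assumptions made for this example.

```python
def bit_serial_multiply(activation: int, weight: int, bits: int = 8) -> tuple[int, int]:
    """Multiply by walking the weight one bit per step (hypothetical helper).

    Returns (product, add_steps), where add_steps counts only the steps
    that did real work, i.e. the number of nonzero weight bits.
    """
    if not 0 <= weight < (1 << bits):
        raise ValueError("weight must fit in the given unsigned bit width")
    acc = 0
    add_steps = 0
    for i in range(bits):
        if (weight >> i) & 1:
            # Nonzero bit: add the activation shifted to this bit position.
            acc += activation << i
            add_steps += 1
        # Zero bit: nothing to add, so the step is skipped entirely.
    return acc, add_steps


product, add_steps = bit_serial_multiply(13, 0b00010010)
print(product, add_steps)
```

Changing `bits` changes how many bit positions are scanned, which is the sense in which precision scales in this toy model; a sparser weight (fewer 1 bits) needs fewer add steps.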

An Energy-Efficient CNN Processor Supporting Bi-Directional FPN for Small-Object Detection on High-Resolution Videos in 16-nm FinFET

Abstract:

The capability to detect small objects precisely in real time is essential for intelligent systems, particularly in advanced driver assistance systems (ADASs), as it ensures continuous awareness of distant obstacles for enhanced safety. However, achieving high detection precision for small objects requires high-resolution input inference on deep convolutional neural network (…

A Multicore Programmable Variable-Precision Near-Memory Accelerator for CNN and Transformer Models

Abstract:

Convolutional neural networks (CNNs) and transformers are the most popular neural network models in computer vision (CV) and natural language processing (NLP). It is common to use both models together in multimodal scenarios, such as text-to-image generation. However, the two models have very different memory mappings, dataflows and …