inference

HyFPCiM: A 65-nm 417-μW Error-Sensitivity-Aware FP8 Compute-in-Memory Macro

Abstract:

This letter presents HyFPCiM, a 65-nm FP8 compute-in-memory (CiM) macro that enables sub-mW floating-point (FP) inference using error-sensitivity-aware FP partitioning (EAP). EAP maps exponent processing to a digital CiM (DCiM) path and mantissa accumulation to an analog CiM (ACiM), avoiding the power- and area-intensive adder-tree-based accumulation used in prior FP-CiM …

View on IEEE Xplore

Adelia: A 4-nm LLM Processing Unit With Streamlined Dataflow and Dual-Mode Parallelism for Maximizing Hardware Efficiency

Abstract:

The proliferation of large language models (LLMs) as cross-domain foundation models is fueled by aggressive scaling in both parameter counts and inference-time computation. The emergence of sophisticated reasoning models further accelerates this trend, demanding longer context windows and escalating the computational and memory burdens of inference. A fundamental challenge arises …

View on IEEE Xplore

Energy-Efficient Reconfigurable XGBoost Inference Accelerator With Modular Unit Trees via Selective Node Execution and Data Movement

Abstract:

Extreme gradient boosting (XGBoost) has emerged as a powerful AI algorithm, achieving high accuracy and winning multiple Kaggle competitions in various tasks including medical diagnosis, recommendation systems, and autonomous driving. It has great potential for running on edge devices due to its binary tree-based simple computing kernel, offering unique …

View on IEEE Xplore

MINOTAUR: A Posit-Based 0.42–0.50-TOPS/W Edge Transformer Inference and Training Accelerator

Abstract:

Transformer models have revolutionized natural language processing (NLP) and enabled many new applications, but are challenging to deploy on resource-constrained edge devices due to their high computation and memory demands. We present MINOTAUR, an edge system-on-chip (SoC) for inference and fine-tuning of Transformer models with all memory on the chip. …

View on IEEE Xplore