Computer architecture

A Multicore Programmable Variable-Precision Near-Memory Accelerator for CNN and Transformer Models

Abstract:

Convolutional neural networks (CNNs) and transformers are the most popular neural network models in computer vision (CV) and natural language processing (NLP). It is quite common to use both models in multimodal scenarios, such as text-to-image generation. However, these two models have very different memory mappings, dataflows and …

View on IEEE Xplore

PANNA: A 558 TOPS/W Pipelined All-Analog Neural Network Accelerator in 22 nm FD-SOI

Abstract:

Analog computing offers intrinsic energy and latency benefits that make it attractive for real-time and edge applications. Conventional analog accelerators suffer from repeated conversions between the analog and digital domains, which degrade efficiency and throughput. We propose an all-analog pipelined neural network accelerator architecture in 22 nm fully-depleted silicon-on-insulator (FD-SOI) complementary metal-oxide-semiconductor (…

AACIM: A 2785-TOPS/W, 161-TOP/mm2, <1.17%-RMSE, Analog-In Analog-Out Computing-In-Memory Macro in 28 nm

Abstract:

This article presents an analog-in analog-out CIM macro (AACIM) for use in analog deep neural network (DNN) processors. Our macro receives analog inputs, performs a 64-by-32 vector–matrix multiplication (VMM) with a current-discharging computation mechanism, and produces analog outputs. It stores a 4-bit weight as an analog voltage in the …

Impact of Aging, Self-Heating, and Parasitics Effects on NSFET and CFET

Abstract:

This work presents a comparative analysis of complementary field-effect transistor (CFET) and nanosheet FET (NSFET) architectures, with a focus on self-heating effects (SHEs), negative bias temperature instability (NBTI), hot carrier degradation (HCD), and the impact of back-end-of-line (BEOL) parasitics on standard cell performance. NBTI degradation is modeled using a framework …

A Microscaling Multi-Mode Gain-Cell Computing-in-Memory Macro for Advanced AI Edge Device

Abstract:

The microscaling (MX) format is an emerging data representation that quantizes high-bitwidth floating-point (FP) values into low-bitwidth FP-like values with a shared-scale (SS) exponent. When implemented with computing-in-memory (CIM), MX allows an attractive tradeoff between accuracy and hardware efficiency for specific neural network (NN) workloads. This work presents the first …

A 168 nW to 44.3 Mb/s Adaptable TRNG With 400 mV Attack-Resilient Hybrid RO Core

Abstract:

This letter presents an adaptable ring oscillator (RO)-true random number generator (TRNG) that removes the fixed power–throughput tradeoff by selecting delay-cell physics at run time. A hybrid core uses a current-starved inverter in low-power (LP) mode to amplify slew-limited jitter for high bit-efficiency at low frequency, and a …

Understanding Reliability Trade-Offs in 1T-nC and 2T-nC FeRAM Designs

Abstract:

Ferroelectric random access memory (FeRAM) is a promising candidate for energy-efficient nonvolatile memory, particularly for logic-in-memory and compute-in-memory (CIM) applications. Among the available cell architectures, one-transistor–n-capacitor (1T-nC) and two-transistor–n-capacitor (2T-nC) FeRAMs each offer distinct trade-offs in density, scalability, and reliability. In this work, we present a comparative study …

A 28-Gb/mm2 4XX-Layer 1-Tb 3-b/Cell WF-Bonding 3D-NAND Flash With 5.6-Gb/s/Pin IOs

Abstract:

Every generation of flash memory has faced the challenge of shrinking relative to its predecessor while still delivering the high performance and low power that flash memory requires, but the recent rapid shift toward artificial intelligence (AI) makes this challenge especially tough, as the level …

Beyond Backside Power: Backside Signal Routing as Technology Booster for Standard-Cell Scaling

Abstract:

Advances in process technology enabling backside metals (BSMs) and contacts offer new design–technology co-optimization (DTCO) opportunities to further enhance power, performance, and area (PPA) gains in sub-3-nm nodes. This work exploits backside (BS) contact technology within standard cells to extend both signal and clock routing to BSM layers, …
