Adelia: A 4-nm LLM Processing Unit With Streamlined Dataflow and Dual-Mode Parallelism for Maximizing Hardware Efficiency https://sscs.ieee.org/wp-content/themes/movedo/images/empty/thumbnail.jpg 150 150 https://secure.gravatar.com/avatar/8fcdccb598784519a6037b6f80b02dee03caa773fc8d223c13bfce179d70f915?s=96&d=mm&r=g
Abstract:
The proliferation of large language models (LLMs) as cross-domain foundation models is fueled by aggressive scaling in both parameter counts and inference-time computation. The emergence of sophisticated reasoning models further accelerates this trend, demanding longer context windows and escalating the computational and memory burdens of inference. A fundamental challenge arises …