Abstract:
Deploying advanced Transformer-based models for real-time, high-accuracy multimodal bird’s-eye-view (BEV) perception in autonomous driving imposes substantial hardware demands. To address this, we propose a low-cost, low-power image/point-cloud fusion Transformer accelerator that supports two modes: high-performance driving and ultra-low-power sentry operation. We first propose a cross-modal saliency evaluation mechanism …