
Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert

Han EunGi, Oh Hyun-Bin, Kim Sung-Bin, Corentin Nivelet Etcheberry, Suekyeong Nam, Janghoon Ju, Tae-Hyun Oh

Abstract

Speech-driven 3D facial animation has recently garnered attention due to its cost-effective usability in multimedia production. However, most current advances overlook the intelligibility of lip movements, limiting the realism of facial expressions. In this paper, we introduce a method for speech-driven 3D facial animation that generates accurate lip movements, proposing an audio-visual multimodal perceptual loss. This loss provides guidance to train speech-driven 3D facial animators to generate plausible lip motions aligned with the spoken transcripts. Furthermore, to incorporate the proposed audio-visual perceptual loss, we devise an audio-visual lip reading expert that leverages prior knowledge about the correlations between speech and lip motions. We validate the effectiveness of our approach through extensive experiments, showing noticeable improvements in lip synchronization and lip readability performance.

Index Terms: Speech-driven 3D Facial Animation, Audio-Visual Speech Recognition, Multimodal Perceptual Loss
Interspeech 2024
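
The abstract does not spell out the loss formulation, so the following is only a minimal PyTorch sketch of how an audio-visual perceptual loss guided by a frozen lip reading expert could be wired up. The names AudioVisualPerceptualLoss, DummyExpert, encode_video, and encode_audio, the feature dimensions, and the cosine feature-matching form of the loss are all illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DummyExpert(nn.Module):
    """Stand-in for a pretrained audio-visual lip reading expert.
    A real expert (e.g., an AV-HuBERT-style model) would replace
    these linear projections; this placeholder only fixes shapes."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.v_proj = nn.Linear(3 * 40, dim)  # e.g., 40 lip vertices x (x, y, z)
        self.a_proj = nn.Linear(80, dim)      # e.g., 80-dim mel-spectrogram frames

    def encode_video(self, lips: torch.Tensor) -> torch.Tensor:
        # (batch, frames, 40, 3) -> (batch, frames, dim)
        return self.v_proj(lips.flatten(2))

    def encode_audio(self, mel: torch.Tensor) -> torch.Tensor:
        # (batch, frames, 80) -> (batch, frames, dim)
        return self.a_proj(mel)


class AudioVisualPerceptualLoss(nn.Module):
    """Sketch of an audio-visual perceptual loss: a frozen expert
    encodes (a) lip motion predicted by the 3D facial animator and
    (b) the driving audio, and the loss pulls the two feature
    streams together so generated lip motion stays readable."""

    def __init__(self, expert: nn.Module):
        super().__init__()
        self.expert = expert
        for p in self.expert.parameters():  # the expert stays frozen;
            p.requires_grad_(False)         # only the animator is trained

    def forward(self, pred_lip_motion: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        v_feat = F.normalize(self.expert.encode_video(pred_lip_motion), dim=-1)
        a_feat = F.normalize(self.expert.encode_audio(audio), dim=-1)
        # Per-frame cosine distance between visual and audio features.
        return (1.0 - (v_feat * a_feat).sum(dim=-1)).mean()


# Usage sketch: gradients flow back to the predicted lip motion
# (and hence to the animator), not to the frozen expert.
expert = DummyExpert()
loss_fn = AudioVisualPerceptualLoss(expert)
lips = torch.randn(2, 50, 40, 3, requires_grad=True)  # animator output
mel = torch.randn(2, 50, 80)                          # driving audio features
loss = loss_fn(lips, mel)
loss.backward()
```

In practice, DummyExpert would be swapped for a pretrained audio-visual speech recognition model kept frozen, so that its prior knowledge of speech-lip correlations supervises the animator without being updated itself.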