[Collaborative Research]
Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert
Han EunGi, Oh Hyun-Bin, Kim Sung-Bin, Corentin Nivelet Etcheberry, Suekyeong Nam, Janghoon Ju, Tae-Hyun Oh
Abstract
Speech-driven 3D facial animation has recently garnered attention due to its cost-effective usability in multimedia production. However, most current advances overlook the intelligibility of lip movements, limiting the realism of facial expressions. In this paper, we introduce a method for speech-driven 3D facial animation that generates accurate lip movements, proposing an audio-visual multimodal perceptual loss. This loss guides the training of speech-driven 3D facial animators to generate plausible lip motions aligned with the spoken transcripts. Furthermore, to incorporate the proposed audio-visual perceptual loss, we devise an audio-visual lip reading expert that leverages prior knowledge about the correlations between speech and lip motions. We validate the effectiveness of our approach through extensive experiments, showing noticeable improvements in lip synchronization and lip readability.
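The guidance mechanism described in the abstract could be sketched roughly as follows: a frozen lip-reading expert scores the animator's predicted lip motion against the spoken transcript, and that score is used as a training loss. This is a minimal illustrative sketch, not the paper's actual implementation; the expert's interface, the choice of a CTC objective, and all module and argument names here are assumptions.

```python
import torch
import torch.nn as nn

class AudioVisualPerceptualLoss(nn.Module):
    """Hypothetical sketch of an audio-visual perceptual loss.

    A frozen lip-reading expert consumes the predicted lip motion together
    with the driving audio and emits per-frame character logits; a CTC loss
    then compares them against the spoken transcript, so gradients flow
    back into the facial animator through the expert.
    """

    def __init__(self, lip_reading_expert: nn.Module):
        super().__init__()
        self.expert = lip_reading_expert
        # Freeze the expert: it only provides guidance and is not trained.
        for p in self.expert.parameters():
            p.requires_grad_(False)
        self.ctc = nn.CTCLoss(blank=0, zero_infinity=True)

    def forward(self, pred_lips, audio, transcripts, input_lens, target_lens):
        # Assumed expert output: per-frame logits of shape (B, T, C).
        # CTC expects log-probabilities of shape (T, B, C).
        logits = self.expert(pred_lips, audio)
        log_probs = logits.log_softmax(dim=-1).transpose(0, 1)
        return self.ctc(log_probs, transcripts, input_lens, target_lens)
```

During training this term would be added, with some weight, to the usual vertex reconstruction loss of the 3D facial animator.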
Index Terms: Speech-Driven 3D Facial Animation, Audio-Visual Speech Recognition, Multimodal Perceptual Loss