Researchers From Facebook AI And The University Of Texas At Austin Introduce VisualVoice: A New Audio-Visual Speech Separation Approach

Even in environments full of noisy, overlapping sounds, the human perceptual system relies heavily on visual information to resolve ambiguities in the audio and focus attention on an active speaker in a dynamic scene.

Researchers at Facebook AI and the University of Texas at Austin have proposed a new audio-visual speech separation approach. VisualVoice is a multi-task learning framework that jointly learns audio-visual speech separation and cross-modal speaker embeddings. It uses a person's facial appearance as a cue for predicting the qualities of their voice.
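The multi-task idea described above can be sketched as a combined objective: a mask-based separation loss on the spectrogram plus a cross-modal triplet loss that pulls a speaker's face embedding toward their own voice embedding and away from another speaker's. This is a minimal illustrative sketch with NumPy, not the paper's exact losses or weighting; all function names and the `weight` parameter are assumptions.

```python
import numpy as np

def separation_loss(pred_mask, target_mask, mixture_mag):
    """L1 distance between predicted and ideal masked spectrogram
    magnitudes (a common mask-based separation objective)."""
    return np.abs((pred_mask - target_mask) * mixture_mag).mean()

def cross_modal_triplet_loss(face_emb, voice_emb_pos, voice_emb_neg, margin=0.5):
    """Encourage the face embedding to lie closer to the matching
    speaker's voice embedding than to a different speaker's."""
    d_pos = np.linalg.norm(face_emb - voice_emb_pos)
    d_neg = np.linalg.norm(face_emb - voice_emb_neg)
    return max(0.0, d_pos - d_neg + margin)

def multitask_loss(pred_mask, target_mask, mixture_mag,
                   face_emb, voice_emb_pos, voice_emb_neg, weight=0.1):
    """Joint objective: separation quality plus cross-modal consistency.
    `weight` balancing the two terms is a hypothetical hyperparameter."""
    return (separation_loss(pred_mask, target_mask, mixture_mag)
            + weight * cross_modal_triplet_loss(face_emb, voice_emb_pos,
                                                voice_emb_neg))
```

Training both terms jointly is what lets facial appearance act as a prior on voice characteristics: the embedding loss shapes the visual features so they carry speaker-identity information useful for separation.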

