AI technology reconstructs strikingly lifelike portraits from just a voice

Scientists at the Massachusetts Institute of Technology (MIT, USA) have succeeded for the first time in using algorithms to reconstruct portraits from voices alone.

An AI algorithm called Speech2Face, developed by scientists at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), reconstructs a person's face from just a short voice recording, and the results are impressive.

The team's task was to reconstruct an image of a person's face from a short voice recording.

First, the researchers designed and trained a deep neural network on millions of videos of people talking, taken from YouTube and elsewhere on the internet.

During training, the AI learned correlations between the sound of a voice and the speaker's appearance. Those correlations allow it to make a best guess at the speaker's age, gender, and ethnicity.

Humans are not involved in the training process. The AI is simply given a large volume of video and tasked with finding correlations between speech characteristics and facial features.
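The idea of learning from paired data without human labels can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the synthetic "clips", the one-number voice and face features, the invented relation between them, and the simple least-squares fit that stands in for the deep network. It is not the Speech2Face model, only a small example of finding a correlation from pairs that come for free.

```python
import random


def make_clips(n, seed=0):
    """Synthetic stand-in for video clips.

    Each clip yields a (voice_feature, face_feature) pair. The hidden
    relation face = 2 * voice + 1 (plus noise) is invented for this sketch.
    """
    rng = random.Random(seed)
    clips = []
    for _ in range(n):
        voice = rng.uniform(-1.0, 1.0)
        face = 2.0 * voice + 1.0 + rng.gauss(0.0, 0.05)
        clips.append((voice, face))
    return clips


def fit_voice_to_face(clips):
    """Ordinary least squares for face ~ w * voice + b.

    No labels are supplied by a person: the face feature observed in
    each clip acts as the training target for the voice feature.
    """
    n = len(clips)
    mean_v = sum(v for v, _ in clips) / n
    mean_f = sum(f for _, f in clips) / n
    cov = sum((v - mean_v) * (f - mean_f) for v, f in clips)
    var = sum((v - mean_v) ** 2 for v, _ in clips)
    w = cov / var
    b = mean_f - w * mean_v
    return w, b


def predict_face(voice, w, b):
    """Guess the face feature for a new voice feature."""
    return w * voice + b


clips = make_clips(1000)
w, b = fit_voice_to_face(clips)
guess = predict_face(0.5, w, b)
```

With enough clips, the fitted `w` and `b` land close to the hidden values used to generate the data, which is the toy analogue of the network picking up voice-to-appearance correlations on its own.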

Once trained, the AI became very good at creating portraits, based solely on voice recordings, that resemble the real speaker.

Actual image of the speaker (left) and the image reconstructed by AI from their voice (right).

To further analyze the accuracy of the face reconstruction, the researchers built a "face decoder". The decoder generates a standardized reconstruction of a person's face from a still image of them, ignoring "irrelevant variations" such as pose and lighting. This allows scientists to easily compare voice-based reconstructions with the actual features of speakers.
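Why removing "irrelevant variations" makes comparison easier can be shown with a small sketch. The assumptions are loud ones: a face is reduced to a short list of pixel intensities, "lighting" is modeled as a global brightness shift and scale, and cosine similarity is chosen as the comparison score. The article does not say which measure the researchers used, and `normalize` and `similarity` are hypothetical helper names.

```python
import math


def normalize(pixels):
    """Remove global brightness offset and scale (a crude stand-in for lighting)."""
    mean = sum(pixels) / len(pixels)
    centered = [p - mean for p in pixels]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        raise ValueError("a constant image has no structure to compare")
    return [c / norm for c in centered]


def similarity(a, b):
    """Cosine similarity of the two normalized images, in the range [-1, 1]."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


reference = [0.2, 0.5, 0.9, 0.4]
brighter = [p * 1.5 + 0.1 for p in reference]  # same "face", different lighting
other = [0.9, 0.1, 0.3, 0.8]                   # a different "face"

same_score = similarity(reference, brighter)
diff_score = similarity(reference, other)
```

After normalization, the brighter copy scores essentially the same as the reference, while the unrelated image scores much lower, so differences that remain reflect the face rather than the lighting.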

Again, the AI's results are very close to the real faces in many of the cases studied, across a range of ages, genders, and ethnicities.

Actual image of the speaker (left), image reconstructed by AI from their photo (center) and image reconstructed by AI from their voice (right).

Voice-rendering AI could create an animated image of a person on a phone or video conference call when the person's identity is unknown and they do not want to share their real face.

In their paper, presented at the Conference on Computer Vision and Pattern Recognition (CVPR), the researchers wrote that the reconstructed faces could also be used directly to assign faces to the machine-generated voices used in home devices and virtual assistants.

Law enforcement could also use the AI to create a portrait of a suspect when a voice recording is the only evidence. However, government applications are bound to provoke controversy and debate over privacy and ethics.

AI creates portraits from just voices. (Photo: Speech2Face Research Team)

Update 08 April 2022