Danger from AI: Creating accurate faces from just a voice

Smartphones can only turn speech into text, but AI (artificial intelligence) can go much further and reconstruct a person's face from their voice with striking accuracy. Many people have expressed concern about AI, warning that the technology could have unpredictable consequences for humanity and might one day replace humans.


Photos are made with light, but what if portraits of people could be made with the sound of their voices?

Specifically, artificial intelligence scientists at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) first presented an AI algorithm called Speech2Face in a paper at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) in 2019, and have continued to improve it since then.

The researchers first designed and trained a deep neural network using millions of videos of people talking, collected from YouTube and elsewhere on the Internet. During this training, the AI learned correlations between the sound of a voice and the appearance of the speaker. These correlations let it make a best guess about the speaker's age, gender, and ethnicity.

Humans are not directly involved in the training process, as researchers do not need to manually classify any data. The AI is simply fed a large amount of video and is tasked with finding correlations between speech and facial features.

Once trained, the AI is quite good at creating lifelike portraits from voice recordings alone, and it performs better when the recordings are longer.

To further analyze the accuracy of the face reconstruction, the researchers built a "face decoder" that produces a standardized reconstruction of the speaker's real face, ignoring irrelevant factors such as head pose and lighting. This makes it easier to compare the image generated from the voice with the speaker's actual face.

In most cases, the AI's results are very close to the real face. In some cases, however, the AI struggles to picture what a speaker looks like. Accent, spoken language, and voice pitch can all cause voice-to-face mismatches in which the predicted gender, age, or ethnicity is wrong.

People with high-pitched voices (including boys) are often identified as female, while those with low-pitched voices are identified as male. When an Asian man speaks English, the reconstructed face looks less Asian than when the same man speaks Chinese.

The researchers say they took privacy and ethical issues into account in the project, and that any real-world use of this technology would need to be carefully examined.

Law enforcement could use the AI to create a portrait of a suspect when the only evidence is a voice recording. However, such use would raise serious controversy over ethics and individual privacy.

Furthermore, this technology could harm YouTube creators who protect their private lives by providing only voice-overs and never appearing on camera.

An AI that can create accurate portraits of people from their voices alone is fascinating and seems like something out of science fiction, but that was not the researchers' goal. They said the study was done to provide a more comprehensive view of the correlation between faces and voices, which could open up new research and application opportunities.

Update 11 April 2022