Artificial intelligence understands lip movements

The latest research shows that computers can understand more than 90% of what is said just by observing lip movements, without hearing any sound.

According to Technology Review, reading human lip movements during speech is especially difficult because it requires context as well as an understanding of the natural language being spoken, inferred from mouth movements alone.

However, researchers have now demonstrated that machine learning algorithms can understand spoken language in muted (soundless) videos more effectively than human lip readers.


Artificial intelligence (AI) and machine learning can understand the meaning of moving human lips with higher accuracy than people trained to read lips. (Photo: TechnologyReview)

Specifically, in the first research project, a team from the University of Oxford's Department of Computer Science developed a new artificial intelligence (AI) system called LipNet. The system is built on a dataset called GRID, created from a series of clips recording how people move their lips while reading three-second sentences. Each of these sentences is based on a series of words with similar lip-opening patterns.

The team then used this dataset to "train" a neural network similar to those commonly used for speech recognition.

In this case, however, the artificial neural network is responsible for identifying different mouth shapes and learning to connect that information to what is being said.
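The idea of mapping sequences of mouth shapes to words can be caricatured with a toy classifier. Everything here is invented for illustration: the "mouth-opening" feature values and the tiny vocabulary are hypothetical, and a real system such as LipNet learns its features with a deep neural network rather than using nearest-neighbour matching over hand-made vectors.

```python
# Toy sketch: match a sequence of (hypothetical) mouth-shape features
# to the closest known word. Real lip-reading systems learn this
# mapping with deep neural networks; this only illustrates the idea.
import math

# Invented per-word feature sequences (e.g. mouth opening per frame).
TRAINING = {
    "bin":   [0.2, 0.5, 0.1],
    "blue":  [0.1, 0.3, 0.6],
    "place": [0.7, 0.4, 0.2],
}

def distance(a, b):
    """Euclidean distance between two equal-length feature sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict(features):
    """Return the training word whose feature sequence is closest."""
    return min(TRAINING, key=lambda word: distance(TRAINING[word], features))

if __name__ == "__main__":
    observed = [0.15, 0.45, 0.15]  # a noisy observation of "bin"
    print(predict(observed))       # -> "bin"
```

A trained network replaces both the hand-made feature table and the distance function: it learns, from many labelled clips, which visual patterns correspond to which words.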

When tested, this artificial intelligence system correctly identified 93.4% of the words spoken. Volunteer lip readers performing the same task identified the words correctly only 52.3% of the time.

Besides this project, the New Scientist site also reported on another research project, by a group at the University of Oxford's Department of Engineering Science. This group carried out similar work, but with Google's DeepMind system and at a more difficult level.

Instead of a clean, uniform dataset like GRID, they used a set of 100,000 video clips cut from BBC programmes. These recordings cover a much wider range of language use, a variety of speakers' head positions, and different lighting conditions.


Google's DeepMind lip-reading artificial intelligence technology. (Photo: Yahoo)

Using a similar approach, the team created an artificial intelligence system that identified words correctly 46.8% of the time. That was still far better than humans, who achieved only 12.4% accuracy on this material.

There are clear reasons why the second project's accuracy is lower than the first's: the varied lighting in the clips, the speakers' diverse poses and angles, and the much greater complexity of the language used.

Despite these differences, however, both research projects show that artificial intelligence is far superior to humans at lip reading, and it is not hard to imagine potential applications for this technology.

Updated 12 December 2018