Teaching cooking robots with YouTube

When it comes to learning how to cook, it turns out today's robots are not so different from humans.

Teaching a robot to perform an action is very different from teaching a person. A person hears "I need a cup," takes a cup, and brings it over. With a robot, the process is not so simple: you have to spell out every step, turn around, walk to the cupboard, open it, take out a cup, close the cupboard, turn back, walk to the person who asked, and hand them the cup.


Another important question in teaching robots is how to program one to visually distinguish whether you need a plastic cup, a glass, or simply any cup at all. And how do you design a robot that is capable of teaching itself?

Researchers at the University of Maryland Institute for Advanced Computer Studies (UMIACS) have found a way, thanks to YouTube, or more precisely, to the cooking tutorials shared there. By watching these instructional videos, a robot can learn the series of complex movements needed to cook: it observes what people do in the videos and then imitates them.


"The reason we choose cooking instructional videos is that anyone can do it" - the professor of computer science at UMD and the manager of " Computer Vision " laboratory of Computer UMIACS is Yiannis Aloimonos shared.

Cooking, however, is a series of complex actions that have to be coordinated in sequence and that involve many different tools. If you want to cut a cucumber, for example, you first have to pick up the knife and put it in position before making the cut, and throughout the process you have to keep watching to make sure you are "doing the right thing".

The robot in Yiannis Aloimonos's experiment uses several key systems to learn from YouTube videos. First is the computer vision system, which collects and processes digital images, analyzing and recognizing visual and multidimensional data from the outside world and turning it into digital information. Once these "electronic eyes" have captured images from a video, an "electronic brain" with artificial intelligence analyzes them. Finally, a language parsing system helps the robot understand the spoken instructions. By combining the analyses from these three systems, the robot turns what it has seen into action.
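As a rough illustration of that three-stage flow (vision, AI analysis, language parsing), here is a minimal Python sketch. It is not the team's code; every class, function, and value in it is hypothetical and heavily simplified.

```python
from dataclasses import dataclass

@dataclass
class VideoSegment:
    objects: list        # what the "electronic eyes" detected, e.g. ["knife", "cucumber"]
    grasp: str           # grasp type recognized from the hands, e.g. "power"
    transcript: str      # narration heard for this segment

def parse_verb(transcript: str) -> str:
    """Toy language parser: take the first word of the narration as the action."""
    return transcript.split()[0].lower()

def to_command(segment: VideoSegment) -> str:
    """Toy 'electronic brain': fuse the vision and language results into one command."""
    verb = parse_verb(segment.transcript)
    return f"{verb}({', '.join(segment.objects)}) with a {segment.grasp} grasp"

segment = VideoSegment(objects=["knife", "cucumber"],
                       grasp="power",
                       transcript="Cut the cucumber into thin slices")
print(to_command(segment))   # -> cut(knife, cucumber) with a power grasp
```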

Cornelia Fermüller conducting experiments with the "computer vision" system

In this way, the robot can collect individual cooking steps from different videos, assign each one its own marker in its program, and then put them together in the correct order.
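A toy sketch of that idea, assembling steps harvested from different videos into one ordered program, might look like this. Both the data and the ordering scheme are my own assumptions, not the researchers' representation.

```python
# Steps collected from different videos, each with a rough position hint (hypothetical).
steps_from_video_a = [(2, "cut(knife, cucumber)"), (1, "grasp(knife)")]
steps_from_video_b = [(3, "place(cucumber_slices, bowl)")]

# Merge the steps and sort them into the correct order before execution.
program = sorted(steps_from_video_a + steps_from_video_b)
for _, command in program:
    print("execute:", command)
# execute: grasp(knife)
# execute: cut(knife, cucumber)
# execute: place(cucumber_slices, bowl)
```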

Cornelia Fermüller, a research scientist at UMIACS and Aloimonos's collaborator, said: "We are trying to create robot technology that can eventually interact with people. To do that, we need a way for robots to see and mimic human actions."

What sets this research group apart from previous projects is its focus on the goal of a task rather than on the individual steps. The robot learns to store different actions in its database and combine them to complete a task, instead of simply mimicking human actions step by step.

The performance of the robots

According to the team's research, the action-recognition module achieved a precision of 77% and a recall of 76%, while the module that recognizes foods reached 93% precision and 93% recall. Precision and recall are two standard measures in classification: precision is the fraction of the returned results that are correct, and recall is the fraction of all relevant items that are actually returned.
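To make those two measures concrete, here is a small Python example with made-up counts (not the study's raw data), chosen so the results come out near the figures above.

```python
# Hypothetical counts, chosen only to illustrate the definitions; not the study's numbers.
true_positives = 76    # relevant items the module returned correctly
false_positives = 23   # items it returned that were actually wrong
false_negatives = 24   # relevant items it failed to return

precision = true_positives / (true_positives + false_positives)  # correct / everything returned
recall = true_positives / (true_positives + false_negatives)     # correct / everything relevant

print(f"precision = {precision:.2f}, recall = {recall:.2f}")
# -> precision = 0.77, recall = 0.76
```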

Different types of handling actions

In practical experiments, however, the robots achieved only a 73% rate of food recognition, along with a 93% rate of capturing actions and an 83% rate of anticipating actions. The noticeable drop in object recognition is because the robots had not been trained on some of the objects, such as tofu.

"By creating flexible robots, we will contribute to the next phase of automation. This will be the next industrial revolution" - Aloimonos said. "We will have a fully automated production and intelligent warehouse environment. This would be great for using self-control robots for dangerous work like removing bombs or cleaning nuclear disasters like events. Fukushima We will prove that it is possible for humanoid robots to do human work in the future. "

The team will present its research at the AAAI Conference on Artificial Intelligence in Austin, Texas, on January 29, 2015.