MIT scientists develop ball-catching robot dog

Scientists at MIT have developed a robot dog that can play soccer using a combination of artificial intelligence (AI) and computer vision.

Robot dog plays soccer. (Video: Live Science).

Researchers at the Massachusetts Institute of Technology (MIT) have developed a method called "Clio" that allows robots to quickly map a scene using body-mounted cameras and identify the parts most relevant to the task they are assigned through voice instructions. The research was published in the journal IEEE Robotics and Automation Letters on October 10.

Clio draws on the theory of the 'information bottleneck,' in which information is compressed so that a neural network (a set of machine learning algorithms layered to mimic the way the human brain processes information) selects and stores only the segments relevant to the task at hand. A robot equipped with this system processes instructions selectively, focusing on its task and ignoring everything else.
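The idea of compressing a scene down to task-relevant parts can be illustrated with a small sketch. This is not MIT's actual code: the feature vectors, segment names, and threshold below are invented for illustration; the real system works on learned embeddings from camera data.

```python
# Toy illustration of information-bottleneck-style filtering: each detected
# scene segment and the task description carry a feature vector, and
# segments with low relevance to the task are discarded, compressing the
# map to only what the task needs.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def filter_segments(segments, task_vec, threshold=0.5):
    """Keep only segments whose features are relevant to the task."""
    return [name for name, vec in segments.items()
            if cosine(vec, task_vec) >= threshold]

# Hypothetical feature vectors for segments detected in an office scene.
scene = {
    "green_book": [0.9, 0.1, 0.0],
    "red_mug":    [0.1, 0.8, 0.2],
    "chair_leg":  [0.0, 0.2, 0.9],
}
task = [1.0, 0.0, 0.1]  # stands in for an instruction like "pick up the green book"

print(filter_segments(scene, task))  # ['green_book']
```

Only the segment whose features align with the task survives the compression; the unrelated mug and chair segments are dropped, which is the behavior the example in the next paragraph describes.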

For example, suppose there is a stack of books in a scene and the task is to pick up only the green book. 'In that case, all the information about the scene is pushed through and ends up as a cluster of segments representing the green book,' says study co-author Dominic Maggio, a graduate student at MIT. 'All the other unrelated segments are grouped into clusters that can be easily discarded.'

To demonstrate Clio in action, the team used Boston Dynamics' Spot quadruped robot running Clio to explore an office building and perform a series of tasks. Working in real time, Clio created a virtual map that showed only objects relevant to its task, which then allowed the Spot robot to complete its goals.

Robot dog. (Photo: Andy Ryan).

The robot can also see, understand, and follow instructions. The researchers achieved this level of detail by combining computer vision with large language models (LLMs), the vast neural networks that underpin many AI tools and services, which have been trained to recognize all kinds of objects. The breakthrough Clio brings is the ability to pick out, in real time, the parts of what it sees that are relevant to the specific task it is given.

A core part of this is a mapping tool built into Clio that divides a scene into many small segments. A neural network then picks out segments that are semantically similar, meaning they serve the same purpose or form parts of the same object.
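Grouping semantically similar segments can be sketched with a simple greedy clustering over embedding vectors. Again, this is a hedged illustration with invented vectors and names, not Clio's real pipeline, which operates on learned visual features.

```python
# Greedy clustering of segment embeddings: a segment joins the first
# cluster whose representative it closely matches (high cosine
# similarity), otherwise it starts a new cluster of its own.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def group_segments(embeddings, threshold=0.9):
    """Group segments whose feature vectors point in nearly the same direction."""
    clusters = []  # list of (representative_vector, [member_names])
    for name, vec in embeddings.items():
        for rep, members in clusters:
            if cosine(vec, rep) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((vec, [name]))
    return [members for _, members in clusters]

# Two book-spine segments should land in one cluster; the mug in another.
segments = {
    "book_spine_a": [0.98, 0.05],
    "book_spine_b": [0.95, 0.10],
    "mug_handle":   [0.05, 0.99],
}
print(group_segments(segments))  # [['book_spine_a', 'book_spine_b'], ['mug_handle']]
```

Segments belonging to the same object cluster together, so a downstream planner can treat each cluster as a single candidate object rather than many fragments.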

In the future, the team plans to adapt Clio to handle higher-level tasks. 'We're still giving Clio specific tasks, like "find a deck of cards,"' Maggio says. 'For search and rescue, you need to give it higher-level tasks, like "find survivors" or "restore power." So we want to get a more human-level understanding of how to complete more complex tasks.'