AI Genie - Google's answer to Sora: AI creates its own 2D game with a single image prompt!

Generative AI is advancing at breakneck speed, often faster than users can imagine, and is steadily blurring the line between reality and imagination. A few days ago, OpenAI introduced a new AI model called Sora, which creates short videos from text. Now rival Google has responded by publishing a research paper on an AI model called Genie, which is capable of creating 2D video games from text prompts and a single image.

However, Genie is currently still under development in the research lab and has not been released to the public.

Developed by Google DeepMind's Open-Endedness team, this groundbreaking research project promises huge potential for the future of entertainment, game development and even robotics. Google says Genie is a "controllable world model" trained on a massive dataset of 200,000 hours of unlabeled video, mostly footage of 2D platformer games found on the internet.

Unlike traditional AI models, which often require detailed instructions and labeled data, Genie learns by observing actions and interactions in these videos, allowing it to create a 2D game from a simple text description or image.

It may seem like some kind of miracle, but as the research paper published by Google DeepMind explains, Genie's inner workings are fairly complex:

Genie includes three core components:

Video Tokenizer: Imagine Genie as a skilled chef preparing a complex dish. Just as a chef cuts ingredients into smaller pieces that are easier to work with, the Video Tokenizer breaks huge blocks of video data into compact units called "tokens." These tokens serve as the basic building blocks that help Genie understand the visual world.

[Pictures 1 and 2: With just a single static image, Genie can create a simple 2D game.]

Latent Action Model: In the second step, after the video data has been "chopped" into tokens, the Latent Action Model takes over. Like a seasoned culinary expert, it meticulously analyzes the transitions between consecutive frames in the video. This analysis allows it to identify eight basic actions, the essential "spices" of Genie. These actions can include jumping, running, or interacting with objects in the game environment.

[Pictures 3 and 4: However, the image quality is still very rudimentary and the game content is quite simple.]

Dynamics Model: Finally comes the Dynamics Model, which puts everything together. Similar to how a chef predicts flavor interactions based on the selected ingredients, this model predicts the next frame in a video sequence. It takes into account the current state of the game world together with the player's action, and generates the next visual output accordingly. Repeating this prediction frame after frame is what ultimately creates an interactive and engaging gaming experience.
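To make the division of labor between the three components more concrete, here is a minimal toy sketch in Python. It is not Genie's actual code: in the real model all three parts are learned neural networks, whereas here each one is replaced by a hand-written rule on a tiny grid. The function names, the 8x8 frame, the "player is the brightest patch" rule and the mapping of the eight action codes to the eight neighboring directions are all assumptions made purely for illustration.

```python
from typing import List, Tuple

Frame = List[List[int]]      # grayscale pixels, 0-255
TokenGrid = List[List[int]]  # one small integer token per patch

# Eight latent action codes. Mapping them to the eight neighboring
# directions is a toy assumption, not something the article states.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def tokenize(frame: Frame, patch: int = 2, levels: int = 4) -> TokenGrid:
    """Tokenizer stand-in: average each patch and quantize it to 0..levels-1."""
    grid = []
    for r in range(0, len(frame), patch):
        row = []
        for c in range(0, len(frame[0]), patch):
            block = [frame[r + i][c + j] for i in range(patch) for j in range(patch)]
            row.append((sum(block) // len(block)) * levels // 256)
        grid.append(row)
    return grid


def find_player(tokens: TokenGrid) -> Tuple[int, int]:
    """Toy assumption: the player is the brightest token on a dark background."""
    _, r, c = max((v, r, c) for r, row in enumerate(tokens) for c, v in enumerate(row))
    return r, c


def infer_latent_action(prev_tokens: TokenGrid, next_tokens: TokenGrid) -> int:
    """Latent-action stand-in: recover an action code from two consecutive
    frames, without any label. Assumes the player moved exactly one cell."""
    pr, pc = find_player(prev_tokens)
    nr, nc = find_player(next_tokens)
    return OFFSETS.index((nr - pr, nc - pc))


def predict_next(tokens: TokenGrid, action: int) -> TokenGrid:
    """Dynamics stand-in: given current tokens and an action code, produce the
    next token grid by moving the player (clamped at the borders)."""
    rows, cols = len(tokens), len(tokens[0])
    r, c = find_player(tokens)
    dr, dc = OFFSETS[action]
    nr = min(max(r + dr, 0), rows - 1)
    nc = min(max(c + dc, 0), cols - 1)
    nxt = [row[:] for row in tokens]
    nxt[r][c] = 0                 # clear the old position (background is 0)
    nxt[nr][nc] = tokens[r][c]    # draw the player at the new position
    return nxt


def make_frame(row: int, col: int, size: int = 8, patch: int = 2) -> Frame:
    """Hypothetical helper: black frame with a white player at token cell (row, col)."""
    frame = [[0] * size for _ in range(size)]
    for i in range(patch):
        for j in range(patch):
            frame[row * patch + i][col * patch + j] = 255
    return frame


# Two consecutive "video" frames in which the player moves one cell to the right.
before = tokenize(make_frame(1, 1))
after = tokenize(make_frame(1, 2))
action = infer_latent_action(before, after)

# Play the "game": apply the recovered action twice, one predicted frame at a time.
tokens = before
for _ in range(2):
    tokens = predict_next(tokens, action)
print(action, find_player(tokens))
```

The point of the sketch is the data flow, not the rules themselves: frames become tokens, pairs of frames yield an action code without any labels, and the current tokens plus a chosen action yield the next frame.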

However, Genie is still under development and has many limitations. For example, it currently generates new frames at only about 1 FPS, which makes for a choppy experience, and image fidelity is still poor.

Even so, Genie's potential has led many people to question the future of jobs in game development, especially junior positions. The same thing is happening in filmmaking: recently, a Hollywood billionaire said he had used AI to alter his face in movies instead of relying on makeup artists.