At the end of last year, Google DeepMind announced Genie 2, an artificial intelligence model that can generate a 3D scene from only a single image and lets users interact with it by mouse and keyboard. It has since launched an updated version called Genie 3. Building on Genie 2, the new model further enhances the interactivity and sustained stability of the simulated environment, and adds a new feature called "Promptable World Events" that can instantly change the scene content through text commands. It is expected to make the AI model training process more flexible and closer to actual needs.
The Genie series is a "world model": it builds immersive simulated environments in which AI systems can interact and learn, thereby training their ability to cope with real-world scenarios. Since launching the first version, Genie 1, in 2023, Google DeepMind has continued to advance its application potential through generative models. Genie 2, launched late last year, introduced support for 3D environments and scene memory, preserving the state of the world after the user leaves a certain area and significantly improving the consistency of the simulated environment.
While Genie 3 does not represent a generational leap like its predecessor did, Google DeepMind research director Shlomi Fruchter and scientist Jack Parker-Holder said the upgrade is crucial for the long-term development of artificial general intelligence (AGI).
Genie 3 increases output quality from 360p to 720p, making the overall generated image clearer and significantly improving simulation stability. While Genie 2 could theoretically simulate for 60 seconds, in practice, errors and screen corruption often began to appear within tens of seconds. Genie 3 now generates content that can run stably for several minutes, further extending the effective duration of AI training.
Genie 3 also introduces a "Promptable World Events" feature, allowing users to instantly change scene content through text prompts. For example, in a demonstration, the Google DeepMind development team issued a text command to add a herd of deer to a simulated skiing scene, and the system immediately generated the herd on screen, demonstrating Genie 3's ability to understand semantics and its potential for dynamic interaction.
Google DeepMind emphasizes that this capability is crucial for training responsive AI systems, such as self-driving cars and robots. For example, the system can simulate unexpected situations, such as pedestrians crossing the road, allowing AI models to learn how to respond immediately and compensate for rare scenarios that are difficult to capture in real-world data.
However, the research team also pointed out that Genie 3 still has many limitations at this stage, such as its inability to accurately reproduce real-world scenery, its inability to fully display text content, and its insufficient simulation duration. To become a truly valuable training platform, future versions will need to support stable simulations lasting several hours.
Genie 3 is not yet publicly available, having been initially offered to a limited number of beta testers. Google DeepMind plans to expand its accessibility in the future, continuously fine-tuning the simulation's content and interactive features, and moving towards broader AI applications. Jack Parker-Holder stated, "This won't be the only training environment, but it will help us identify behaviors that AI shouldn't engage in, which is important in itself."








