Google DeepMind has unveiled the new generation of its AI models for robotics: Gemini Robotics 1.5 and Gemini Robotics-ER 1.5. The release is a significant expansion of the first-generation models from March 2025, giving robots the ability to perform multi-stage planning and to pull in information from the web to supplement what they already know. Thanks to the new models, machines can plan actions several steps ahead, allowing them to tackle complex tasks in the physical world.
How does it work? A two-stage architecture
The heart of the system is the collaboration of two specialized models. The first, Gemini Robotics-ER 1.5, acts as the analysis and planning layer: it processes data about the environment and the goal, calls digital tools to look up missing information (local recycling regulations, for example), and then generates an action plan in natural language. Those instructions are handed to the second model, Gemini Robotics 1.5, a vision-language-action (VLA) executor that translates the plan into concrete physical operations on the robot, as the sketch below illustrates. The architecture marks a fundamental shift: from executing simple, single commands to autonomously carrying out multi-step tasks.
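A minimal Python sketch of that planner/executor handoff follows. Gemini Robotics-ER 1.5 is available through the Gemini API, so the planner call uses the real google-genai SDK (the model id shown is the preview id and may change); the RobotVLA class and its execute_step method are hypothetical stand-ins for the on-robot Gemini Robotics 1.5 model, which is not a public API, so treat this as an illustration of the data flow rather than working robot code.

```python
# Two-stage loop: an ER (embodied reasoning) model plans in natural
# language; a VLA executor turns each step into physical actions.
# Assumption: RobotVLA / execute_step are hypothetical placeholders.
from google import genai

client = genai.Client()  # reads the API key from the environment

PLANNER_MODEL = "gemini-robotics-er-1.5-preview"  # preview model id


def plan_task(goal: str, scene_description: str) -> list[str]:
    """Ask the ER model for a short, numbered step-by-step plan."""
    prompt = (
        f"Goal: {goal}\n"
        f"Scene: {scene_description}\n"
        "Break this into short, numbered physical steps a robot can execute."
    )
    response = client.models.generate_content(
        model=PLANNER_MODEL, contents=prompt
    )
    # Keep only the numbered lines; each becomes one VLA instruction.
    return [
        line.strip()
        for line in response.text.splitlines()
        if line.strip() and line.strip()[0].isdigit()
    ]


class RobotVLA:
    """Hypothetical stand-in for the on-robot Gemini Robotics 1.5 executor."""

    def execute_step(self, instruction: str) -> bool:
        print(f"[VLA] executing: {instruction}")
        return True  # a real executor would report success from sensor feedback


def run(goal: str, scene: str) -> None:
    robot = RobotVLA()
    for step in plan_task(goal, scene):
        if not robot.execute_step(step):
            break  # on failure, the ER planner would be re-queried here


run(
    "Sort this trash into the right bins per local recycling rules",
    "A table with a banana peel, a soda can, and a plastic bottle; three bins.",
)
```

The point the sketch tries to capture is the clean division of labor: the planner only ever emits natural language, and the executor only ever consumes it, which is what turns one high-level command into an autonomously executed multi-step task.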