An update that teaches robots to plan and use the internet

Google DeepMind has unveiled models that combine reasoning about the physical world with online search and skill transfer between robots. The goal is to handle multi-step tasks with less reliance on hand-written scripts.


Google DeepMind has unveiled the new generation of its AI models for robotics: Gemini Robotics 1.5 and Gemini Robotics-ER 1.5. It is a significant expansion of the models introduced in March 2025, giving robots the ability to perform multi-stage planning and to use the internet to fill gaps in their knowledge. With the new models, machines can plan several steps ahead, allowing them to tackle complex tasks in the physical world.

How does it work? A two-stage architecture

At the heart of the system is the collaboration of two specialized models. The first, Gemini Robotics-ER 1.5, acts as the analysis and planning component. It processes data about the environment and the goal, uses digital tools to look up any information it needs (such as local regulations), and then generates an action plan in natural language. These instructions are passed to the second model, Gemini Robotics 1.5, a vision-language-action (VLA) execution model that translates the plan into concrete physical robot operations. This architecture marks a fundamental shift: from executing simple, single commands to autonomously carrying out multi-step tasks.
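To make the division of labor concrete, here is a minimal sketch of that planner-executor loop in Python. Everything below is illustrative: `ReasoningPlanner`, `VLAExecutor`, and all method names are hypothetical placeholders standing in for Gemini Robotics-ER 1.5 and Gemini Robotics 1.5, not DeepMind's actual API.

```python
"""Sketch of the two-model orchestration described above.

All names are illustrative placeholders, not DeepMind's real interface:
ReasoningPlanner stands in for Gemini Robotics-ER 1.5 and VLAExecutor
for Gemini Robotics 1.5.
"""

from dataclasses import dataclass


@dataclass
class Observation:
    """Camera frames plus whatever state the robot reports."""
    images: list
    state: dict


class ReasoningPlanner:
    """Analysis-and-planning role: reasons over the scene and the goal,
    may call digital tools (e.g. web search) for missing information,
    and returns the plan as a list of natural-language steps."""

    def plan(self, goal: str, obs: Observation) -> list[str]:
        # In the real system this is a model call that can invoke
        # tools before emitting the step-by-step plan.
        raise NotImplementedError


class VLAExecutor:
    """Vision-language-action role: grounds one natural-language
    instruction in the current camera view and produces the low-level
    motor commands that carry it out."""

    def execute(self, instruction: str, obs: Observation) -> None:
        raise NotImplementedError


def run_task(goal: str, planner: ReasoningPlanner,
             executor: VLAExecutor, observe) -> None:
    """Plan once, then execute step by step, re-observing between steps."""
    obs = observe()
    for step in planner.plan(goal, obs):   # e.g. "open the compost bin"
        executor.execute(step, obs)
        obs = observe()                    # fresh observation per step
```

The design choice this sketch mirrors is the interface between the two models: the planner's output is plain natural language, so the plan can be inspected or corrected, and either side can in principle be swapped out without retraining the other.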


Co-creator of the AI Flash newsletter, psychology student, and AI enthusiast. I am interested in how new technologies affect people, and in my free time I experiment with generative graphics in Midjourney.

