DeepMind Introduces Gemini Robotics System Bringing Reasoning Capabilities to Physical Robots

Google DeepMind announced a breakthrough in robotics AI, introducing two new models that bring sophisticated reasoning capabilities to physical robots for the first time. The Gemini Robotics 1.5 system enables robots to think through complex tasks before taking action, marking a significant step toward general-purpose robotic assistants.

The announcement, published September 25, represents Google’s most ambitious effort yet to bridge artificial intelligence and physical robotics, potentially transforming how robots operate in homes, workplaces, and other real-world environments.

Two-Model System Powers Advanced Robot Intelligence

Google’s new approach combines two specialized AI models working together in what the company calls an “agentic framework.” This system tackles the long-standing challenge of enabling robots to handle multi-step tasks that require contextual understanding and decision-making.

Gemini Robotics 1.5 serves as the action-oriented component, translating visual information and instructions into specific motor commands. Unlike traditional robotics AI that immediately converts commands into movement, this model generates internal reasoning sequences in natural language before acting.

Gemini Robotics-ER 1.5 functions as the high-level planning system, orchestrating complex activities through spatial understanding, natural language interaction, and the ability to call external tools like Google Search for additional information.

The models demonstrate their capabilities through tasks like sorting household waste according to local recycling guidelines, a job that requires internet research, visual object recognition, and systematic execution of disposal actions.
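The planner-plus-actor division of labor described above can be sketched as a simple loop. The function names, message formats, and hard-coded steps below are illustrative assumptions, not the actual Gemini Robotics interface:

```python
# Minimal sketch of the two-model agentic loop: a high-level planner
# (stand-in for Gemini Robotics-ER 1.5) decomposes an instruction, and a
# low-level actor (stand-in for Gemini Robotics 1.5) reasons in natural
# language before emitting motor commands. All names are hypothetical.

def plan_task(instruction):
    """Planner: break a high-level instruction into ordered sub-steps.
    In the real system this model can also call external tools such as
    Google Search, e.g. to look up local recycling guidelines."""
    if "sort" in instruction and "waste" in instruction:
        return [
            "look up local recycling guidelines",
            "identify each item of waste visually",
            "place each item in the matching bin",
        ]
    return [instruction]

def act(step):
    """Actor: produce a natural-language reasoning trace for the step,
    then the motor commands that carry it out."""
    reasoning = f"thinking: how to '{step}' given the current scene"
    motor_commands = [f"execute({step!r})"]
    return reasoning, motor_commands

def run(instruction):
    """Run the agentic loop and record (step, reasoning, commands)."""
    log = []
    for step in plan_task(instruction):   # high-level planning
        reasoning, commands = act(step)   # think, then act
        log.append((step, reasoning, commands))
    return log

log = run("sort the household waste")
```

The key point the sketch illustrates is the separation of concerns: the planner never touches motors, and the actor never sees the whole task, only one step at a time.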

One of the most significant advances involves the system’s ability to transfer skills between different robot designs without specialized training. Gemini Robotics 1.5 can apply motions learned on one robot type to completely different robotic platforms, dramatically accelerating the development of new robotic capabilities.

Testing showed that tasks taught exclusively to ALOHA 2 robots during training successfully transferred to Apptronik’s humanoid robot Apollo and bi-arm Franka robots without modification. This cross-embodiment learning addresses a major bottleneck in robotics development, where skills traditionally needed separate training for each robot design.

Performance on Academic Benchmarks

Google evaluated Gemini Robotics-ER 1.5 against 15 academic robotics benchmarks, including specialized tests for embodied reasoning, spatial understanding, and visual question answering. The model achieved what the company describes as state-of-the-art performance across these evaluations.

The benchmarks included complex tasks like point-and-click interactions, spatial reasoning challenges, and video-based question answering that test robots’ ability to understand and navigate three-dimensional environments.

Developers can access Gemini Robotics-ER 1.5 through the Gemini API in Google AI Studio, while Gemini Robotics 1.5 is initially limited to select partners, suggesting a phased rollout of the more powerful action-capable model.
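For developers, querying the ER model looks like any other Gemini API call. The sketch below builds a generateContent-style request payload for a typical embodied-reasoning query (pointing at objects in a scene); treat the model identifier and endpoint as assumptions to verify against Google AI Studio before use:

```python
# Hedged sketch: constructing (not sending) a Gemini API request for an
# embodied-reasoning query. The payload shape follows the public
# generateContent format; the model name below is an assumption.
import json

MODEL = "gemini-robotics-er-1.5"  # assumed identifier; check AI Studio

def build_request(prompt, image_b64=None):
    """Build a generateContent-style JSON payload. An optional base64
    JPEG of the robot's camera view can be attached as inline data."""
    parts = [{"text": prompt}]
    if image_b64:
        parts.append({"inline_data": {"mime_type": "image/jpeg",
                                      "data": image_b64}})
    return {"contents": [{"role": "user", "parts": parts}]}

payload = build_request("Point to every recyclable item in this image.")
body = json.dumps(payload)
# POST `body` with an API key from Google AI Studio to:
# https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent
```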

Google emphasized its commitment to responsible AI development, working with its ethics and research teams to ensure safe deployment. The system incorporates multiple safety layers, including semantic reasoning about safety implications before taking action.

Google also released an upgraded version of its ASIMOV benchmark for evaluating robotic safety, featuring improved annotations and new safety scenarios. In testing, Gemini Robotics-ER 1.5 demonstrated strong performance on safety evaluations, with its thinking capabilities contributing to a better understanding of safety constraints.

The ability of robots to “think before acting” represents a fundamental shift from reactive robotics to proactive, reasoning-capable systems. This development brings the robotics industry closer to creating machines that can handle the unpredictable, multi-faceted challenges of real-world environments.

