Ai2 Trained Robots Entirely in Simulation, Then Sent Them Into the Real World Without Any Extra Data
Ai2's MolmoBot achieves zero-shot sim-to-real transfer on real robots — no real-world training data required — and outperforms a rival model trained on large-scale human demonstrations.
The Allen Institute for AI has demonstrated something robotics researchers have been trying to crack for years: robots trained entirely in virtual simulation, never touching real-world data, that can handle physical tasks on actual hardware without any additional fine-tuning. The institute is calling it zero-shot sim-to-real transfer, and this week it released the two open-source tools that made it work.
The first, MolmoSpaces, is a large-scale simulation ecosystem containing more than 230,000 indoor scenes, over 130,000 object models, and 42 million physics-based robotic grasp annotations.
It generates training data by combining the MuJoCo physics engine with what Ai2 calls “aggressive domain randomization” — systematically varying objects, camera positions, lighting, surface textures, and physical dynamics across millions of scenarios.
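To make the idea concrete, here is a minimal sketch of domain randomization in plain Python. It is not Ai2's pipeline and does not call MuJoCo: the `SceneParams` fields, the `sample_scene_params` and `sample_batch` helpers, and every numeric range are hypothetical, chosen only to show how each simulated episode can draw its own lighting, camera pose, texture, and physical dynamics.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneParams:
    """One randomized scene configuration (all fields are illustrative)."""
    light_intensity: float     # relative brightness multiplier
    camera_offset_m: tuple     # (x, y, z) camera jitter in meters
    texture_id: int            # index into a hypothetical texture library
    friction: float            # surface friction coefficient
    object_mass_scale: float   # multiplier on each object's nominal mass


def sample_scene_params(rng, num_textures=1000):
    """Draw one configuration; the ranges are assumptions, not Ai2's values."""
    return SceneParams(
        light_intensity=rng.uniform(0.3, 1.5),
        camera_offset_m=tuple(rng.uniform(-0.05, 0.05) for _ in range(3)),
        texture_id=rng.randrange(num_textures),
        friction=rng.uniform(0.4, 1.2),
        object_mass_scale=rng.uniform(0.5, 2.0),
    )


def sample_batch(seed, count):
    """Sample `count` configurations from a seeded generator for reproducibility."""
    rng = random.Random(seed)
    return [sample_scene_params(rng) for _ in range(count)]


if __name__ == "__main__":
    for params in sample_batch(seed=0, count=3):
        print(params)
```

In a real pipeline each sampled configuration would be applied to the simulator before an episode is rolled out, so a policy trained across many episodes never sees the same lighting, viewpoint, or dynamics twice.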
The pipeline produced 1.8 million expert manipulation trajectories using 100 Nvidia A100 GPUs, generating roughly 1,024 simulation episodes per GPU-hour. That works out to more than 130 hours of robot experience for every hour of wall-clock time — nearly four times the throughput of real-world data collection.
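A back-of-the-envelope calculation shows what those throughput figures imply. The sketch below uses only the numbers reported above and assumes, purely for illustration, that all 100 GPUs run in parallel and that every simulated episode yields one retained trajectory. The second assumption is optimistic, since a real pipeline would likely discard failed episodes, so the result is best read as a lower bound on generation time.

```python
# Figures reported in the article.
EPISODES_PER_GPU_HOUR = 1_024
NUM_GPUS = 100
TOTAL_TRAJECTORIES = 1_800_000

# Assumption: all GPUs run concurrently, so aggregate throughput scales linearly.
episodes_per_wall_clock_hour = EPISODES_PER_GPU_HOUR * NUM_GPUS

# Assumption: every episode becomes a kept trajectory (optimistic lower bound).
min_wall_clock_hours = TOTAL_TRAJECTORIES / episodes_per_wall_clock_hour

if __name__ == "__main__":
    print(f"Episodes per wall-clock hour: {episodes_per_wall_clock_hour:,}")
    print(f"Lower bound on generation time: {min_wall_clock_hours:.1f} hours")
```

Under those assumptions the cluster would produce 102,400 episodes per wall-clock hour, which puts the full 1.8 million trajectories within reach in under a day of compute.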
The second tool, MolmoBot, is the manipulation model trained on that data. In physical testing on a Franka FR3 tabletop arm and a Rainbow Robotics RB-Y1 mobile manipulator, MolmoBot completed pick-and-place tasks, opened drawers and cabinets, and operated doors — all on objects it had never seen and in environments it had never entered.
On tabletop pick-and-place benchmarks, the primary model achieved a 79.2% success rate, outperforming π0.5, a competing model from Physical Intelligence that was trained on large-scale real-world demonstration data.
Ai2 Did in Simulation What Google Needed 17 Months to Build
The contrast with how robotics has worked until now is worth spelling out. Google DeepMind’s RT-1 required 130,000 teleoperated episodes collected by human operators over 17 months. The DROID dataset — a benchmark in the field — includes 76,000 teleoperated trajectories gathered across 13 institutions, representing roughly 350 hours of human effort.
Physical Intelligence’s π series is trained on human demonstrations. Nvidia’s GR00T platform treats real teleoperated data as essential, using synthetic pipelines only to augment it. Ai2’s result challenges the assumption that real-world data is indispensable.
“Most approaches try to close the sim-to-real gap by adding more real-world data,” said Ranjay Krishna, director of Ai2’s PRIOR team. “We took the opposite bet: that the gap shrinks when you dramatically expand the diversity of simulated environments, objects, and camera conditions.”
The insight is not that simulation is better than reality as a training ground. It is that the gap between the two closes when the simulation is varied enough that the robot never overfits to specific lighting, textures, or object placements it won’t encounter in deployment.
Both MolmoSpaces and MolmoBot are fully open-source, including the training data, generation pipelines, and model architectures. The announcement is timed ahead of Nvidia’s GTC conference in San Jose, running March 16 through 19, where physical AI is expected to be a central theme.
Ai2 CEO Ali Farhadi said the release reflects the institute’s position that robotics progress “cannot depend on closed data or isolated systems,” a pointed contrast with the proprietary data moats that companies like Physical Intelligence and Tesla have built around their robot training programs.