- HY-World 2.0 completes end-to-end 3D world generation in roughly 10 minutes on NVIDIA H20 GPUs through a four-stage pipeline.
- The model exports navigable 3D assets in mesh, Gaussian Splatting, and point cloud formats, compatible with Unity and Unreal Engine.
- Tencent’s open-source release matches closed-source commercial alternatives on benchmarks, positioning it as a landmark tool for developers and researchers.
On April 15, Tencent Hunyuan announced the open-source release of HY-World 2.0, a comprehensive multimodal world model that generates, reconstructs, and simulates interactive 3D worlds from text, images, and videos.
The system produces outputs that can be integrated into game engines and embodied simulation pipelines, marking a significant advancement in accessible 3D world generation technology. The model sets itself apart from video-based alternatives by generating persistent 3D assets with navigation and physics capabilities rather than temporary visual sequences.
According to Tencent, the platform supports one-click world generation from text or images, export to standard 3D formats including mesh, 3D Gaussian Splatting, and point clouds, and an interactive character mode with physics-aware movement and collision support. Developers can access the full toolkit via GitHub and Hugging Face, while users can apply for access through Tencent’s dedicated platform.
One-Click World Generation for 3D Content Creation
The HY-World 2.0 system operates through a four-stage pipeline that transforms sparse inputs into navigable 3D environments. The process begins with HY-Pano 2.0 for panorama generation, followed by WorldNav for trajectory planning, WorldStereo 2.0 for world expansion, and WorldMirror 2.0 for final composition. Each component has been significantly upgraded from previous versions, with improvements in camera control precision, visual consistency, and geometric accuracy. The entire pipeline completes end-to-end generation in approximately 10 minutes on NVIDIA H20 GPUs.
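The four stages can be pictured as a simple sequential hand-off, each component enriching the output of the one before it. The sketch below is illustrative only; all function names and return values are hypothetical placeholders, not the actual HY-World 2.0 API.

```python
# Illustrative sketch of the four-stage pipeline described above.
# Every function here is a hypothetical stub, not the real HY-World interface.

def pano_generate(prompt: str) -> dict:
    """HY-Pano 2.0: turn a text or image prompt into a 360-degree panorama."""
    return {"panorama": f"pano({prompt})"}

def plan_trajectory(world: dict) -> dict:
    """WorldNav: plan a camera trajectory through the generated panorama."""
    return {**world, "trajectory": ["start", "mid", "end"]}

def expand_world(world: dict) -> dict:
    """WorldStereo 2.0: expand the scene along the planned trajectory."""
    return {**world, "expanded_views": len(world["trajectory"])}

def compose_world(world: dict) -> dict:
    """WorldMirror 2.0: reconstruct navigable assets (mesh / 3DGS / points)."""
    return {**world, "assets": ["mesh", "3dgs", "pointcloud"]}

def generate_world(prompt: str) -> dict:
    world = pano_generate(prompt)
    world = plan_trajectory(world)
    world = expand_world(world)
    return compose_world(world)
```

The design point is that each stage consumes the accumulated state of the previous ones, which is why camera-control and consistency improvements in any single component propagate to the final composed world.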
Unlike its predecessors, HY-World 2.0 eliminates the need for explicit camera metadata through an implicit, adaptive mapping strategy that learns the perspective-to-equirectangular transformation in a unified latent space. Circular padding at the latent level, combined with linear pixel blending, produces seamless 360-degree wrap-around output without requiring precise camera intrinsics. WorldStereo 2.0 further enhances the process by preserving high-frequency appearance and geometric cues through spatial-only compression in its Keyframe-VAE architecture, avoiding the temporal compression that degrades detail in other approaches.
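The wrap-around idea is straightforward to sketch: pad the width axis circularly so the model sees context from the opposite edge, then cross-fade the duplicated seam columns. The NumPy code below is a minimal sketch under assumed details (pad and overlap widths are illustrative, and the real system operates on learned latents, not raw arrays).

```python
import numpy as np

def circular_pad(latent: np.ndarray, pad: int) -> np.ndarray:
    """Pad the width axis circularly so the left and right edges of an
    equirectangular latent see each other's context."""
    left = latent[..., -pad:]   # columns copied from the right edge
    right = latent[..., :pad]   # columns copied from the left edge
    return np.concatenate([left, latent, right], axis=-1)

def blend_wraparound(image: np.ndarray, overlap: int) -> np.ndarray:
    """Linearly cross-fade the duplicated overlap columns at both ends so the
    first and last columns of the panorama meet without a visible seam."""
    w = np.linspace(0.0, 1.0, overlap)          # blend weights across the seam
    fused = (1.0 - w) * image[..., -overlap:] + w * image[..., :overlap]
    core = image[..., overlap:-overlap]         # untouched interior columns
    return np.concatenate([fused, core], axis=-1)
```

Because the padding is circular rather than zero or reflective, a convolution sliding across column 0 receives exactly the content it would see if the panorama were physically wrapped into a cylinder, which is what removes the seam at the 0/360-degree boundary.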
The model achieves state-of-the-art performance among open-source approaches, with benchmark results comparable to closed-source commercial alternatives like Marble. HY-Pano 2.0 demonstrates superior CLIP-T scores for text-to-panorama generation while ranking first across all five image-to-panorama metrics. Reddit users commented that if Tencent delivers on these specifications, this could become one of the most important open-source 3D world model releases of the year.
Interactive Exploration and Game Engine Compatibility
HY-World 2.0 introduces interactive character mode that enables users to explore generated 3D worlds in real time with physics-aware movement and collision support. The WorldLens rendering platform provides high-performance 3D Gaussian Splatting rendering with automatic image-based lighting and efficient collision detection. Virtual agents can navigate complex geometric structures including stairs and indoor layouts with real-time collision detection and physically plausible feedback, demonstrating practical readiness for interactive applications.
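At its core, collision-aware movement means a proposed step is validated against scene geometry before it is applied. The toy example below illustrates that loop on a 2D occupancy grid; it is a conceptual sketch, not the WorldLens implementation, which performs real-time collision detection against 3D geometry.

```python
# Toy illustration of collision-aware character movement: a proposed step is
# applied only if the destination cell is free. This is NOT the WorldLens
# implementation, just the underlying check in its simplest 2D form.

GRID = [
    "#####",
    "#...#",
    "#.#.#",
    "#...#",
    "#####",
]  # '#' = solid geometry, '.' = walkable space

def blocked(x: int, y: int) -> bool:
    """True if the cell contains solid geometry."""
    return GRID[y][x] == "#"

def step(pos: tuple, move: tuple) -> tuple:
    """Apply a move only if the destination is free; otherwise stay put,
    giving physically plausible feedback instead of clipping through walls."""
    x, y = pos[0] + move[0], pos[1] + move[1]
    return pos if blocked(x, y) else (x, y)
```

A production engine replaces the grid lookup with queries against collision meshes and adds responses like sliding along walls or stepping up stairs, but the accept/reject structure of the update is the same.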
The system exports editable 3D worlds compatible with major game development platforms including Unity and Unreal Engine, along with standard 3D formats such as mesh, 3DGS, and point clouds. Extracted geometric meshes serve as collision proxies for downstream applications in gaming, virtual reality, and embodied AI, paving the way for seamless integration into existing production pipelines. Navigation mesh generation through Recast Navigation enables traversable region definitions for autonomous exploration scenarios.
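What makes mesh export the key interoperability step is that standard interchange formats are plain vertex-and-face listings that any engine can ingest. As a hedged illustration (the actual HY-World export code and formats are not shown in the release notes summarized here), the snippet below writes a triangle mesh to Wavefront OBJ, one such standard format that Unity and Unreal Engine can import.

```python
# Hedged sketch: serializing an extracted mesh to Wavefront OBJ, a standard
# interchange format game engines can import as render or collision geometry.

def mesh_to_obj(vertices: list, faces: list) -> str:
    """Write vertices (x, y, z) and triangle faces (vertex indices) as OBJ."""
    lines = [f"v {x} {y} {z}" for x, y, z in vertices]
    # OBJ face indices are 1-based, so shift the 0-based indices up by one.
    lines += [f"f {a + 1} {b + 1} {c + 1}" for a, b, c in faces]
    return "\n".join(lines) + "\n"

# A single triangle is enough to show the structure.
triangle = mesh_to_obj(
    vertices=[(0, 0, 0), (1, 0, 0), (0, 1, 0)],
    faces=[(0, 1, 2)],
)
```

Tools like Recast Navigation then consume exactly this kind of triangle soup to voxelize the walkable surface and emit a navigation mesh for autonomous agents.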
WorldMirror 2.0 functions as the reconstruction foundation model, producing geometrically accurate and navigable 3DGS assets from multi-view images or videos. The model demonstrates robust resolution generalization, maintaining performance across low, medium, and high inference resolutions, where competing models typically degrade significantly at non-standard resolutions.
Acceleration strategies including sequence parallelism, mixed-precision inference, and fully sharded data parallelism achieve a 3.2-times speedup while reducing per-GPU memory consumption by 28 percent, enabling practical deployment for large-scale scene processing.
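To make the reported figures concrete, the arithmetic below works through what a 3.2-times speedup and a 28 percent memory reduction imply. The baseline runtime and memory values are hypothetical stand-ins; only the 3.2x and 28 percent factors come from the announcement.

```python
# Back-of-envelope arithmetic from the reported acceleration figures.
# Baseline values below are hypothetical; only the factors are from the release.

baseline_minutes = 32.0   # hypothetical unaccelerated runtime
baseline_mem_gb = 80.0    # hypothetical per-GPU memory footprint

speedup = 3.2             # reported end-to-end speedup
mem_reduction = 0.28      # reported per-GPU memory reduction

accelerated_minutes = baseline_minutes / speedup          # ~10 minutes
accelerated_mem_gb = baseline_mem_gb * (1 - mem_reduction)  # ~57.6 GB
```

Under these assumed baselines, the speedup alone would bring a half-hour job down to roughly the 10-minute end-to-end figure quoted for the full pipeline on H20 GPUs.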
