
OpenAI Releases Sora 2: The Next Generation of AI Video Generation with Synchronized Audio

OpenAI Sora 2 represents a significant advancement in AI video generation technology, officially unveiled on September 30, 2025. This next-generation model builds upon the foundation of the original Sora while introducing groundbreaking features that transform how creators produce video content.

The most notable improvement in Sora 2 is its integrated audio and speech generation capability. Unlike previous models that produced silent videos, Sora 2 generates synchronized audio including voiceovers, ambient noises, and sound effects that align perfectly with on-screen action. This breakthrough achieves realistic intonation and lip-sync, addressing a major limitation of earlier AI video generation systems.

Enhanced physics and human movement simulation represent another key advancement. Sora 2 demonstrates improved modeling of human motion, reducing previous artifacts such as warped or “melting” figures. The model leverages advanced training techniques for more natural fluidity and better limb and body positioning, though some physics and causality challenges persist as acknowledged by OpenAI.

Video quality and duration have also seen substantial improvements. Sora 2 extends video length beyond previous constraints, allowing clips up to 30 seconds with improved resolution and fidelity. The model operates at 1080p resolution, with higher-tier subscription plans like Sora 2 Pro providing enhanced output quality due to increased computational demands.

The Sora iOS app provides user-friendly mobile access with a vertical video feed that allows users to discover and browse videos in customizable formats supporting vertical, widescreen, and square aspect ratios. The app initially rolled out in the U.S. and Canada, with plans for international expansion. Users can sign up in-app for access notifications during the rollout phase.

Safety and content moderation remain priorities for OpenAI. Sora 2 enforces content restrictions by blocking prompts involving sexual, violent, hateful content, celebrity likenesses, and pre-existing intellectual property. All videos generated by Sora include C2PA metadata tags to indicate they were AI-generated, supporting transparency and copyright regulation.
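The provenance check that C2PA metadata enables can be sketched roughly as follows. This is an illustrative simplification, not OpenAI's implementation: real C2PA manifests are signed, JUMBF-serialized claims embedded in the media file, and the field names below are flattened for readability. The `digitalSourceType` value `trainedAlgorithmicMedia` is the IPTC term C2PA uses to mark AI-generated assets.

```python
# Illustrative sketch of the kind of provenance record C2PA embeds in a file.
# Field names are simplified; real manifests are signed JUMBF claim structures
# (see the C2PA specification). The claim_generator label is assumed.
manifest = {
    "claim_generator": "OpenAI Sora",
    "assertions": [
        {
            "label": "c2pa.actions",
            "data": {
                "actions": [
                    {
                        "action": "c2pa.created",
                        "digitalSourceType": "trainedAlgorithmicMedia",
                    }
                ]
            },
        },
    ],
}

def is_ai_generated(m):
    """Return True if any action assertion marks the asset as
    produced by a trained algorithmic model."""
    for a in m.get("assertions", []):
        if a.get("label") == "c2pa.actions":
            for act in a["data"]["actions"]:
                if act.get("digitalSourceType") == "trainedAlgorithmicMedia":
                    return True
    return False

print(is_ai_generated(manifest))  # True
```

In practice a verifier would also validate the cryptographic signature over the claim before trusting any assertion; the structural check above is only the last step.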

From a technical perspective, Sora 2 is a diffusion transformer: a latent diffusion model applied to 3D spacetime "patches" of video in a compressed latent space, which are then decoded back to visible video. The model also leverages video-to-text re-captioning to enrich its training data with detailed captions, improving video-text alignment and generation quality.
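The patch-based representation can be sketched in a few lines. This is a hedged reconstruction of the general technique, not Sora's actual code: the latent tensor shape and the patch dimensions below are arbitrary illustrative choices, since OpenAI has not published them.

```python
import numpy as np

# Hypothetical latent video: 16 frames of a 64x64 latent grid, 8 channels.
# Real Sora latent shapes are not public; these numbers are illustrative.
latent = np.random.randn(16, 64, 64, 8)

def patchify(x, pt=2, ph=8, pw=8):
    """Split a (T, H, W, C) latent video into flattened spacetime patches:
    the token sequence a diffusion transformer attends over."""
    T, H, W, C = x.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    # Carve each axis into (num_patches, patch_size) pairs.
    x = x.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Bring the three patch-index axes to the front, then flatten each patch.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, pt * ph * pw * C)

tokens = patchify(latent)
print(tokens.shape)  # (512, 1024): 8*8*8 patches, each 2*8*8*8 values
```

Because the sequence length depends only on the patch grid, the same transformer can in principle handle different durations, resolutions, and aspect ratios by varying the number of patches rather than the model architecture.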

The 3D video angle generation capability allows the model to independently create different camera angles and 3D-like perspectives from training data alone, significantly improving video realism and diversity. This feature enables creators to produce more dynamic and engaging content without manual intervention.

Access to Sora 2 is currently invite-only, but OpenAI plans to open it up through multiple channels. Users will be able to access the model via sora.com, with ChatGPT Pro subscribers getting the higher-quality Sora 2 Pro model. OpenAI also plans to release Sora 2 through its API soon, while the original Sora 1 Turbo remains available, and users retain access to previously created content in their libraries.
