- VOID uses a vision-language system and quadmask technology to remove objects, shadows, and reflections with physically plausible results.
- VOID outperformed competitors with a 64.8% user preference rate against tools like Runway, Generative Omnimatte, and ProPainter.
- Released as an open-weight model on HuggingFace under Apache-2.0, VOID raises concerns over deepfake potential and video evidence manipulation.
On April 3, 2026, Netflix released VOID (Video Object and Interaction Deletion), the streaming giant’s first publicly available AI model designed to remove objects from video footage while realistically reconstructing how remaining elements in the scene would behave without the removed object, according to The Register. Unlike conventional video editing tools that simply erase objects and fill gaps with static backgrounds, VOID predicts the physical dynamics that would follow from an object’s removal, marking a notable advancement in AI-powered video editing capabilities.
The model was developed by Netflix researchers in collaboration with Sofia University, using vision-language technology to accept natural-language descriptions of the objects to be removed. The system can transform complex scenes such as vehicle collisions into simpler footage, removing one car and generating a realistic continuation in which the remaining vehicle proceeds down the road, with post-impact debris, smoke, and fire replaced by an undisturbed road surface.
Video Editing AI That Understands Physics
VOID operates as a vision-language system that takes two primary inputs: a video file and a language description specifying which object to remove. From these, the model generates a quadmask, a specialized four-value mask that partitions the scene into the primary object to be deleted, overlap regions, affected areas such as falling objects or displaced items, and the background to preserve. This approach allows the AI not only to erase shadows and reflections but also to model how objects held by removed subjects would physically behave: removing a person holding an object, for example, results in that object logically falling, as documented in VOID's HuggingFace repository.
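The four-value quadmask can be illustrated with a small sketch. The integer codes, the `build_quadmask` helper, and the box-based layout below are assumptions for illustration only, not Netflix's actual encoding or pipeline:

```python
import numpy as np

# Hypothetical integer codes for the four quadmask regions
# (VOID's actual encoding is not documented in this article).
BACKGROUND = 0   # pixels to preserve as-is
TARGET     = 1   # the object to delete
OVERLAP    = 2   # pixels where the target overlaps an affected region
AFFECTED   = 3   # regions whose dynamics change (e.g. a dropped item)

def build_quadmask(h, w, target_box, affected_box):
    """Toy per-frame quadmask: everything is background except the
    target object's box and a box of physically affected pixels."""
    mask = np.full((h, w), BACKGROUND, dtype=np.uint8)
    ay0, ay1, ax0, ax1 = affected_box
    mask[ay0:ay1, ax0:ax1] = AFFECTED
    ty0, ty1, tx0, tx1 = target_box
    mask[ty0:ty1, tx0:tx1] = TARGET
    # Overlap: the intersection of the target and affected boxes.
    oy0, oy1 = max(ay0, ty0), min(ay1, ty1)
    ox0, ox1 = max(ax0, tx0), min(ax1, tx1)
    if oy0 < oy1 and ox0 < ox1:
        mask[oy0:oy1, ox0:ox1] = OVERLAP
    return mask

mask = build_quadmask(64, 64, target_box=(10, 30, 10, 30),
                      affected_box=(20, 50, 20, 50))
print(sorted(np.unique(mask).tolist()))  # -> [0, 1, 2, 3]
```

In the real system this mask would be produced per frame across the whole clip; the point of the four-way split is that the inpainting model treats "erase and fill" regions differently from "re-simulate the physics" regions.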
In user preference evaluations involving 25 participants across multiple scenarios, VOID demonstrated significant performance advantages over existing tools. The model achieved a 64.8% preference rate versus 18.4% for Runway, with the remaining votes split among Generative Omnimatte, DiffEraser, ROSE, MiniMax-Remover, and ProPainter, according to The Register. The evaluation was conducted on both synthetic and real-world data, testing the model's ability to handle the complex dynamics that follow from object removal.
The technical foundation of VOID builds on the CogVideoX 3D Transformer architecture with 5 billion parameters, processing videos at a default resolution of 384 by 672 pixels with a maximum of 197 frames. The system uses a two-pass approach: the first pass handles base inpainting, while an optional second pass applies warped-noise refinement for improved temporal consistency on longer clips.
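The reported input constraints (384x672 resolution, at most 197 frames) imply a preprocessing step before inference. The sketch below is an assumed, simplified version of such a step using nearest-neighbor sampling; the function name and approach are illustrative, not VOID's actual code:

```python
import numpy as np

# VOID's reported defaults: 384x672 resolution, at most 197 frames.
DEFAULT_H, DEFAULT_W, MAX_FRAMES = 384, 672, 197

def prepare_clip(video):
    """Toy preprocessing sketch (assumed, not the actual VOID pipeline):
    truncate to the frame budget, then nearest-neighbor resize each
    frame to the model's default resolution."""
    video = video[:MAX_FRAMES]                    # shape (T, H, W, C)
    t, h, w, c = video.shape
    ys = np.arange(DEFAULT_H) * h // DEFAULT_H    # source rows to sample
    xs = np.arange(DEFAULT_W) * w // DEFAULT_W    # source columns to sample
    return video[:, ys][:, :, xs]

clip = prepare_clip(np.zeros((250, 720, 1280, 3), dtype=np.uint8))
print(clip.shape)  # (197, 384, 672, 3)
```

A production pipeline would more likely use a proper resampling filter and handle aspect-ratio letterboxing, but the frame and resolution budgets shown are those reported for the model.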
AI Model Availability and Production Implications
VOID has been released as an open-weight model on HuggingFace, making it publicly accessible under the Apache-2.0 license. The repository includes runnable code, pre-trained checkpoints, and a mask generation pipeline that uses SAM2 combined with Gemini to create quadmasks from raw video footage. A demonstration space is also available for testing the model without local installation.
The research team behind VOID includes Saman Motamed, William Harvey, Benjamin Klein, Zhuoning Yuan, and Ta-Ying Cheng from Netflix, alongside Luc Van Gool from Sofia University. Their paper, currently available as a preprint without peer review, states that VOID excels at modeling the complex dynamics that can follow from object removal, highlighting the model's focus on physically plausible inpainting in challenging scenarios.
The release arrives amid ongoing debates about AI video manipulation technology, with The Register noting concerns about deepfake potential and the ease with which video evidence can be altered. Netflix has not announced plans to incorporate VOID into its existing production pipelines, though the model represents the company’s continued investment in AI tools following its reported acquisition of an AI filmmaking startup for up to $600 million, as covered by Variety.
