Stability AI, unfazed by the ongoing developments, has announced Stable Video Diffusion, an innovative AI model designed to generate videos by animating existing images.

Built on Stability’s existing Stable Diffusion text-to-image model, Stable Video Diffusion stands out as one of the few video-generating models that is both open source and commercially available.

The release comes with a caveat: the model is currently in what Stability describes as a “research preview.” To access it, users must agree to terms of use that outline its intended applications, such as educational or creative tools, design, and other artistic processes. The terms explicitly prohibit using the model for factual or true representations of people or events.

Given how earlier AI research previews have fared, there is a concern that the model could leak beyond its authorized users and circulate on the dark web. That raises worries about misuse, especially as Stable Video Diffusion appears to lack a built-in content filter. Past incidents, such as the use of Stable Diffusion to create nonconsensual deepfake content, underscore the importance of deploying such technology responsibly.

Stable Video Diffusion comprises two models: SVD and SVD-XT. The first, SVD, transforms a still image into a 576×1024 video of 14 frames. The second, SVD-XT, uses the same architecture but increases the frame count to 24. Both models can generate videos at a frame rate ranging from three to 30 frames per second.
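To make those numbers concrete, the short Python sketch below works out how long a clip plays for a given frame count and frame rate. It is plain arithmetic, not part of Stability’s tooling: the function and constant names are illustrative assumptions, and the frame counts and frame-rate range are simply the figures quoted above.

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Playback length, in seconds, of num_frames frames shown at fps frames per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


# Figures as quoted in the article; the names here are hypothetical.
FRAME_COUNTS = {"SVD": 14, "SVD-XT": 24}
FPS_RANGE = (3, 30)  # lowest and highest frame rate, in frames per second


def duration_range(num_frames: int) -> tuple[float, float]:
    """Return (shortest, longest) playback length across the supported frame rates."""
    lowest_fps, highest_fps = FPS_RANGE
    shortest = clip_duration_seconds(num_frames, highest_fps)
    longest = clip_duration_seconds(num_frames, lowest_fps)
    return shortest, longest


if __name__ == "__main__":
    for name, frames in FRAME_COUNTS.items():
        shortest, longest = duration_range(frames)
        print(f"{name}: {shortest:.2f}s to {longest:.2f}s")
```

Under these assumptions, a 24-frame clip played at six frames per second lasts four seconds. The article does not say which frame rate Stability’s samples use, so this is only one combination that yields that length.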

According to the whitepaper accompanying Stable Video Diffusion, both SVD and SVD-XT were initially trained on a dataset of millions of videos and then “fine-tuned” on a smaller set of hundreds of thousands to around a million clips. The origin of these training videos remains unclear. The paper suggests many came from public research datasets, but the possibility that some were copyrighted raises legal and ethical concerns.

Despite these considerations, Stability’s models, SVD and SVD-XT, exhibit high-quality output in the form of four-second clips. The cherry-picked samples showcased on Stability’s blog rival outputs from Meta’s recent video-generation model, as well as examples from Google, Runway, and Pika Labs.

However, Stable Video Diffusion has limitations. As outlined by Stability, the models cannot generate videos without motion or slow camera pans, cannot be controlled by text, cannot render text legibly, and do not consistently generate faces and people accurately. Nevertheless, Stability emphasizes that the models are extensible and can be adapted to use cases such as generating 360-degree views of objects.

Looking ahead, Stability envisions a roadmap for Stable Video Diffusion that includes the development of additional models building on SVD and SVD-XT, alongside a “text-to-video” tool that introduces text prompting to the web-based models. The ultimate goal is commercialization, with Stability recognizing the broad applications of Stable Video Diffusion in advertising, education, entertainment, and beyond.

Stability AI faces financial challenges as investors turn up the pressure, Semafor reported in April. Amid reported delays in wage payments and payroll taxes, and with AWS threatening to revoke access to GPU instances, Stability recently secured $25 million through a convertible note. The startup, last valued at $1 billion, aims to quadruple that valuation in the coming months.

The departure of Ed Newton-Rex, VP of audio at Stability AI, added a further challenge. Newton-Rex cited a disagreement over copyright and the ethical use of copyrighted data for AI model training as the reason for his departure. Despite these hurdles, Stability AI is pushing ahead with new releases as it tries to position itself as a key player in AI.