OpenAI teases 'Sora,' its new text-to-video AI model

15 Feb 2024

NBC News

Want to see a turtle riding a bike across the ocean? Now, generative AI can animate that scene in seconds.


OpenAI on Thursday unveiled its new text-to-video model Sora, which can generate videos up to a minute long based on whatever prompt a user types into a text box. Though it’s not yet available to the public, the AI company’s announcement roused a frenzy of reactions online.

AI enthusiasts were quick to brainstorm ideas around the potential of this latest technology, even as others raised immediate concern over how its accessibility might erode human jobs and further the spread of digital disinformation.

OpenAI CEO Sam Altman solicited prompt ideas on X and generated a series of videos including the aforementioned aquatic cyclists, as well as a cooking video and a couple of dogs podcasting on a mountain.

“We are not making this model broadly available in our products soon,” a spokesperson for OpenAI wrote in an email, adding that the company is sharing its research progress now to gain early feedback from others in the AI community.

The company, with its popular chatbot ChatGPT and text-to-image generator DALL-E, is one of several tech startups leading the generative AI revolution that began in 2022. OpenAI wrote in a blog post that Sora can accurately generate multiple characters and different types of motion.

“We’re teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction,” OpenAI wrote in the post.

But Sora may struggle to capture the physics or spatial details of a more complex scene, which can lead it to generate something illogical (like a person running in the wrong direction on a treadmill), morph a subject in unnatural ways, or even make a subject vanish into thin air, the company said in its blog post.

Still, many of the demonstrations shared by OpenAI showcased hyper-realistic visual details that could make it difficult for casual internet users to distinguish AI-generated video from real-life footage. Examples included a drone shot of waves crashing into a craggy Big Sur coastline under the glow of a setting sun and a clip of a woman strolling down a bustling Tokyo street still damp with rain.

As deepfaked media of celebrities, politicians and private figures becomes increasingly prevalent online, the ethical and safety implications of a world in which anyone can create high-quality video of anything they can imagine — especially during a presidential election year, and amid tense global conflicts fraught with opportunities for disinformation — are daunting.

The Federal Trade Commission on Thursday proposed rules aimed at making it illegal to create AI impersonations of real people by extending protections it is putting in place around government and business impersonation.

“The agency is taking this action in light of surging complaints around impersonation fraud, as well as public outcry about the harms caused to consumers and to impersonated individuals,” the FTC wrote in a news release. “Emerging technology — including AI-generated deepfakes — threatens to turbocharge this scourge, and the FTC is committed to using all of its tools to detect, deter, and halt impersonation fraud.”

OpenAI said it is working to build tools that can detect when a video is generated by Sora, and plans to embed metadata, which would mark the origin of a video, into such content if the model is made available for public use in the future.

The company also said it is collaborating with experts to test Sora for its ability to cause harm via misinformation, hateful content and bias.

A spokesperson for OpenAI told NBC News it will then publish a system card describing its safety evaluations, as well as the model’s risks and limitations.

“Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it,” OpenAI said in its blog post. “That’s why we believe that learning from real-world use is a critical component of creating and releasing increasingly safe AI systems over time.”

Angela Yang

Angela Yang is a culture and trends reporter for NBC News.

Brian Cheung contributed.