OpenAI introduces Sora text-to-video AI model

16 February 2024

OpenAI has announced Sora, a new video-generation model that the AI company says “can create realistic and imaginative scenes from text instructions.”

The text-to-video model allows users to create photorealistic videos up to 60 seconds long — all based on text prompts.

Sora, after the Japanese word for sky, is capable of creating “complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background,” according to OpenAI’s introductory blog post.

The company behind the ChatGPT chatbot and the still-image generator DALL-E also notes that Sora can understand how objects “exist in the physical world,” and “accurately interpret props and generate compelling characters that express vibrant emotions.”

In addition, Sora can also generate a video based on a still image, as well as fill in missing frames on an existing video or extend it. That said, OpenAI sys the tool does have weaknesses, and “may struggle with accurately simulating the physics of a complex scene,” and may not understand specific instances of cause and effect.

Despite this, the results so far are pretty impressive.

The team behind the technology, including the researchers Tim Brooks and Bill Peebles, said the company was not yet releasing Sora to the public yet because it was still working to understand the system’s dangers.

Instead, OpenAI is sharing the technology with a small group of academics and other researchers who will “red team” it, a term for looking for ways it can be misused.

Despite this, with still images now capably produced by most AI tools, video appears to be the next frontier, with companies like Runway Pika and Google's Lumiere already demonstrating impressive text-to-video models in an increasingly crowded space.

You can watch some of the examples of the text prompted technology on the OpenAI website.