OpenAI Introduces AI Model That Turns Text Into Video


What is real? OpenAI’s new AI model ‘Sora’ can “create realistic and imaginative scenes from text instructions”

OpenAI is offering a new tool that can create short-form videos from text instructions. The tool could interest content creators, but it could also have a significant impact on the digital entertainment market.

The new text-to-video AI model, called Sora, was unveiled by OpenAI in a series of posts on X (formerly Twitter). OpenAI said that "Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions."

The release of Sora comes just days after OpenAI and its main investor, Microsoft, revealed that nation-state hackers from Russia, China, Iran and North Korea are already utilising large language models such as OpenAI's ChatGPT to refine and improve their cyberattacks.

Image credit: Ilgmyzin/Unsplash

Text to video

OpenAI demonstrated a number of 60-second videos that Sora has created, including one generated from the following text prompt:

“A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. she wears a black leather jacket, a long red dress, and black boots, and carries a black purse. she wears sunglasses and red lipstick. she walks confidently and casually. the street is damp and reflective, creating a mirror effect of the colorful lights. many pedestrians walk about.”

“Today, Sora is becoming available to red teamers to assess critical areas for harms or risks,” said OpenAI. “We are also granting access to a number of visual artists, designers, and filmmakers to gain feedback on how to advance the model to be most helpful for creative professionals.”

The AI pioneer said it is sharing its research progress early in order to start working with, and get feedback from, people outside OpenAI, and to give the public a sense of what AI capabilities are on the horizon.

OpenAI said that Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. According to OpenAI, the model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.

It added that the AI model has a deep understanding of language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions. Sora can also create multiple shots within a single generated video while keeping characters and visual style consistent across them.

Current weaknesses

But OpenAI admitted that the current model has weaknesses. It may struggle to accurately simulate the physics of a complex scene, and it may not understand specific instances of cause and effect.

For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark.

OpenAI also said that Sora may confuse spatial details of a prompt, for example mixing up left and right, and may struggle with precise descriptions of events that take place over time, such as following a specific camera trajectory.

Safety concerns

The introduction of Sora may well trigger even more regulatory concern about the pace of AI advancement and the potential for deepfake imagery.

Earlier this week, the chairman of the US Securities and Exchange Commission (SEC), Gary Gensler, warned people against buying into the current AI feeding frenzy. He urged them to beware of misleading AI hype and so-called 'AI-washing', where publicly traded firms misleadingly or untruthfully promote their use of AI, a practice that can harm investors and run afoul of US securities law.

And last month, US authorities began an investigation after a robocall that apparently used artificial intelligence to mimic Joe Biden's voice was sent to a number of voters to discourage them from voting in a US primary election.

Also last month AI-generated explicit images of the singer Taylor Swift were viewed millions of times online.

“We’ll be engaging policymakers, educators and artists around the world to understand their concerns and to identify positive use cases for this new technology,” said OpenAI. “Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it.”

“That’s why we believe that learning from real-world use is a critical component of creating and releasing increasingly safe AI systems over time,” said OpenAI.

Last July, the Biden administration announced that a number of big-name players in the artificial intelligence market had agreed to voluntary AI safeguards.

Amazon, Anthropic, Google, Inflection, Meta, Microsoft, and OpenAI made a number of commitments. One of the most notable concerns the use of watermarks on AI-generated content such as text, images, audio and video, amid concern that deepfake content can be utilised for fraudulent and other criminal purposes.