Elections in dozens of countries this year are proving to be a litmus test for industry efforts to control abuse of AI, with major tech companies pledging to fight misinformation even as OpenAI debuts a text-to-video tool so capable that you can no longer trust what you see.
The debut of Sora – which ChatGPT creator OpenAI recently teased but has not yet released – marks a significant escalation in the race to build tools capable of generating high-quality video entirely from users’ textual prompts.
Where OpenAI’s ChatGPT and its ilk proved remarkably good at producing text, and its Dall-E joined a fleet of AI-driven photorealistic image generators – one that also counts Midjourney, DeepAI, Pixlr, Adobe’s tools, Microsoft Image Creator, Google ImageFX, and many more among its number – Sora can generate videos of up to a minute in length from nothing but a textual prompt.
A lengthy technical report released by OpenAI highlights the ability of what it calls a “generalist model of visual data” to generate fluid, high-quality video – an approach the company describes as “a promising path towards building general purpose simulators of the physical world.”
OpenAI has found that by breaking HD-quality training videos down into ‘spacetime patches’, Sora can reassemble those patches into high-resolution videos with stunning results – a turtle swimming across a reef, a puppy playing in the snow, a photorealistic chameleon, a kangaroo standing on a Mumbai street, sharks floating over a city street, or a woman walking down a Tokyo streetscape at night.
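For readers wondering what carving a video into ‘spacetime patches’ might look like, the sketch below is a simplified, purely illustrative take on the general idea described in OpenAI’s report; the array shapes, patch sizes, and function name are assumptions made for the demonstration, not details of Sora’s actual implementation.

import numpy as np

def extract_spacetime_patches(video, patch_t=4, patch_h=16, patch_w=16):
    # Split a video array of shape (frames, height, width, channels) into
    # non-overlapping blocks spanning both space and time, flattening each
    # block into a single vector, much as a transformer treats tokens.
    t, h, w, c = video.shape
    t, h, w = t - t % patch_t, h - h % patch_h, w - w % patch_w  # trim to whole patches
    video = video[:t, :h, :w]
    patches = (
        video.reshape(t // patch_t, patch_t, h // patch_h, patch_h, w // patch_w, patch_w, c)
             .transpose(0, 2, 4, 1, 3, 5, 6)
             .reshape(-1, patch_t * patch_h * patch_w * c)
    )
    return patches

# Example: a 60-frame, 256x256 RGB clip becomes 3,840 patch vectors of 3,072 values each.
clip = np.random.rand(60, 256, 256, 3)
print(extract_spacetime_patches(clip).shape)  # (3840, 3072)

Generation is, loosely speaking, the reverse journey: the model produces new patches and reassembles them into frames, which is what yields the finished video.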
The AI engine can generate dynamic camera motion through a generated scene, and can merge or completely transform real-world videos – overlaying existing footage with generated elements so that, for example, footage of a Porsche driving along a mountain road can be relocated to a jungle road, among dinosaurs, or into winter snow.
Like ChatGPT, the model has its faults, with OpenAI admitting that Sora “currently exhibits numerous limitations as a simulator” – including poor modelling of physical interactions, such as glass shattering, and lapses in verisimilitude, such as a person eating a cookie that never ends up with a bite taken out of it.
Nonetheless, the platform’s ability to create lifelike videos of nearly any conceivable situation has set social media on fire – with the technology promising to democratise image creation and allow even amateurs to make fully realised films.
“We believe the capabilities Sora has today demonstrate that continued scaling of video models is a promising path towards the development of capable simulators of the physical and digital world,” OpenAI notes, “and the objects, animals, and people that live within them.”
Alarm bells are ringing
Social media has been flooded with more examples of Sora videos, ranging from a point-of-view walk into an ant’s nest to a cartoon otter waterskiing, from a stray cat walking down an alley to pirate ships battling on the top of a coffee cup and a hermit crab with a lightbulb as its shell.
Yet for all its capabilities, Sora’s obvious risks – it could easily be used to create deepfake scams, simulated child abuse material, sextortion videos, pornographic deepfakes, and doctored misinformation videos designed to sway election results – are validating fears that the capabilities of AI platforms are outpacing regulators’ ability to control them.
Before releasing Sora to the general public, the company says it is working with ‘red teamers’ – specialists who probe the platform’s potential for abuse by actively trying to generate problematic content – to understand what controls must be put in place to prevent such malicious uses.
They don’t have much time: with elections this year in over 60 countries representing 49 per cent of the world’s population, the escalating risk of AI-driven misinformation this month drove a cadre of tech giants to sign the voluntary Tech Accord to Combat Deceptive Use of AI in 2024 Elections – which will see them work together to adopt “reasonable precautions” to prevent the use of AI to disrupt those elections.
That agreement – which garnered support from Adobe, Amazon, Google, IBM, Meta, Microsoft, OpenAI, TikTok, and a dozen other companies including increasingly uncontrolled X – will see the vendors collaborate around eight key points including standards for labelling AI-generated content, for which the industry recently embraced the Coalition for Content Provenance and Authenticity (C2PA) Content Credentials standard.
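The core idea behind content-provenance labelling of the kind C2PA standardises is that a signed manifest travels with a piece of media, recording how it was produced and letting anyone downstream check that it has not been altered since. The sketch below illustrates that concept only; it uses a simple keyed hash rather than the certificate-based signatures and manifest format the actual C2PA specification defines, and every name in it is hypothetical.

import hashlib, hmac, json

SIGNING_KEY = b"publisher-secret-key"  # hypothetical key, standing in for a real signing certificate

def make_manifest(media_bytes, generator):
    # Bind a claim about how the media was produced to a hash of its content.
    claim = {
        "generator": generator,
        "content_hash": hashlib.sha256(media_bytes).hexdigest(),
    }
    payload = json.dumps(claim, sort_keys=True).encode()
    return {"claim": claim, "signature": hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()}

def verify_manifest(media_bytes, manifest):
    # Recompute the signature and confirm the media still matches the signed hash.
    payload = json.dumps(manifest["claim"], sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    unaltered = manifest["claim"]["content_hash"] == hashlib.sha256(media_bytes).hexdigest()
    return hmac.compare_digest(expected, manifest["signature"]) and unaltered

video = b"...rendered video bytes..."
manifest = make_manifest(video, generator="AI text-to-video model")
print(verify_manifest(video, manifest))            # True: the provenance label checks out
print(verify_manifest(video + b"edit", manifest))  # False: the file no longer matches its credentials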
“With so many major elections taking place this year, it’s vital we do what we can to prevent people being deceived by AI-generated content,” Meta president of global affairs Nick Clegg said.
“This work is bigger than any one company and will require a huge effort across industry, government, and civil society.
“Hopefully, this accord can serve as a meaningful step from industry in meeting that challenge.”