A new ‘reality’: OpenAI’s new text-to-video tool delights and terrifies

20 February 2024

An AI-generated video of Will Smith eating pasta took the internet by storm in March 2023. It looked like a fever dream, showing Will frantically shoving mountains of spaghetti into his chin before it morphed into his face like a dementor’s kiss. Although widely mocked at the time, it seemed to offer a glimpse into the future of video creation. It was made with ‘text-to-video’ AI, which uses machine learning to convert written text into video. Following Will’s Italian dinner montage from hell, commentators believed it would take years of development before text-to-video would be realistic enough to pass as authentic.

But last week, that all changed when OpenAI, the AI company which created ChatGPT, announced its own text-to-video tool, Sora. Like a video version of ChatGPT, it generates ultra-high-quality, movie-like video from simple text prompts, displaying a genuine and unique understanding of time and physics. One video demonstrating its capabilities shows a photo-realistic flyover sequence of a gold-rush-era town in the USA, and another shows a spaceman wearing a red wool knitted helmet on a snowy plain in the style of a movie trailer. Both were created entirely by generative AI, and the results are bewilderingly good.

This announcement has stunned the AI community. In less than a year, OpenAI has created a tool that experts say is two or three years ahead of where text-to-video generative AI was assumed to be.

OpenAI notes that the vast improvements in video generation capability boil down to the amount of compute power. The more computational power (the capacity of a system to process information) is scaled, the more sophisticated the output can be. This may explain why Sam Altman, the co-founder and chief executive of OpenAI, is reportedly in discussions with the United Arab Emirates about funding the development of AI chips to the tune of $7tn, which could vastly scale OpenAI’s ability to deliver (and own) compute power.

The implications of this technology being adopted across a range of industries and civil society are far-reaching, and mark an exciting and terrifying new reality. Generating richly detailed, authentic video content previously depended on deep pockets and a technically skilled workforce. Tools such as Sora could be in the hands of the public much sooner than we had anticipated, meaning creators can use them to produce B-roll footage on a budget, and bad actors will inevitably use them for nefarious ends.

This has serious implications for democracies in a crucial year. With 2 billion voters set to go to the polls in 50 countries this year, the introduction of this tool could supercharge concerns about deepfake technology affecting elections. As deepfakes seep further into the mainstream as an engine for dis- and misinformation, the public’s trust in information and video content online may be chipped away unless output is verifiable and regulated.

Given that Sora lets users with no technical ability generate intricate worlds, this may also be the first step towards a world in which users generate media that is tailored specifically and absolutely to them and them only. In the not-so-distant future, I may request a light-hearted comedic drama set in Hackney from a Sora-style TV provider, whilst my friend may request a TV series about a reluctant vampire who prefers French fries and wine to blood. Our experiences of media may therefore never overlap, which could mean an infinite fracturing of media consumption.

We truly are in uncharted waters, standing on the precipice of a new reality in which the systems that govern our media, entertainment and political landscape look set to be significantly disrupted by these tools. At least real-life Will Smith’s 2024 parody of the 2023 nightmare will keep us chuckling for now.