Since OpenAI’s then-board of directors removed and then reinstated co-founder and CEO Sam Altman, businesses seem to have been in an arms race to release their own AI-driven products into the market.
In the journalism realm, the search engine Perplexity has partnered with speech-software company ElevenLabs to launch “Discover Daily,” a daily podcast that uses AI voices to read the news to listeners; Squarespace has added AI tools that generate website text blocks, email campaign sections, and blog articles for newsrooms publishing content online; and even Canva, the popular graphic design site used for flyers and logos, gives users access to image generators from Google and OpenAI’s DALL-E to create news bulletins from written prompts.
Now, as of February 15, 2024, text-to-video generation has been taken to the next level with OpenAI’s Sora. The publicly announced model can generate videos up to a minute long, with multiple characters, detailed backgrounds, and complex motion, from a user’s written prompt. Sora is a “diffusion model”: it starts with a video that looks like static noise and gradually removes that noise over many steps. The model is given foresight of many frames at a time so that subjects stay consistent, and it represents videos as collections of small image and video patches, akin to ChatGPT’s tokens.
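To make the “remove the noise” idea concrete, here is a minimal toy sketch of diffusion-style sampling in Python. The tensor shapes, step count, and stand-in denoiser are illustrative assumptions for readers curious about the general technique; this is not OpenAI’s architecture or code.

```python
import torch

def dummy_denoiser(video: torch.Tensor, t: int) -> torch.Tensor:
    """Stand-in for a trained network that predicts the noise left in each frame."""
    return 0.1 * video

def generate(denoiser, steps: int = 50, frames: int = 16,
             height: int = 64, width: int = 64) -> torch.Tensor:
    # Start from pure static noise: one random RGB tensor per video frame.
    video = torch.randn(frames, 3, height, width)
    # Step by step, subtract the model's noise estimate so the frames
    # gradually sharpen from static into coherent imagery.
    for t in reversed(range(steps)):
        video = video - denoiser(video, t) / steps
    return video

clip = generate(dummy_denoiser)
print(clip.shape)  # torch.Size([16, 3, 64, 64])
```

In a real system, the denoiser is a large trained network and the prompt text conditions each denoising step; the loop above only illustrates the iterative noise-removal idea.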
Video courtesy of OpenAI generated by Sora with the following prompt: Several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance creates a warm glow, the low camera view is stunning capturing the large furry mammal with beautiful photography, depth of field.
The model is currently available only to red teamers, domain experts who adversarially probe it for weaknesses and potential ethical repercussions ahead of any public release. OpenAI acknowledges that Sora still struggles with physical accuracy: it has trouble with complex physics, spatial directions such as left and right, and specific instances of cause and effect, such as showing someone bite into a cookie that is left without a bite mark.
Video courtesy of OpenAI generated by Sora with the following prompt: A young man at his 20s is sitting on a piece of cloud in the sky, reading a book.
Nevertheless, journalists have already started thinking about how Sora will change creative industries, from Hollywood filmmaking to animation. To brainstorm possible ways of integrating the tool into newsrooms, Moody’s AI Innovation team asked ChatGPT for guidance.
Prompt: How can journalists and newsrooms utilize text-to-video model AI (such as OpenAI's Sora) to facilitate their reporting of news? Give specific examples of types of stories and tasks that it can help with. The answers below have been modified for accuracy and relevance; the bullet points were written by the author, and only the content under the bold headlines is AI-generated.
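Newsrooms that want to re-run or adapt this kind of brainstorming prompt programmatically could do so with OpenAI’s Python SDK. The sketch below is illustrative, assuming an OPENAI_API_KEY environment variable and a generic model choice; it is not the workflow the team actually used.

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY environment variable is set

prompt = (
    "How can journalists and newsrooms utilize text-to-video model AI "
    "(such as OpenAI's Sora) to facilitate their reporting of news? "
    "Give specific examples of types of stories and tasks that it can help with."
)

response = client.chat.completions.create(
    model="gpt-4",  # illustrative model choice
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```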
Breaking News in Localized Areas
When reporting on breaking news, journalists can quickly convert written updates into video summaries tailored to specific regions or communities: a journalist inputs a localized story or event into the AI model, which then generates video aimed at audiences in that geographic area. Hypothetical examples include:
Blueprints for city construction announcements
Animated map reenactments of road closures or public transit disruptions
Detailed reconstructions of natural disaster impacts (tsunami waves, earthquake damage, and wildfire damage)
Data Visualization
Text-to-video AI can be used to visualize complex data and statistics in news reports. Journalists can input data-driven stories into the AI model, which can then generate informative videos with charts, graphs, and animations to help audiences better understand the data being presented. Hypothetical examples include:
Environmental data on temperatures, drought levels, or other local changes
Live power outage location graphics
Energy spending and conservation visuals
Explainer Videos and Generated B-Roll
Journalists can use text-to-video AI to create explainer videos that break down complex topics or issues for their audiences. B-roll, the footage that supplements primary interview video, may also be generated as a visual aid. Hypothetical examples include:
Scientific representations
Tutorials with demonstrations of hands-on activities and DIY how-tos
Interactive Storytelling
Journalists can additionally use text-to-video AI to create interactive storytelling experiences for their audiences, such as:
Clickable graphics with 3-dimensional features
Embedded quizzes to test the reader’s knowledge
Extending Sora into the public realm could mean greater accessibility for audiences who prefer visual or auditory learning, customization of tone, voice, and pacing, and cost savings for newsrooms with fewer resources. However, it could also mean perpetuated bias, erosion of trust, and homogeneity of content, since AI models rely on formulaic prediction to produce their output.
Video courtesy of OpenAI generated by Sora with the following prompt: A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.
It is important to prioritize the why behind AI-generated content (especially visual material, which is prone to detailed manipulation) rather than the accelerated race toward how. As the tools improve, journalists and editors should expect to shoulder more responsibility for producing content that stays one step ahead of the ethics and safety curve, and to use the technology as a tool, not a replacement for production.