In December, an episode of The Whole Story with Anderson Cooper titled “AI and the Future of Humanity” aired. In the opening sequence, Cooper addressed the increasing presence of AI in fields such as the military and healthcare and wondered whether AI could replace news anchors like him.
Plot twist: it already had. Cooper revealed that the news team had inserted an AI-generated clip of him into the broadcast. The clip had been created in less than two months by University of California, Berkeley, School of Information professor Hany Farid and Stanford student Matyas Bohacek, who works with Farid as an undergraduate intern.
Farid and Bohacek documented the process of building an AI replica of Cooper in an article for the journal Proceedings of the National Academy of Sciences (PNAS). In the article, the two shared that the procedure was shockingly simple. “None of these details required any significant innovations,” they wrote. Instead, they relied on existing examples of Cooper’s voice pulled from YouTube and cloned it using technology from ElevenLabs. Once the voice was cloned, they fed the script to the voice model and were able to generate an audio clip of the anchor.
Then came the visual part. The audio clip had to be synced with a previously recorded video clip of Cooper and required facial modification to be consistent with the new audio. To do so, the two used various open-source programs to generate a new mouth region for each video frame, enhance the quality of this new region, and replace the original face with the newly generated one.
“Right now, those with a large digital footprint — like Anderson Cooper — are more vulnerable to having their likeness copied. We have little doubt, however, that these limitations will soon be overcome with the next generation of software tools — tools that, even over the two months we worked on this project, evolved at a surprisingly fast pace,” Farid and Bohacek noted.
Having seen that generative AI had no problem mimicking news anchors, the two wanted to see whether it could produce entirely new content and people. Over the course of two days, they prompted an image generator to create “a photo of a trusted middle-aged female news anchor sitting at a news desk,” asked ChatGPT to write a monologue for the anchor, generated a voice using ElevenLabs, and brought this anchor to life.
Given how easy generative AI is to access, Farid and Bohacek remain worried about the future. Pointing to ways that AI has already been used for nefarious purposes, the two reiterate the need for regulatory guidelines such as the NO FAKES Act proposed in the United States and the Digital Services Act passed in the European Union.
“This technology will not reside only with well-resourced networks and Hollywood studios, but will soon become fully democratized,” they argue. “The time to discuss implications and interventions regarding disruption to the workforce to large-scale disinformation campaigns is now.”