Apr 29, 2025

Berkeley Researchers Discover That People Are Poorly Equipped To Detect AI-Powered Voice Clones, Develop New Deepfake Dataset

Could you tell an AI-generated voice from a real one? How about whether two voices belong to the same person? Turns out, the odds aren’t in your favor.

New research published in Scientific Reports by UC Berkeley School of Information Ph.D. student Sarah Barrington, Professor Hany Farid, and School of Optometry and Vision Science Professor Emily Cooper found that people cannot consistently identify recordings of AI-generated voices.

In their study, the group focused on two questions: identity and naturalness. In the identity task, participants listened to two voices back-to-back and judged whether they came from the same person. In the naturalness task, participants listened to one voice at a time and classified it as either real or AI-generated.

“Only 60% of the time can humans tell something is fake. Bearing in mind that randomly guessing would be 50%, we are not much better than guessing. When you put two voices side by side, only 20% of the time can people tell they’re not the same identity,” Barrington said. “That’s how we know we’re pretty much through the uncanny valley. These things are perceptually realistic enough to fool a human.”

To address this, Barrington and Farid are working on a project to help humans stay ahead of deepfakes. Teaming up with Stanford student Matyas Bohacek, they have created DeepSpeak, a large-scale dataset of real and deepfake footage, with the hope of developing new deepfake detection techniques and further refining current ones.


“The issue with the current deepfake data sets is that they aren’t collected consensually, aren’t using the most technologically advanced tools, and there isn’t diversity of types of deepfakes they create or environment,” said Barrington. 

Now in its second iteration, DeepSpeak includes footage from 500 participants ranging in age from 18 to 75. Participants performed simple visual actions in front of a camera and recorded themselves reading sentences; those recordings were then used to create a variety of deepfakes: audio, face-swap, avatar, and lip-sync.

Currently, the DeepSpeak research group is exploring ideas such as additional languages and more deepfake generation engines for its third iteration, slated for release next year.

As for the future of deepfake detection, Barrington is calling for reform in hopes of combating increasingly advanced artificial intelligence tools. 

“It’s really important to put pressure on the platforms where you can create these things to ensure they are enforcing guardrails. It’s also a really big policy opportunity to make sure there’s not just content credentials and watermarking, but that there’s also sufficient customer due diligence and collaboration with authorities,” Barrington said.

“In the legal system, for example, Hany, Emily, and Rebecca Wexler [of Berkeley Law] are arguing that the way we currently think about voices in the court system is outdated due to voice cloning. Right now, we can satisfy the authentication standard for admissibility by having someone familiar with a person’s voice come to the stand and say, ‘that sounds like the same person to me,’ and obviously this study proves that is completely insufficient.”
