By Caroline Mimbs Nyce
This month, a local TV-news station in Arizona ran an unsettling report: A mother named Jennifer DeStefano says that she picked up the phone to the sound of her 15-year-old crying out for her, and was asked to pay a $1 million ransom for her daughter’s return. In reality, the teen had not been kidnapped, and was safe; DeStefano believes someone used AI to create a replica of her daughter’s voice to deploy against her family. “It was completely her voice,” she said in one interview. “It was her inflection. It was the way she would have cried.” DeStefano’s story has since been picked up by other outlets, while similar stories of AI voice scams have surfaced on TikTok and been reported by The Washington Post. In late March, the Federal Trade Commission warned consumers that bad actors are using the technology to supercharge “family-emergency schemes,” scams that fake an emergency to fool a concerned loved one into forking over cash or private information.
Such applications have existed for some time—my colleague Charlie Warzel fooled his mom with a rudimentary AI voice-cloning program in 2018—but they’ve gotten better, cheaper, and more accessible in the past several months alongside a generative-AI boom. Now anyone with a dollar, a few minutes, and an internet connection can synthesize a stranger’s voice. What’s at stake is our ability as regular people to trust that the voices of those we interact with from afar are legitimate. We could soon be in a society where you don’t necessarily know that any call from your mom or boss is actually from your mom or boss. We may not be at a crisis point for voice fraud, but it’s easy to see one on the horizon. Some experts say it’s time to establish systems with your loved ones to guard against the possibility that your voices are synthesized—code words, or a kind of human two-factor authentication.
One easy way to combat such trickery would be to designate a word with your contacts that could be used to verify your identity. You could, for example, establish that any emergency request for money or sensitive information should include the term lobster bisque. The Post’s Megan McArdle made this case in a story yesterday, calling it an “AI safeword.” Hany Farid, a professor at the UC Berkeley School of Information, told me he’s a fan of the idea. “It’s so low-tech,” he said. “You’ve got this super-high-tech technology—voice cloning—and you’re like, ‘What’s the code word, asshole?’”
Hany Farid is a professor at the UC Berkeley School of Information and EECS. He specializes in digital forensics.