From The Register
By Thomas Claburn
Boffins at the University of California, Berkeley, have delved into the undisclosed depths of OpenAI's ChatGPT and the GPT-4 large language model at its heart, and found they're trained on text from copyrighted books.
Academics Kent Chang, Mackenzie Cramer, Sandeep Soni, and David Bamman describe their work in a paper titled "Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4."
"We find that OpenAI models have memorized a wide collection of copyrighted materials, and that the degree of memorization is tied to the frequency with which passages of those books appear on the web," the researchers explain in their paper...
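The paper probes memorization with a "name cloze" task: mask a single character name in a short passage from a book and ask the model to fill in the blank, then score exact-match accuracy across many passages. The sketch below illustrates that setup in minimal form; the helper names, the stand-in scoring, and the example passage are illustrative assumptions, not the authors' code, and the actual model query is omitted.

```python
# Illustrative sketch of a name-cloze memorization probe (not the
# authors' implementation). A real experiment would send each cloze
# prompt to the model under test and collect its single-name guess.

def make_cloze(passage: str, name: str, mask: str = "[MASK]") -> str:
    """Replace the first occurrence of a character name with a mask token."""
    return passage.replace(name, mask, 1)

def score_cloze(guesses: list[str], answers: list[str]) -> float:
    """Exact-match accuracy: fraction of masked names guessed correctly."""
    correct = sum(g.strip() == a for g, a in zip(guesses, answers))
    return correct / len(answers)

if __name__ == "__main__":
    passage = "Call me Ishmael. Some years ago, I went to sea."
    print(make_cloze(passage, "Ishmael"))
    # With hypothetical model guesses for two masked passages:
    print(score_cloze(["Ishmael", "Ahab"], ["Ishmael", "Ishmael"]))  # 0.5
```

High cloze accuracy on passages the model was never shown at query time is the signal the researchers treat as evidence of memorization, and it is this accuracy that they found correlates with how often a book's text appears on the web.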
David Bamman is an associate professor at the I School. He previously won a National Science Foundation (NSF) CAREER award for his research designing computational methods to improve natural language processing for fiction.
Kent Chang is a Ph.D. student at the I School advised by Professor Bamman. His research uses natural language processing to understand and facilitate the process of meaning-making and social interaction in cultural texts, with particular interests in dialogue and narrative understanding.
Sandeep Soni is a postdoctoral researcher at the I School, working with Professor Bamman. His research models the dynamics of language, with applications in computational social science and the computational humanities.