From Content Authenticity Initiative
March 2024 | This Month in Generative AI: Text-to-Movie
By Hany Farid
Generative AI is a class of techniques for creating audio, image, or video content that mimics the human content creation process. Since 2018, techniques for generating highly realistic content have followed an impressive trajectory. In this post, I will discuss some recent breakthroughs in a category of techniques that generate images, audio, and video from a simple text prompt.
Faces
A common computational technique for synthesizing images involves the use of a generative adversarial network (GAN). A GAN pairs two neural networks, a generator and a discriminator. StyleGAN, for example, is one of the earliest successful systems for generating realistic human faces. When tasked with generating a face, the generator starts by laying down a random array of pixels and feeding this first guess to the discriminator. If the discriminator, equipped with a large database of real faces, can distinguish the generated image from the real faces, it provides this feedback to the generator. The generator then updates its initial guess and feeds the update to the discriminator in a second round. This process continues, with the generator and discriminator competing in an adversarial game, until an equilibrium is reached in which the generator produces an image that the discriminator cannot distinguish from real faces.
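To make the back-and-forth between generator and discriminator concrete, here is a minimal sketch of the adversarial loop on a toy problem. It is not StyleGAN, and everything in it is an assumption chosen for illustration: the "real data" are single numbers drawn from a bell curve centered at 4 instead of face images, the generator is a two-parameter line (a, b), the discriminator is a two-parameter logistic classifier (w, c), and the function name train_toy_gan is hypothetical. Real systems use deep networks with millions of parameters, and the generator adjusts its parameters from the discriminator's feedback rather than editing a single image.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function, maps any score to (0, 1).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def train_toy_gan(steps=200, batch=64, lr=0.05, seed=0):
    """Toy 1-D adversarial loop (illustrative assumption, not StyleGAN)."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator: x = a * z + b, with z random noise
    w, c = 0.0, 0.0  # discriminator: D(x) = sigmoid(w * x + c)

    for _ in range(steps):
        # "Real" samples and a fresh batch of generator guesses.
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * z + b for z in noise]

        # Discriminator step: lower -log D(real) - log(1 - D(fake)),
        # i.e. get better at telling real from generated samples.
        gw = gc = 0.0
        for x in real:
            s = sigmoid(w * x + c)
            gw -= (1.0 - s) * x
            gc -= 1.0 - s
        for x in fake:
            s = sigmoid(w * x + c)
            gw += s * x
            gc += s
        w -= lr * gw / batch
        c -= lr * gc / batch

        # Generator step: lower -log D(fake), i.e. use the
        # discriminator's feedback to make fakes look more real.
        ga = gb = 0.0
        for z, x in zip(noise, fake):
            s = sigmoid(w * x + c)
            dx = -(1.0 - s) * w  # gradient of the loss w.r.t. the sample
            ga += dx * z
            gb += dx
        a -= lr * ga / batch
        b -= lr * gb / batch

    return a, b, w, c
```

Calling train_toy_gan(steps=1) shows a single round of the game: the discriminator learns that larger values look more real, and the generator responds by shifting its output b upward, toward the real data. Over many rounds, a game this simple may oscillate around the balance point rather than settle neatly, which is one reason practical GAN training relies on additional stabilizing techniques.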
Below are representative examples of GAN-generated faces. In two earlier posts, I discussed how photorealistic these faces are and some techniques for distinguishing real from GAN-generated faces...
Hany Farid is a professor in the Department of Electrical Engineering & Computer Sciences and the School of Information at UC Berkeley and a senior advisor to the Counter Extremism Project.