Nov 12, 2018

The New Yorker Consults Hany Farid on Digital Imagery in the Age of A.I.

In the Age of A.I., Is Seeing Still Believing?

By Joshua Rothman

In 2011, Hany Farid, a photo-forensics expert, received an e-mail from a bereaved father. Three years earlier, the man’s son had found himself on the side of the road with a car that wouldn’t start. When some strangers offered him a lift, he accepted. A few minutes later, for unknown reasons, they shot him. A surveillance camera had captured him as he walked toward their car, but the video was of such low quality that key details, such as faces, were impossible to make out. The other car’s license plate was visible only as an indecipherable jumble of pixels. The father could see the evidence that pointed to his son’s killers—just not clearly enough.

Farid had pioneered the forensic analysis of digital photographs in the late nineteen-nineties, and gained a reputation as a miracle worker. As an expert witness in countless civil and criminal trials, he explained why a disputed digital image or video had to be real or fake. Now, in his lab at Dartmouth, where he was a professor of computer science, he played the father’s video over and over, wondering if there was anything he could do. On television, detectives often “enhance” photographs, sharpening the pixelated face of a suspect into a detailed portrait. In real life, this is impossible. As the video had flowed through the surveillance camera’s “imaging pipeline”—the lens, the sensor, the compression algorithms—its data had been “downsampled,” and, in the end, very little information remained. Farid told the father that the degradation of the image couldn’t be reversed, and the case languished, unsolved.
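A toy illustration of why that degradation can't be reversed: once neighbouring pixels are averaged together, very different originals can collapse into the identical low-resolution result. The sketch below assumes simple block averaging as a stand-in for downsampling. It is not a model of any real camera's imaging pipeline, and `downsample` is a hypothetical helper written for this example.

```python
def downsample(image, factor):
    """Average each factor x factor block of a 2-D grayscale image (a list of rows).

    Block averaging is an assumed stand-in for a camera's downsampling step.
    """
    h, w = len(image), len(image[0])
    out = []
    for r in range(0, h - h % factor, factor):
        row = []
        for c in range(0, w - w % factor, factor):
            block = [image[r + i][c + j] for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# A high-contrast checkerboard and a uniform grey patch: clearly different images.
sharp = [
    [0, 200, 0, 200],
    [200, 0, 200, 0],
    [0, 200, 0, 200],
    [200, 0, 200, 0],
]
flat = [[100] * 4 for _ in range(4)]

print(downsample(sharp, 2))  # [[100.0, 100.0], [100.0, 100.0]]
print(downsample(flat, 2))   # [[100.0, 100.0], [100.0, 100.0]]
```

Because both inputs produce the same output, nothing in the low-resolution version can tell you which original it came from. No "enhance" step can recover the difference, because the information is simply gone.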

A few months later, though, Farid had a thought. What if he could use the same surveillance camera to photograph many, many license plates? In that case, patterns might emerge—correspondences between the jumbled pixels and the plates from which they derived. The correspondences would be incredibly subtle: the particular blur of any degraded image would depend not just on the plate numbers but also on the light conditions, the design of the plate, and many other variables. Still, if he had access to enough images—hundreds of thousands, perhaps millions—those patterns might become detectable....

Farid started by sending his graduate students out on the Dartmouth campus to photograph a few hundred license plates. Then, based on those photographs, he and his team built a “generative model” capable of synthesizing more. In the course of a few weeks, they produced tens of millions of realistic license-plate images, each one unique. By feeding their synthetic license plates through a simulated surveillance camera, they then rendered them indecipherable. The aim was to create a Rosetta Stone, connecting pixels to plate numbers.
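The logic of that Rosetta Stone can be sketched in a few lines. Generate plates whose numbers you already know, push each one through a simulated camera, and keep every (degraded pixels, true plate) pair as a labelled example for training. Everything below is hypothetical and vastly simplified. Farid's team used a learned generative model and a realistic camera simulation, whereas this sketch assumes uniformly random plate strings, a made-up "rendering" that turns each character into a few intensity values, and block averaging plus Gaussian noise as the degradation. The names `random_plate`, `render`, `simulate_camera`, and `make_dataset` are illustrative only.

```python
import random
import string

PLATE_CHARS = string.ascii_uppercase + string.digits


def random_plate(rng, length=6):
    """Hypothetical generator: a uniformly random plate string."""
    return "".join(rng.choice(PLATE_CHARS) for _ in range(length))


def render(plate, width=4):
    """Toy 'rendering': each character becomes `width` intensity values (0-255)
    derived from its position in PLATE_CHARS. A stand-in for a real image."""
    pixels = []
    for ch in plate:
        code = PLATE_CHARS.index(ch)
        pixels.extend((code * (k + 1)) % 256 for k in range(width))
    return pixels


def simulate_camera(pixels, factor=4, noise=5.0, rng=None):
    """Hypothetical degradation: block-average downsampling plus Gaussian sensor noise."""
    rng = rng or random.Random()
    out = []
    for i in range(0, len(pixels) - len(pixels) % factor, factor):
        block = pixels[i:i + factor]
        out.append(sum(block) / factor + rng.gauss(0.0, noise))
    return out


def make_dataset(n, seed=0):
    """Return n (degraded_pixels, plate) pairs: noisy pixels linked to known labels."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        plate = random_plate(rng)
        data.append((simulate_camera(render(plate), rng=rng), plate))
    return data


for degraded, plate in make_dataset(3):
    print(plate, [round(v, 1) for v in degraded])
```

The key design choice is that the labels come for free. Because the plates are synthesized, their true numbers are known exactly, so the number of labelled examples is limited only by computing time rather than by how many real plates anyone can photograph.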

Next, they began “training” a neural network to interpret those degraded images. Modern neural networks are multilayered, and each layer juggles millions of variables; tracking the flow of information through such a system is like following drops of water through a waterfall. Researchers, unsure of how their creations work, must train them by trial and error. It took Farid’s team several attempts to perfect theirs. Eventually, though, they presented it with a still from the video. “The license plate was like ten pixels of noise,” Farid said. “But there was still a signal there.” Their network was “pretty confident about the last three characters....”