
Google AI details how the Pixel 3 captures and selects the Top Shot

Top Shot is one of the many AI-powered camera features Google introduced with the Pixel 3. Google AI is now detailing how the smart feature works and what qualities your phone is looking for when it suggests an alternate frame.

At a high level, Top Shot saves and analyzes image frames from 1.5 seconds before and after you tap the shutter button. Up to 90 images are captured, with the Pixel 3 selecting up to two alternative shots to save in high resolution.
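
To make the mechanics concrete, here is a minimal Python sketch of that buffer-and-select idea. It is not Google’s implementation; beyond the roughly 1.5-second window and the 90-frame cap, the class, the scoring hook, and the parameters are assumptions.

    from collections import deque
    import time

    # Assumed parameters: Google describes a ~1.5 s window around the shutter
    # press and up to 90 buffered frames; everything else is illustrative.
    WINDOW_SECONDS = 1.5
    MAX_FRAMES = 90
    NUM_ALTERNATES = 2

    class TopShotBuffer:
        """Toy ring buffer holding recently captured, already-scored frames."""

        def __init__(self):
            self.frames = deque(maxlen=MAX_FRAMES)  # (timestamp, frame, score)
            self.shutter_time = None

        def on_frame(self, frame, quality_score):
            # Every incoming frame is scored (see the quality signals described
            # later in the article) and appended; old frames fall off the end.
            self.frames.append((time.monotonic(), frame, quality_score))

        def on_shutter(self):
            # The user pressed the shutter; capture keeps running for roughly
            # another WINDOW_SECONDS before alternates are chosen.
            self.shutter_time = time.monotonic()

        def select_alternates(self):
            # Called once the post-shutter window has elapsed: keep frames
            # within +/- WINDOW_SECONDS of the press and take the top scorers.
            window = [
                (ts, frame, score) for ts, frame, score in self.frames
                if abs(ts - self.shutter_time) <= WINDOW_SECONDS
            ]
            window.sort(key=lambda item: item[2], reverse=True)
            return [frame for _, frame, _ in window[:NUM_ALTERNATES]]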

The shutter frame is processed and saved first, with the best alternative shots saved afterwards. Google’s Pixel Visual Core on the Pixel 3 processes these top alternates as HDR+ images with a very small amount of extra latency, and they are embedded into the Motion Photo file.
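
A hedged sketch of that ordering follows; process_hdr_plus and embed_in_motion_photo are hypothetical stand-ins for the Pixel Visual Core pipeline and the Motion Photo container writer, passed in rather than implemented here.

    def save_top_shot(shutter_frame, alternates, process_hdr_plus, embed_in_motion_photo):
        """Illustrative save order, not Google's code: the shutter frame is
        finished first so the user sees their photo immediately, and the
        alternates follow inside the same Motion Photo file."""
        # 1. Process and save the frame the user actually took.
        main_image = process_hdr_plus(shutter_frame)

        # 2. Process the best alternates afterwards (on Pixel Visual Core in
        #    the real pipeline) and append them to the Motion Photo container.
        for frame in alternates:
            embed_in_motion_photo(main_image, process_hdr_plus(frame))

        return main_image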

Work on Google Clips inspired the Pixel 3 feature, with the company creating a computer vision model to recognize three key attributes associated with the “best moments.”

  1. Functional qualities, like lighting
  2. Objective attributes (are the subject’s eyes open? Are they smiling?)
  3. Subjective qualities, like emotional expressions

Google explains: “Our neural network design detects low-level visual attributes in early layers, like whether the subject is blurry, and then dedicates additional compute and parameters toward more complex objective attributes, like whether the subject’s eyes are open, and subjective attributes, like whether there is an emotional expression of amusement or surprise.”
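
As a rough illustration of that layered design (not the actual network), here is a sketch in which a cheap low-level check runs first and can short-circuit the more expensive attribute heads; low_level_net, attribute_net, the threshold, and the weights are all made up.

    from dataclasses import dataclass

    @dataclass
    class FrameAttributes:
        sharpness: float    # low-level: is the subject blurry?
        eyes_open: float    # objective: probability the eyes are open
        smiling: float      # objective: probability of a smile
        expression: float   # subjective: amusement, surprise, etc.

    def score_frame(frame, low_level_net, attribute_net):
        """Toy two-stage scorer: `low_level_net` stands in for the early
        layers that detect cheap visual attributes, `attribute_net` for the
        deeper layers handling objective and subjective attributes."""
        sharpness = low_level_net(frame)
        if sharpness < 0.2:
            # A clearly blurry frame is rejected before spending compute on
            # the expensive attribute heads.
            return 0.0

        eyes_open, smiling, expression = attribute_net(frame)
        attrs = FrameAttributes(sharpness, eyes_open, smiling, expression)

        # Naive fixed-weight blend purely for illustration; the real model
        # learns how to combine these signals (see the overall-quality model
        # described below).
        return (0.2 * attrs.sharpness + 0.3 * attrs.eyes_open
                + 0.2 * attrs.smiling + 0.3 * attrs.expression)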

According to Google, Top Shot prioritizes face analysis, but the company also worked to identify “good moments in which faces are not the primary subject.” It created additional metrics that feed into the overall frame quality score (a rough sketch of these signals follows the list):

  • Subject motion saliency score — the low-resolution optical flow between the current frame and the previous frame is estimated in the ISP to determine whether there is salient object motion in the scene.
  • Global motion blur score — estimated from the camera motion and the exposure time. The camera motion is calculated from sensor data from the gyroscope and OIS (optical image stabilization).
  • “3A” scores — the status of auto exposure, auto focus, and auto white balance is also considered.
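
A minimal sketch of how signals like these could be computed is below; the thresholds, the NumPy stand-in for the ISP’s optical-flow output, and the OIS correction factor are assumptions rather than published details.

    import numpy as np

    def motion_saliency_score(flow_magnitude, margin=0.5):
        """Subject-motion saliency: the fraction of a low-resolution optical
        flow field (produced by the ISP in the real pipeline) that moves
        noticeably more than the background."""
        background = np.median(flow_magnitude)
        return float((flow_magnitude > background + margin).mean())

    def motion_blur_score(angular_velocity_rad_s, exposure_time_s, ois_gain=0.5):
        """Global motion blur from camera motion and exposure time: gyroscope
        angular velocity times exposure time approximates how far the scene
        smears during the exposure, with OIS cancelling part of the motion."""
        blur = angular_velocity_rad_s * exposure_time_s * (1.0 - ois_gain)
        return float(np.clip(1.0 - blur / 0.01, 0.0, 1.0))  # 1.0 means sharp

    def three_a_score(ae_converged, af_locked, awb_converged):
        """'3A' status: auto exposure, auto focus, and auto white balance,
        reduced here to the fraction of the three that have settled."""
        return sum([ae_converged, af_locked, awb_converged]) / 3.0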

All of the individual scores are used to train a model that predicts an overall quality score matching the frame preferences of human raters, with the goal of maximizing end-to-end product quality.
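
As a toy version of that last step, assume each frame’s signals are stacked into a feature vector and human raters supply a preference rating; a simple least-squares fit then stands in for whatever model Google actually trains.

    import numpy as np

    def fit_quality_model(features, human_ratings):
        """Fit a linear map from the individual scores (face attributes,
        motion saliency, blur, 3A, ...) to a single overall quality score
        that tracks human raters. Purely illustrative."""
        X = np.hstack([features, np.ones((features.shape[0], 1))])  # bias column
        weights, *_ = np.linalg.lstsq(X, human_ratings, rcond=None)
        return weights

    def overall_quality(weights, frame_features):
        """Score a new frame with the fitted weights."""
        return float(np.dot(np.append(frame_features, 1.0), weights))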

During development, Google took into consideration what users perceive as the best shot, collecting data from hundreds of volunteers and asking them which frames looked best. Other steps included improvements aimed at avoiding blur and at handling multiple faces.


FTC: We use income earning auto affiliate links. More.



Author

Abner Li

Editor-in-chief. Interested in the minutiae of Google and Alphabet. Tips/talk: abner@9to5g.com
