YouTube thumbnails can be dreadful. Really bad. But Google has been teaching its algorithms to generate better previews using neural networks.
While it’s possible to upload a custom thumbnail to accompany a YouTube video, many people don’t bother. In those cases, Google samples one frame per second from the video as it’s uploaded, then runs the frames through a series of filters, including the subjective-sounding ‘quality model’, to choose one of them as the thumbnail. But that doesn’t always work well, as Google’s Weilong Yang and Min-hsuan Tsai show in these images.
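In outline, that pipeline is just "sample candidates, score each one, keep the best". Here's a minimal sketch of that idea; the function names and the brightness-based stand-in for Google's quality model are assumptions for illustration, not their actual scoring method.

```python
def candidate_frame_indices(total_frames, fps):
    """Indices of roughly one frame per second of video."""
    return list(range(0, total_frames, fps))

def pick_thumbnail(frames, quality_score):
    """Return the candidate frame with the highest quality score."""
    return max(frames, key=quality_score)

# Hypothetical stand-in for the 'quality model': prefer brighter frames.
# Frames here are tiny 2x2 greyscale grids, just for illustration.
def mean_brightness(frame):
    pixels = [p for row in frame for p in row]
    return sum(pixels) / len(pixels)

frames = [
    [[10, 20], [30, 40]],       # dim
    [[200, 210], [220, 230]],   # bright
    [[90, 100], [110, 120]],    # medium
]
best = pick_thumbnail(frames, mean_brightness)
```

The real system plugs a learned model in where `mean_brightness` sits; the selection logic around it stays the same.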
Unlike the task of identifying if a video contains your favourite animal, judging the visual quality of a video frame can be very subjective - people often have very different opinions and preferences when selecting frames as video thumbnails. Fortunately, on YouTube, in addition to having algorithmically generated thumbnails, many YouTube videos also come with carefully designed custom thumbnails uploaded by creators. Those thumbnails are typically well framed, in-focus, and centre on a specific subject (e.g. the main character in the video). We consider these custom thumbnails from popular videos as positive (high-quality) examples, and randomly selected video frames as negative (low-quality) examples.
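Assembling that training set amounts to labelling creator-made thumbnails as positives and randomly drawn frames as negatives. A minimal sketch of that labelling step, with hypothetical function and parameter names (`build_training_set`, `negatives_per_video` are assumptions, not Google's API):

```python
import random

def build_training_set(custom_thumbnails, videos, negatives_per_video=1, seed=0):
    """Pair creator-uploaded thumbnails (label 1) with randomly sampled
    frames from a pool of videos (label 0), as the post describes.

    custom_thumbnails: list of frames taken from popular videos' custom thumbnails
    videos: list of videos, each a list of frames
    """
    rng = random.Random(seed)  # seeded for reproducibility in this sketch
    examples = [(thumb, 1) for thumb in custom_thumbnails]
    for frames in videos:
        for _ in range(negatives_per_video):
            examples.append((rng.choice(frames), 0))
    rng.shuffle(examples)
    return examples
```

A deep network trained on these (frame, label) pairs then learns to score any frame by how much it resembles a deliberately chosen thumbnail.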
Using the custom thumbnails as a training set, the team has managed to teach a deep neural network, with a little more human insight, what constitutes a good and a bad thumbnail. When it’s run, it certainly seems to produce better images than the old algorithm, as the images at the top of the post seem to testify.