Hum to Search can recognize over half a million songs as Google explains how it works

Abner Li | Nov 12 2020 - 10:23 am PT

The ability to hum a song to Google was the breakout hit of last month’s Search keynote. In a blog post today, Google explains how the machine learning that powers Hum to Search works.

Hum to Search builds on 2017’s Now Playing feature, which lets Pixel phones recognize songs playing in the background without needing an internet connection. Work on that feature was later applied to Sound Search, which is server-based and works faster.

The key change was moving the technology from only recognizing recorded audio to identifying songs that are hummed, sung, or whistled. When you hum to Search or Assistant, Google generates a “number-based sequence representing the song’s melody” that ignores instruments and the quality of the voice. That sequence is then compared to the melodies of recorded works, with Hum to Search results featuring match percentages.
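To get a feel for how a melody can be reduced to numbers that ignore who or what is performing it, here is a toy sketch in Python. It is not Google's method, which relies on a trained neural network. It assumes, purely for illustration, that a melody is already available as a list of MIDI note numbers, and the helper name `to_intervals` is made up for this example. Storing the step between consecutive notes rather than the notes themselves means the same tune hummed in a different key produces the same sequence.

```python
def to_intervals(pitches):
    """Convert absolute pitches (MIDI note numbers) into semitone steps between notes."""
    return [later - earlier for earlier, later in zip(pitches, pitches[1:])]


# Illustrative data: the opening of "Twinkle Twinkle Little Star",
# once as a recording in C major and once hummed seven semitones higher.
recorded = [60, 60, 67, 67, 69, 69, 67]
hummed = [67, 67, 74, 74, 76, 76, 74]

# Both versions reduce to the same number sequence, so they can be matched.
print(to_intervals(recorded))
print(to_intervals(hummed))
print(to_intervals(recorded) == to_intervals(hummed))
```

A real system also has to cope with off-key notes, uneven tempo, and background noise, which is where the machine learning model comes in.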

To enable humming recognition, the network should produce embeddings for which pairs of audio containing the same melody are close to each other, even if they have different instrumental accompaniment and singing voices.
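The idea of embeddings being “close to each other” can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Google's implementation: the three-dimensional vectors, the song titles, and the function names `cosine_similarity` and `rank_matches` are all invented for the example, and cosine similarity is just one common way to measure closeness. How Google computes its match percentages is not described here, so the percentage formatting below is only a stand-in.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_matches(query, catalog):
    """Score every catalog embedding against the query, best match first."""
    scored = [(title, cosine_similarity(query, emb)) for title, emb in catalog.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings; a real model would produce far longer vectors.
catalog = {
    "Song A": [0.9, 0.1, 0.2],
    "Song B": [0.1, 0.8, 0.3],
    "Song C": [0.2, 0.2, 0.9],
}
hummed = [0.85, 0.15, 0.25]  # embedding of the user's humming (made up)

for title, score in rank_matches(hummed, catalog):
    print(f"{title}: {score:.0%}")
```

Because the hummed vector points in nearly the same direction as the one for "Song A", that song is ranked first, even though the two vectors are not identical.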

The current database includes over 500,000 songs, with Google touting a “high level of accuracy.” For comparison, Sound Search can sift through 100 million songs. The Hum to Search corpus is continually being updated, and Google says there is “room to grow to include more of the world’s many melodies.”

During the creation of Hum to Search, Googlers sent in “clips of themselves singing or humming” through an “internal singing donation app.” Google’s blog post goes into depth on the training process and data, as well as machine learning improvements.