What's that Song Called? Building a Hum-to-Search Music Recognition App with Vector Search
The author built a Hum-to-Search music recognition app that identifies a song from a hummed melody alone, using vector search over audio embeddings stored in Astra DB. The general approach converts an audio signal into a spectrogram (a visual spectrum of its frequencies) and feeds it into a deep neural network, which outputs a learned low-dimensional vector representation of that audio. Vector similarity search then finds the stored embedding closest to the embedding of the hummed audio. The author first tried pre-existing audio embedding models such as panns_inference and OpenL3, but found success by standardizing all the audio into MIDI format: Spotify's Basic Pitch handles the MIDI translation, and a normalized note histogram serves as the vector representation of the audio. The post shares what the author learned while building the app, gives instructions for building it, and suggests ways to improve its performance: implementing pitch correction, using dynamic time warping (DTW), and testing the MIDI outputs extensively. The source code is available on GitHub for contributions or as a foundation for other projects.
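To make the note-histogram idea concrete, here is a minimal sketch and not the author's implementation. It assumes each track has already been transcribed to a list of MIDI note numbers (for example by Basic Pitch), uses a 128-bin histogram with one bin per MIDI note, and substitutes a plain in-memory cosine-similarity scan for the Astra DB vector search. The function names and the toy catalog are hypothetical.

```python
import math


def note_histogram(notes, num_bins=128):
    """Count how often each MIDI note number occurs, scaled so the bins sum to 1."""
    counts = [0.0] * num_bins
    for note in notes:
        if 0 <= note < num_bins:  # ignore values outside the MIDI range
            counts[note] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def closest_song(hum_notes, catalog):
    """Return the catalog title whose note histogram is most similar to the hum's."""
    hum_vec = note_histogram(hum_notes)
    return max(
        catalog,
        key=lambda title: cosine_similarity(hum_vec, note_histogram(catalog[title])),
    )


# Toy data (hypothetical): MIDI note numbers per track, and an imperfect hum of song_a.
catalog = {
    "song_a": [60, 62, 64, 65, 67, 67, 64, 60],
    "song_b": [70, 72, 74, 70, 77, 79, 74, 72],
}
hum = [60, 62, 64, 64, 67, 67, 60]
best = closest_song(hum, catalog)
print(best)
```

A histogram like this discards note order and timing, which is one reason a sequence-aware comparison such as DTW is listed among the possible improvements.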
Company
DataStax
Date published
Aug. 26, 2024
Author(s)
Sri Bala
Word count
1187
Language
English
Hacker News points
22