Sound is an information-dense data type, and audio remains a widely used medium: 83% of Americans ages 12 or older listened to terrestrial radio in a given week in 2020. Sound can be classified into three categories: speech, music, and waveform. Audio retrieval systems search and monitor online media in real time to help prevent intellectual property infringement and to classify audio data. Feature extraction is crucial for audio similarity search, and deep learning-based models show lower error rates than traditional ones. Milvus, an open-source vector database, efficiently processes the feature vectors extracted by AI models and provides a range of common vector similarity calculations. The article demonstrates how to use an audio retrieval system powered by Milvus to process non-speech audio data.
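To make the idea of vector similarity calculation concrete, the following is a minimal sketch in plain Python rather than Milvus itself. It assumes each audio clip has already been reduced to a feature vector by some model. The file names, the 4-dimensional vectors, and the helper functions (`l2_distance`, `inner_product`, `cosine_similarity`, `top_k`) are all hypothetical and chosen for illustration only; real audio embeddings typically have far more dimensions, and a vector database would use an index instead of the brute-force scan shown here.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance: a smaller value means more similar vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Inner (dot) product: a larger value means more similar vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Inner product of the two vectors after length normalization."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


def top_k(query, library, k=2):
    """Brute-force search: rank every clip by L2 distance to the query."""
    ranked = sorted(library.items(), key=lambda item: l2_distance(query, item[1]))
    return [name for name, _ in ranked[:k]]


# Hypothetical feature vectors standing in for model-extracted embeddings.
library = {
    "dog_bark.wav": [0.9, 0.1, 0.0, 0.2],
    "door_slam.wav": [0.1, 0.8, 0.3, 0.0],
    "dog_growl.wav": [0.8, 0.2, 0.1, 0.3],
}

# Hypothetical embedding of the clip we want to find matches for.
query = [0.88, 0.12, 0.02, 0.2]

results = top_k(query, library, k=2)
print(results)
```

The scan compares the query against every stored vector, which is fine for a handful of clips but grows linearly with the library size; that cost is the reason a dedicated vector database is used for large audio collections.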