Company
Date Published
Author
Marco Ramponi
Word count
4075
Language
English
Hacker News points
7

Summary

In recent years, generative audio models have advanced rapidly, with notable models developed for both music generation and text-to-speech synthesis. This article discusses some of the key developments.

Music Generation Models: Text-to-Music Synthesis. A growing trend among AI researchers is the development of text-to-music generative models that produce music from natural language descriptions, much as text-to-image diffusion models produce images from text. One model underpinning this approach, MuLan, is a transformer-based model trained on soundtracks from 44 million online music videos paired with their text descriptions. It learns embeddings for the text prompt and for a spectrogram of the target audio, so that matching text and music land close together. Once trained, MuLan can take a piece of music as input and associate it with textual descriptions and attributes, or take a textual description as input and output a representation of the musical elements that align with the text.

Music Generation Models: Generative Adversarial Networks (GANs). Another approach to music generation uses GANs, which have been applied successfully to content generation in various domains. For instance, GANSynth is a GAN that generates high-quality audio samples of musical notes from random noise inputs. It produces spectrograms that are converted to audio, rather than modeling the waveform sample by sample as WaveNet does.

Speech Synthesis Models: Text-to-Speech (TTS). In TTS synthesis, several breakthroughs have been made over the past few years, with models such as VALL-E, NaturalSpeech 2, and Voicebox showing strong performance in voice cloning and naturalness. NaturalSpeech 2 and Voicebox rely on latent diffusion and flow matching, respectively, for non-autoregressive audio generation.

In summary, generative audio models have made significant strides in recent years, with innovative approaches being explored across music generation and speech synthesis.
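
The idea of embedding text and music into a shared space, as described for MuLan above, can be illustrated with a toy sketch. This is not MuLan's implementation: the three-dimensional vectors, the `TEXT_EMBEDDINGS` table, and the `rank_descriptions` helper are all hypothetical stand-ins for what a trained model would produce. The sketch only shows how, once both modalities share an embedding space, cosine similarity can rank candidate descriptions for an audio clip.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-written text embeddings. A real model would compute
# these with a trained text encoder; the values here are for illustration only.
TEXT_EMBEDDINGS = {
    "calm piano melody": [0.9, 0.1, 0.0],
    "fast electronic dance track": [0.0, 0.2, 0.9],
    "acoustic guitar ballad": [0.3, 0.9, 0.1],
}


def rank_descriptions(audio_embedding, text_embeddings):
    """Return descriptions sorted from most to least similar to the audio."""
    scored = [
        (cosine_similarity(audio_embedding, vector), text)
        for text, vector in text_embeddings.items()
    ]
    scored.sort(reverse=True)
    return [text for _, text in scored]


# Hypothetical embedding of an audio clip, assumed to come from an audio
# encoder that maps spectrograms into the same space as the text encoder.
audio_clip_embedding = [0.8, 0.2, 0.1]

ranking = rank_descriptions(audio_clip_embedding, TEXT_EMBEDDINGS)
print(ranking[0])
```

The same similarity score works in the other direction: given a text embedding, it can rank a collection of audio embeddings, which is what lets a shared space connect descriptions and music both ways.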