Researchers at Google have revealed a text-to-music AI that creates songs that can last as long as five minutes.

Releasing a paper with their work and findings so far, the team introduced MusicLM to the world with a number of examples that do bear a surprising resemblance to their text prompts.

The researchers claim their model “outperforms previous systems both in audio quality and adherence to the text description”.

The examples are 30-second snippets of the songs, and include their input captions such as:

  • “The main soundtrack of an arcade game. It is fast-paced and upbeat, with a catchy electric guitar riff. The music is repetitive and easy to remember, but with unexpected sounds, like cymbal crashes or drum rolls”.
  • “A fusion of reggaeton and electronic dance music, with a spacey, otherworldly sound. Induces the experience of being lost in space, and the music would be designed to evoke a sense of wonder and awe, while being danceable”.
  • “A rising synth is playing an arpeggio with a lot of reverb. It is backed by pads, sub bass line and soft drums. This song is full of synth sounds creating a soothing and adventurous atmosphere. It may be playing at a festival during two songs for a buildup”.

Using AI to generate music is nothing new – but a tool that can actually generate passable music from a simple text prompt had yet to be showcased. That is, until now, according to the team behind MusicLM.

In their paper, the researchers explain the various challenges facing AI music generation. First, there is a lack of paired audio and text data – unlike in text-to-image machine learning, where they say huge datasets have “contributed significantly” to recent advances.

For example, OpenAI’s DALL-E tool and Stable Diffusion have both driven a swell in public interest in the area and found immediate use cases.
