Big tech is making strides in audio-creative AI tools as Meta and Google unveil new generative artificial intelligence software tailored for music and audio creation.
Last week, Facebook’s parent company Meta unveiled a new generative AI tool called AudioCraft – a multi-model product which uses artificial intelligence to generate high-quality audio and music content from text prompts.
By typing a desired prompt into a trained instance of AudioCraft, users can generate convincing audio and wholly cohesive music without needing to do any of the manual legwork themselves.
“Imagine a professional musician being able to explore new compositions without having to play a single note on an instrument,” said Meta.
“Or a small business owner adding a soundtrack to their latest video ad on Instagram with ease. That’s the promise of AudioCraft,” it added.
Meta demonstrated successful use-cases for the product with prompts as simple as “whistling with wind blowing” through to lengthy, multi-faceted text cues like “earthy tones, environmentally conscious, ukulele-infused, harmonic, breezy, easygoing, organic instrumentation, gentle grooves”.
AudioCraft comprises three models. MusicGen handles music generation and was trained on “Meta-owned” and specifically licensed music, while AudioGen specialises in audio generation and was trained on public sound effects.
Notably, Meta displayed AudioGen examples of AI-generated human speech layered against background sounds such as applauding crowds or faint sirens – although much of the speech was uncannily incoherent.
The third model, EnCodec, focuses on audio compression and allows higher quality music generation in smaller files.
Meta’s researchers explained this is made possible by identifying “changes that will not be perceivable by humans” and turning compressed signals back into a waveform as similar to the original as possible.
“Imagine listening to a friend’s audio message in an area with low connectivity and not having it stall or glitch,” researchers wrote.
“Our research shows how we can use AI to help us achieve this.”
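EnCodec itself is a neural codec, but the basic idea the researchers describe – discarding detail below the threshold of perception, then reconstructing a waveform as close to the original as possible – can be illustrated with a much simpler scheme. The sketch below is purely illustrative and is not Meta's method: it quantizes audio samples down to 4-bit codes (a lossy “compression”) and maps them back, showing that the reconstruction error stays within a small, bounded quantization step.

```python
import math

def compress(samples, bits=4):
    """Quantize float samples in [-1, 1] to small integer codes (lossy)."""
    levels = 2 ** bits - 1
    return [round((s + 1) / 2 * levels) for s in samples]

def decompress(codes, bits=4):
    """Map integer codes back to an approximate waveform in [-1, 1]."""
    levels = 2 ** bits - 1
    return [c / levels * 2 - 1 for c in codes]

# A toy 440 Hz sine "signal" sampled at 8 kHz.
signal = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(64)]

codes = compress(signal)
reconstructed = decompress(codes)

# The reconstruction error is bounded by half a quantization step
# (1/15 of the full range here), too small to matter for this toy signal.
max_err = max(abs(a - b) for a, b in zip(signal, reconstructed))
print(f"max reconstruction error: {max_err:.4f}")
```

A real codec like EnCodec replaces the uniform quantizer with learned neural components so that the bits it does spend go where human hearing is most sensitive, but the compress–reconstruct round trip is the same shape.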
Meta is also open-sourcing AudioCraft. Unlike exclusively pre-trained AI products such as ChatGPT, this means researchers and members of the public can use AudioCraft to further train their own models with their own datasets – a distinction which is particularly important for creative fields.
“We're open sourcing the code for AudioCraft, which generates high-quality, realistic audio and music by listening to raw audio signals and text-based prompts,” said Meta CEO Mark Zuckerberg.
“Having a solid open source foundation will foster innovation and complement the way we produce and listen to audio and music in the future,” said Meta.
In explaining its motivations for AudioCraft, Meta lamented that while there has been ample traction for generative AI in video, images and text, audio “has seemed to lag a bit behind”.
While existing AI audio tools have enabled both creative projects and widespread deepfakes of public figures and musicians, Meta suggests current options are “highly complicated and not very open” compared with AudioCraft.
Google, meanwhile, has released its own creative AI experiment: TextFX. Developed in partnership with popular hip hop musician Lupe Fiasco, TextFX helps songwriters form the lyrics and theme of a track by generating new meanings, semantic correlations and exploration paths out of a given user input.
According to Fiasco – whose linguistic techniques were explicitly studied by Google during the development of TextFX – the program enabled him to write a now-published song in just under two hours.
“We wanted to explore specifically how AI could expand human creativity,” said Creative Technologist at Google Creative Lab, Aaron Wade.
“TextFX targets creative writing, but what might it mean for AI to enter other creative domains as a collaborator?”
Meanwhile, governments are still working out how to regulate AI – particularly how copyright applies to the data used to train AI models – with the EU leading the charge via a recently adopted draft law.