How AI Is Changing Music Production

Image courtesy of Steve Johnson

In the first of two articles, Mike Thornton takes a deep dive into how artificial intelligence is changing music production, looking at how AI can help us be more efficient, its impact on the creation of royalty-free music, and the challenges it poses with regard to copyright, creativity and careers. In the second article, Mike will examine how artificial intelligence affects audio post-production.

We have already seen how generative AI is having a growing impact on text generation, code building and image creation. It's not perfect. In fact, it is far from perfect, but it is getting better. Large language models like ChatGPT are not always accurate, and we have all seen AI-generated images of people with the wrong number of fingers. AI is still not that intelligent, but if you use it as an assistant whose work you have to check, sending back anything that isn't good enough to be reworked, then it can be a genuinely useful tool. With this in mind, how is artificial intelligence impacting music production?

The Potential Benefits Of Using AI in Music Production

A growing number of AI music platforms are emerging, including Meta’s Audiocraft, OpenAI’s MuseNet, Soundful, Soundraw, Boomy, Amper and Loudly, among others, offering the ability to compose music quickly and efficiently. A report on the growth of generative AI in music production says…

“Composing traditional music can be a time-consuming process, requiring hours of creativity and a comprehensive knowledge of musical theory. With the advent of generative AI, however, musicians and composers can now leverage the power of algorithms to create music in a fraction of the time. Algorithms utilising Generative AI are capable of analysing existing music libraries, learning patterns, and generating new compositions that are in line with particular styles or genres. This capability enables composers to investigate new musical territories, experiment with various musical concepts, and generate high-quality music at an unprecedented rate. With generative AI, the creative possibilities are limitless, driving musical innovation to its limits.”

Creating Music Tracks With AI

Both Google and Meta are already getting on this bandwagon. MusicLM from Google and MusicGen from Meta are described as experimental AI tools designed to generate melodies from text-based descriptions. Both tools are in beta at the time of writing, and although anyone can try out MusicGen from Meta, as it is completely open source, you have to sign up and wait for an invite to be part of Google's AI Test Kitchen.

It's early days yet. Google says…

“It generates music at 24 kHz that remains consistent over several minutes. Our experiments show that MusicLM outperforms previous systems both in audio quality and adherence to the text description. Moreover, we demonstrate that MusicLM can be conditioned on both text and a melody in that it can transform whistled and hummed melodies according to the style described in a text caption.”

Meta says of MusicGen…

“We tackle the task of conditional music generation. We introduce MusicGen, a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens. Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns, which eliminates the need for cascading several models, e.g., hierarchically or upsampling. Following this approach, we demonstrate how MusicGen can generate high-quality samples, both mono and stereo, while being conditioned on textual description or melodic features, allowing better controls over the generated output.”

Although Meta has trained the MusicGen models on 30-second chunks of audio, it is possible to generate longer sequences with a simple windowing approach. For example, by using a fixed 30-second window, sliding it forward in 10-second steps and keeping the last 20 seconds of generated audio as context, it is apparently possible to generate 2-minute-long tracks.
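To make that windowing idea concrete, here is a minimal Python sketch using Meta's open-source audiocraft package. Treat it as an illustration rather than a recipe: the checkpoint name and text prompt are placeholders, and it assumes that the library's generate_continuation call returns the 20-second prompt plus the newly generated audio, which is worth verifying against the current audiocraft documentation.

```python
import torch
from audiocraft.models import MusicGen

# Load a pretrained MusicGen checkpoint (the model name is a placeholder;
# larger checkpoints also exist).
model = MusicGen.get_pretrained('facebook/musicgen-small')
model.set_generation_params(duration=30)  # the models were trained on 30 s chunks

description = ['gentle ambient piano with soft pads']  # illustrative prompt
sr = model.sample_rate  # 32 kHz for the released MusicGen models

# Step 1: generate the first 30-second window.
wav = model.generate(description)  # tensor shaped [batch, channels, samples]

# Step 2: slide the window. Keep the last 20 s as context and extend the
# track by roughly 10 s per step until we reach about 2 minutes.
target_seconds = 120
while wav.shape[-1] / sr < target_seconds:
    context = wav[..., -20 * sr:]  # the last 20 s becomes the continuation prompt
    # Assumption: generate_continuation returns the prompt plus the newly
    # generated audio, so each call adds ~10 s beyond the 20 s of context.
    extended = model.generate_continuation(context, sr, descriptions=description)
    new_audio = extended[..., context.shape[-1]:]  # drop the overlapping prompt
    wav = torch.cat([wav, new_audio], dim=-1)

print(f'Generated {wav.shape[-1] / sr:.1f} seconds of audio')
```

This mirrors the 30-second window, 10-second hop approach Meta describes, with the 20-second overlap giving the model enough context to keep each new chunk musically consistent with what came before.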

Another angle on this is in producing more personalised advertising. Production companies can use AI to produce music in a range of styles and then, based on information being gathered about us individually from our choices online, deliver personalised adverts using music styles they know will appeal to us.

Image courtesy of Steve Johnson

A marketresearch.biz report into the Generative AI In Music market identified a number of trends…

“Artificial intelligence-powered virtual bands and DJs - Imagine a world in which virtual bands and DJs perform on stage, mesmerizing audiences with their original compositions and electrifying performances. This is no longer a distant fantasy, but an AI-powered reality. With the assistance of Generative AI, virtual bands are able to compose original music, replicate the manner of renowned musicians, and even improvise in real time in response to audience reactions.

Background music for creators of content - Whether it's a captivating video or an engaging podcast, content creators aim to provide their audience with compelling experiences. Background music is a crucial element in producing immersive content.

The way that content creators find and use background music is being revolutionized by generative AI. With AI algorithms analyzing immense music archives, creators can now find the ideal soundtrack that complements and enhances their content. The importance of AI-generated music suggestions based on the content's tone, mood, and cadence in creating a seamless, immersive experience for the audience cannot be overstated.

Integration of AI-Generated Melodies with Lyrics - The integration of lyrics with AI-generated melodies is one area in which this collaboration has made a significant impact. In the same way that Generative AI can compose original compositions, it can also analyze large lyric databases to recognize patterns, rhythms, and narrative techniques.

Applications Involving Music Therapy - In addition to its creative and entertaining qualities, music has long been acknowledged for its therapeutic value. It can improve temperament, reduce tension, and even aid in physical and mental recuperation. With the incorporation of Generative AI, music therapy applications are undergoing a remarkable transformation that opens up new healing and well-being avenues.”

Creating Stems From Mixed Tracks

This area of machine learning and AI has been around for some time now. Back in September 2017, we showed how iZotope’s RX Music Rebalance module could be used to rebalance a mix by separating the mix into four areas: vocal, bass, percussion, and others. In addition, we showed how it could be used to isolate a vocal. 

In October 2018, we tested XTRAX STEMS 2 from Audionamix, which used cloud servers to undertake the processing, splitting a mix into three elements: vocals, drums and everything else. Dan Cooper concluded…

“Is it finally possible to remove a vocal from a mixed song? Yes and no. I believe XTRAX STEMS 2 gets us closer to achieving this but there is still a way to go until the quality of the separation is improved… but I don’t believe that this type of technology will ever produce “clean” un-artefacty results… but time will tell I suppose. XTRAX STEMS 2 definitely makes the process easier and faster than ever before, plus we get the added bonus of isolating drum tracks as well.”

In December 2019, we explored Spleeter, an open-source unmixing software package, and it is Spleeter technology that is powering iZotope’s Music Rebalance module.
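As a taste of how accessible this technology has become, here is a minimal sketch of Spleeter's published Python API, assuming you have installed the package with pip install spleeter; the file and folder names are placeholders.

```python
from spleeter.separator import Separator

# Load Deezer's pretrained 4-stem model
# (vocals / drums / bass / other).
separator = Separator('spleeter:4stems')

# Split a mixed track into stems; one WAV file per stem
# is written into a subfolder of output/.
separator.separate_to_file('mixed_track.mp3', 'output/')
```

In recent versions of the package, the same separation is available from the command line with spleeter separate -p spleeter:2stems -o output mixed_track.mp3; the 2stems model simply splits the vocals from everything else.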

In August 2020, iZotope released RX8 with an improved Music Rebalance module, enabling users to remove or isolate vocals for a remix or even create and export new stems for further processing and mixing. In my review, I said…

“Although the Music Rebalance module is optimised for music and singing rather than the spoken word, the improvements in what Music Rebalance can do are astounding. We are very close now to being able to produce completely clean stems from a mixed track, as I show in the demo video.

That’s not to say that the post community has been left out because, for starters, the improved Music Rebalance module will be very helpful for us, especially when we need to rebalance a track to enable it to underscore better, or just to be able to rebalance the VO on a mixed track.”

In July 2022, we tested Hit’n’Mix’s RipX DeepAudio, and Luke concluded…

“There’s no doubt that RipX DeepAudio turns in results that will make even the most hardened cynic sit up and listen. In our test, it was able to generate a very usable mix-minus with convincing vocal extraction, as well as the potential for clean stem extraction with some final surgery. Yes, there are some artefacts in isolation, but when remixed, the resultant audio knits back together very nicely. While sometimes it will be unable to discern between instruments with worst-case audio, under even slightly favourable conditions, it can turn in results rivalling the very best mute or solo buttons in the world!”

In December 2022, we explored what LALAL.AI could do. This incarnation of LALAL.AI used their Phoenix AI, a neural network designed to isolate stems faster than before, which they claimed provided better vocal separation quality than any other AI-based stem splitter on the market. So confident were they in their technology that LALAL.AI published data showing how well it fared against a rival service. You can read about Phoenix in greater detail here, covering both its conception and some of how it works.

You can watch this video as we use LALAL.AI to extract audio stems and tracks from a video file. We upload the file into its simple web-based UI, then follow the generated links to download the extracted audio. We then bring the stems in alongside the original video to see how their quality compares with the original audio.

Now, LALAL.AI also has its 4th-generation stem-splitting technology, Orion, alongside its 3rd-generation Phoenix technology. They describe Orion as…

“A next-generation vocal remover and music source separation service for fast, easy and precise stem extraction. Remove vocal, instrumental, drums, bass, piano, electric guitar, acoustic guitar, and synthesiser tracks without quality loss.”

Here is an example (the audio players are in the original post): the original sample, the extracted instrumental, and the isolated vocals.

Stem-splitting technology made headlines when The Beatles released ‘Now And Then’: the production team was able to take John Lennon’s demo, recorded back in 1977, extract his original vocal and use it in the version of the song released in November 2023.

It is not clear exactly what software was used. The press release referred to…

“A software system developed by Peter Jackson and his team, used throughout the production of the documentary series Get Back, finally opened the way for the uncoupling of John’s vocal from his piano part.” 

The audio restoration for the song is credited to ‘Park Road Post Production’.

Very soon after the Beatles song was released, YouTuber Rick Beato posted this video, in which he used LALAL.AI to split his own demo recording of ‘Now and Then’.

You can try out LALAL.AI for free. However, you will quickly find that you need to choose a package, each offering a different number of processing minutes.

If you are looking for a lower-cost solution, there is a free service, Vocal Remover. This application is less powerful than paid-for services and can only separate voice from music using AI. You get two tracks: a karaoke version of your song (no vocals) and an a cappella version (isolated vocals).

A New Type Of DAW?

In a report on the Andreessen Horowitz website, The Future of Music: How Generative AI Is Transforming the Music Industry, Justine Moore and Anish Acharya expect that royalty-free music will, in future, be almost entirely AI-generated.

“This genre has already become commoditized, so it’s not hard to imagine a world where all background music is created by AI, and we break the historical tradeoff between quality and cost.”

They recognise that the early adopters of these products have largely been individual content creators and small to medium businesses…

“However, we expect these tools to move upmarket, in terms of both traditional enterprise sales to larger companies like games studios, and embedded music generation in content creation platforms via APIs.” 

They expect to see a tight link between hardware and software…

“including the rise of ‘generative instruments,’ which may be DJ controllers and synthesizers that embed these ideas directly into the physical product.”

They go on to suggest that new products are being developed aiming to reinvent the DAW using an AI-first approach…

“making it more accessible to a new generation of consumers and professionals alike. Many of the most popular DAWs today are 20+ years old; startups like TuneFlow and WavTool are tackling the ambitious challenge of building a new version of the DAW from the ground up.”

This might seem a little far-fetched, but check out what TuneFlow say on their website…

“Your private suite of AI superpowers for music making, packed in a professional yet easy-to-use DAW. TuneFlow helps you achieve your music dreams, regardless of your level. While you can easily access most of what TuneFlow has to offer from a browser, it also comes with a cutting-edge editing engine on Desktop, allowing you to perform professional tasks like multi-track mixing, mastering, or drawing automation envelopes for your audio effects. With TuneFlow Desktop, you get a full-spec, industry-leading audio engine that supports all your favourite VST/VST3/AU plugins.”

In conclusion, Justine Moore and Anish Acharya finish with their ultimate dream of an AI-powered DAW...

“An end-to-end tool where you provide guidance on the vibe and themes of the track you’re looking to create, in the form of text, audio, images, or even video, and an AI copilot then collaborates with you to write and produce the song. We don’t imagine the most popular songs will ever be entirely AI generated — there’s a human element to music, as well as a connection to the artist that can’t be replaced — however, we do expect AI assistance will make it easier for the average person to become a musician. And we like the sound of that!”

The Challenges Presented By Using AI In Music Production

Image courtesy of Possessed Photography

But it's not all good news regarding the application of AI in music production. There are some significant challenges that will need to be faced.

Copyright Issues

With traditional cover versions, it is inherently clear that the cover artist isn’t the original performer. With generative AI working ‘in the style of’ an artist, when it isn’t made clear that the result isn’t actually that artist, we slide into the territory of deception.

For example, there is a reasonable expectation when it comes to someone’s voice, whether spoken or sung: if you hear my voice, you understand that it is me speaking, and that what you hear me saying reflects my knowledge, thoughts and opinions. When it comes to AI-generated voices, this goes beyond the copyright issue of my actual voice. It moves into identity theft, where someone using AI to clone my voice is impersonating me, creating content that appears to express my knowledge, thoughts and opinions with the intent of deceiving people into thinking it is me.

Similarly, when it comes to singers, using AI to clone a performer’s voice is nothing less than deception. This cannot be morally, ethically or legally acceptable.

In the Market Research Generative AI In Music Market Report, when it comes to copyright issues, the author says…

“The complex landscape of copyright laws and regulations is one of the primary restraining factors for the Generative AI in Music Market. As artificial intelligence technology continues to evolve and create original musical compositions, the issue of ownership and intellectual property rights becomes increasingly murky. This causes concern among musicians, composers, and industry participants, who fear that their work will be plagiarized or devalued. Consequently, legal challenges and copyright disputes arise, impeding the development and adoption of generative artificial intelligence in music.”

The This Is Music 2023 Economic Report says that the downsides are real. It goes on…

“The opportunity for piracy on an industrial scale needs to be both understood and tackled. I was taught and apply a ‘contextual’ rather than ‘black letter’ reading of copyright… it is not a binary concept to merely prevent or control, but a highly flexible, pragmatic, and commercial concept to reward and recognise creativity which is entirely consistent with AI technologies provided we take a considered and informed approach.

Therefore, we should ensure that laws look to protect image, personality and other economic rights that attach to creatives. Such image and personality rights do not enjoy specific legislative protection in the UK, but they do in the USA and many other major music markets around the world. We must also ensure that there is a thoughtful and balanced approach to understanding how AI analysis can be applied for good, but without instituting broad brush exceptions. One person’s view of ‘data mining’ is another’s view of outright theft. We must distinguish between corporate innovation, human artistry and machine engineered solutions in a considered and practical manner.”

People Being Put Out Of Work

Image courtesy of Markus Spiske

The Market Research Generative AI In Music Market Report picks up on the opposition from musicians, saying…

“The incorporation of generative AI into music upends conventional conceptions of music creation and challenges the established roles of musicians. This frequently results in opposition from traditional musicians who dread the automation of their craft and the potential loss of creative control. In addition, the perception that AI-generated music lacks human touch and emotional profundity fosters this opposition.”

The AI Music – Step to the Future or Fall of Creativity report picks up the issue of careers and skills…

“Imagine that you have trained for 20 years, worked with the best in the business - and are great at what you do. But an AI music generator or lyric creation tool can do it faster and, to some degree, maybe even better. Job displacement is something impacting almost all creative sectors with the rise of generative AI.

When AI can do a job fast, with fewer mistakes - why would you do it any other way? While many artists wouldn’t go this route, younger artists looking for quick popularity might be tempted. In the end, this would see a decline in the overall skills in the music industry.”

Generative AI-Produced Content Is Bland And Lacks Originality

Another obstacle for Generative AI in the music production sector is its limited capacity to infuse compositions with emotional expression. The Market Research Generative AI In Music Market Report says…

“While artificial intelligence algorithms excel at analyzing patterns, harmonies, and melodies, conveying the complexity and nuance of human emotions remains difficult. As an art form that frequently evokes profound human emotions, music necessitates an intricate comprehension of the human experience.”

The AI Music – Step to the Future or Fall of Creativity report picks up on the issue of a lack of originality…

“Generative AI could be like Ouroboros - the snake that eats its own tail. We have seen examples of what AI-generated music sounds like through things like the ‘fake artists’ on Spotify and other platforms. Many journalists and music lovers noticed that the AI-generated tracks sounded almost identical. Over time, heavy use of AI in music could lead to a massive decline in music diversity.”

The same report goes on to raise another issue: the bias that is inherently introduced into content created by generative AI…

“Generative models learn from the input they get, and one thing that has been prevalent over the last few years is that destructive and harmful information can be fed into the data pool. That then produces content with a set of biases, which can be reflected in the works it produces.”

In Conclusion

There you have it: a look at some of the benefits and challenges of using AI in music production. There is no doubt in my mind that AI is here to stay and will continue to change almost everything we do. The benefits of AI as an assistant are clear. But how do we make sure that people’s identities are protected? How do we make sure that intellectual property is protected when we hear stories of generative AI being trained on people’s creativity without permission? These are the big questions we all need to address. If we stay quiet and do nothing, people are going to be taken advantage of.

What About You?

How do you see AI affecting music production? Do you have experience of how AI is already changing it? If so, do share your thoughts constructively in the comments below.

What About Post Production?

In the second of two articles, Mike Thornton will take a deep dive into how artificial intelligence is changing the audio post-production sector.
