Production Expert

Dialogue Cleanup - AI Versus Audio Professional

In this listening test, Paul Maunder considers where we currently are with AI/machine learning noise reduction tools and whether a human can still get better results by operating software manually and making informed decisions along the way. Take a listen to the results and vote for the one you think sounds best for dialogue cleanup.

UPDATE - the results are in, see link at the bottom of this article.

There are plenty of ways to reduce noise in dialogue recordings, and several tools now claim to use some form of ‘AI’. Is it really AI, or is it simply the result of machine learning? The two are closely related but not the same. That’s probably a debate for another time, but one thing is certain: we have options when it comes to noise reduction tools. Some require a lot of manual operation from the user and others are pretty much automatic. The question is: where are we currently with these AI/machine learning noise reduction tools, and can a human still get better results by operating software manually and making informed decisions along the way? In this test we’ll address that question using some noisy dialogue recordings.

We’ve selected four different tools for this test. Two of them require manual intervention and operation; the other two are automatic. Let’s take a look at the contenders.

Adobe Enhance Speech

This online tool from Adobe is part of the Adobe Podcast beta and allows users to upload dialogue recordings in WAV or MP3 format. Files can be up to 1 GB in size or 1 hour in duration. Adobe state that “Speech enhancement makes voice recordings sound as if they were recorded in a professional studio”. Processing time varies with file duration (and also seems to vary at different times of day) but is generally slightly faster than real time. Adobe mention that some files could take up to 10 minutes, but this is obviously dependent on file duration. Once processed, the resulting audio can be auditioned within your web browser and downloaded. Downloaded audio files are always in WAV format with a 32-bit floating point bit depth. As well as being enhanced, the processed files come out close to -23 LUFS in level.

Descript

Descript is an application designed for video editing, podcasting, screen recording and transcription. The particular feature we’re focussing on in this test is called Studio Sound. The manufacturer says “Get studio-quality sound. Even if you’re a beginner, or don’t have a fancy mic, you can sound like a pro with noise removal, speech enhancement, and other one-click sound effects”. In practice it’s not quite one click, because you first have to select your audio, click on the + icon in the audio effects section, go to the Audio Repair category and then select Studio Sound. Applying the effect only takes a few seconds and, once it is applied, you can adjust the intensity percentage slider if required. In this test I chose to keep the intensity at 100% to minimise user intervention and get closer to the one-click claim. On export, I opted to normalise the volume to -23 LUFS so that a fair comparison can be made with the Adobe product.

iZotope RX 10 Advanced

A very familiar set of tools for many of us, iZotope RX has formed an important part of noise reduction workflows for several years now. RX does incorporate some processing modules which were developed with the aid of machine learning, including Dialogue Isolate. Nevertheless, the overall operation of the software, along with decisions about how best to apply settings and which modules to use, is down to the user. It requires you to actually listen and use judgement in order to get the best results. That is why it is included in this test: it lets us compare a human-operated piece of software against the much more automatic Adobe and Descript products. On the product page for RX 10, iZotope say “Trusted by professional audio engineers to quickly and reliably deliver clean sound. Things break. So does audio. Recorded sound is rarely perfect— in fact, it's often in really bad shape. That's why you need RX, the industry standard for audio repair that helps restore, clean up, and improve recordings in post-production, music, and content creation.”

Acon Digital Acoustica Advanced

A cost-effective alternative to iZotope RX, Acoustica from Acon Digital has fared well in our previous noise reduction and reverb reduction tests. As with RX, Acoustica includes some modules which came about through machine learning. Extract: Dialogue is an example of this and is essentially the equivalent of iZotope RX Dialogue Isolate. Of course, it includes many more modules as well, which cover a variety of different noise reduction tasks plus compression, limiting, EQ and loudness measurement. You can read more about it in our article ‘iZotope RX user tries Acon Digital Acoustica’. On the product page for Acoustica, Acon Digital say “Acoustica is our solution for audio editing, post-production, podcast creation, mastering and audio restoration – with no compromises when it comes to audio quality and workflow.”

The Test

Four pieces of problematic audio were used for this test, and you can vote on the results below. Each is a dialogue recording with excessive background noise and other problems.

Each of the four clips was processed with the two automatic tools: Adobe Enhance Speech and Descript. They were also processed manually with iZotope RX 10 Advanced and Acon Digital Acoustica. Decisions were made on a clip-by-clip basis as to which modules were the most effective in each case. I was able to process most clips fairly quickly, but some proved trickier, so I worked to a time limit of 15 minutes per clip.

Some of the processing modules used for this test in iZotope RX included (in no particular order) DeWind, DeClip, Spectral Repair, EQ, Dialogue Isolate, Leveler, Loudness Control.

The processing modules used within Acon Digital Acoustica (again in no particular order) included Equalise 2, DeNoise 2, Extract: Dialogue, Dynamics, Equalise, Limit.

Each processed clip was standardised in loudness to have an integrated level of -23 LUFS. Please listen carefully to the clips below and vote on which one from each test you think is the best result. The clips have been randomly labelled A to D on a test-by-test basis.
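For readers who want to set up a similar blind comparison, the two steps described above (matching loudness and shuffling the labels) can be sketched roughly as follows. This is only an illustration, not the procedure actually used for this test: the function names are made up for the example, the -18.5 LUFS figure is a hypothetical measurement, and the sketch assumes that a plain static gain is an acceptable way to move a clip to the target level. The integrated loudness itself would still need to come from a proper loudness meter.

```python
import random

TARGET_LUFS = -23.0  # integrated loudness target quoted in the article


def gain_to_target(measured_lufs, target_lufs=TARGET_LUFS):
    """Return (gain in dB, linear amplitude factor) that moves a clip
    from its measured integrated loudness to the target."""
    gain_db = target_lufs - measured_lufs
    return gain_db, 10 ** (gain_db / 20)


def blind_labels(tools, rng):
    """Shuffle the tools and hand out the labels A, B, C, ... in order."""
    shuffled = list(tools)
    rng.shuffle(shuffled)
    return {chr(ord("A") + i): tool for i, tool in enumerate(shuffled)}


tools = [
    "Adobe Enhance Speech",
    "Descript Studio Sound",
    "iZotope RX 10 Advanced",
    "Acon Digital Acoustica",
]

# Hypothetical example: a processed clip that was measured at -18.5 LUFS.
gain_db, factor = gain_to_target(-18.5)
print(f"apply {gain_db:+.1f} dB (x{factor:.3f})")

# A fresh shuffle for every clip, so a letter does not always hide the same tool.
rng = random.Random()
for clip in range(1, 5):
    print(f"Clip {clip}:", blind_labels(tools, rng))
```

Shuffling separately for each clip is the point of labelling on a test-by-test basis: a listener who decides they like "B" in Clip 1 cannot carry that preference over to Clip 2.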

Clip 1

Clip 1 was recorded on an iPhone whilst walking along a beach. You can hear waves and footsteps along with some handling and wind noise. We asked Russ to simulate taking notes for an article.

Clip 1 (Original)

Clip 1 A

Clip 1 B

Clip 1 C

Clip 1 D

Clip 2

Clip 2 is from a presentation to a small audience. There is some background noise, and the recording is very loud and distorted, so take care when listening to it.

Clip 2 (Original) WARNING - This One Is Loud!

Clip 2 A

Clip 2 B

Clip 2 C

Clip 2 D

Clip 3

Clip 3 was recorded in a busy lobby area and has a lot of background noise. It’s also clipped.

Clip 3 (Original)

Clip 3 A

Clip 3 B

Clip 3 C

Clip 3 D

Clip 4

Clip 4 was recorded on a phone in a reflective room with excessive air conditioning noise. The speaker is too far away from the microphone.

Clip 4 (Original)

Clip 4 A

Clip 4 B

Clip 4 C

Clip 4 D

The results of this vote will be published very soon, revealing whether or not ‘AI’ can reduce noise and improve speech recordings more effectively than a human operator.

UPDATE - the results are in, find them here.
