Production Expert


Dialogue Cleanup - AI Versus Audio Professional - The Results

We recently ran a listening test comparing the effectiveness of AI/machine learning noise reduction tools with the results achieved by an audio professional using manually operated software. You can find the original test here.

Four dialogue recordings were included in the test and each one was processed a total of four times using the following software:

  • Adobe Speech Enhance (AI/machine learning)

  • Descript (AI/machine learning)

  • iZotope RX 10 Advanced (Manually operated)

  • Acon Digital Acoustica Advanced (Manually operated)

The Production Expert community was invited to vote on which results they felt were best. The results are as follows:

Clip 1

Clip 1 was recorded on an iPhone whilst walking along the beach. You can hear waves and footsteps, along with some handling/wind noise, in the original recording. We asked Russ to simulate taking notes for an article.

A was Acoustica

B was Adobe

C was Descript

D was RX

Clip 1 winner: Descript

Clip 1 runner up: iZotope RX

Clip 2

Clip 2 was from a presentation to a small audience. There was some background noise and the recording was very loud and distorted.

A was Adobe

B was Descript

C was Acoustica

D was RX

Clip 2 winner: Descript

Clip 2 runner up: Acon Digital Acoustica

Clip 3

Clip 3 was recorded in a busy lobby area and had a lot of background noise. It was also clipped.

A was Descript

B was RX

C was Adobe

D was Acoustica

Clip 3 winner: Descript

Clip 3 runner up: Adobe Speech Enhance

Clip 4

Clip 4 was recorded on a phone in a reflective room with excessive air conditioning noise present. The speaker was also too far from the microphone.

A was RX

B was Acoustica

C was Adobe

D was Descript

Clip 4 winner: Adobe Speech Enhance

Clip 4 runner up: Descript

Results Summary

Based on the results of this test, as voted for by the Production Expert community, the most effective tools for achieving clean dialogue recordings were as follows:

Clip 1 - Descript

Clip 2 - Descript

Clip 3 - Descript

Clip 4 - Adobe Speech Enhance

What Do The Results Tell Us?

Looking purely at the results, it would appear that a human audio professional manually operating noise reduction software simply cannot get close to what AI/machine learning tools can now achieve. Descript won three of the four tests and Adobe Speech Enhance won the other, with iZotope RX and Acon Digital Acoustica each taking second place in just one test.

Certainly, the results achieved by the automatic Descript and Adobe products are very impressive indeed. In the fourth test I simply couldn’t get close to the sound achieved by the AI tools. The way they were able not only to reduce the noise but also to create the effect of the person being closer to the microphone is difficult, perhaps impossible, to achieve through other means. The AI/machine learning tools also produced their results much more quickly than I was able to manually.

One thing to consider here is how the resulting dialogue recordings might actually be used. If you want something to sound like it was recorded in front of a studio condenser microphone in a controlled environment, then the Adobe and Descript tools are excellent. However, they do change the ‘perspective’ of the audio by creating a more close-miked sound. If you’re cleaning up dialogue for film or TV, this might not be desirable when you simply want to reduce noise but leave the overall tonal quality otherwise unchanged. In such cases, the iZotope and Acon Digital tools are a better choice.

Also, having used it on a couple of projects now, I’ve noticed that Adobe Speech Enhance seems to re-synthesise the voice, which can sometimes create a slightly unnatural sound that differs from the original speaker’s actual voice. Nevertheless, the results are still astonishing and, as this test shows, AI and machine learning tools are starting to bring about possibilities in audio which we could only have dreamt of just a few years ago.

So, where are we on the journey into a world of AI-based tools? I suspect we’re barely at the start, and that we might be on the brink of a huge acceleration not only in what such technologies can achieve but also in their widespread uptake and incorporation into numerous everyday tasks, not just in audio production but in many aspects of day-to-day life. Is this a good thing, or will we end up with a Skynet situation in a few years where the software becomes self-aware and tries to destroy us? Hopefully not! Let us know what you think in the comments.
