Production Expert


The Role Of Machine Learning In Developing Hush Pro

We’ve featured the Hush Pro noise reduction plug-in a few times now on the site, including the recent article ‘Using Hush Pro To Reduce Reverb And Noise In Dialogue Recordings’, in which I demonstrated a few usage examples.

Hush Pro is a Pro Tools AudioSuite version of the standalone Hush app. Both Hush and Hush Pro are built for Apple Silicon Macs and use a machine learning model to differentiate between voice, reverb and noise, giving users control over the balance between them. Both leverage the power of the GPU to achieve extremely impressive results with minimal audible artefacts.

Hush and Hush Pro were developed by a solo developer, Ian Sampson, based in Vancouver, Canada. I spoke to Ian to find out more about the plug-in’s development, a little bit about the machine learning process and what the future might hold for Hush Pro. The topics covered are summarised below.

  • Background on the standalone Hush app and Hush Pro, the AudioSuite plug-in for Pro Tools. Both use a neural network which Ian has trained to separate dialogue from ambient noise, transient noise and room reflections.

  • Hush first came about after Ian did a PhD in English literature. During that time he worked on a book which used large language models to generate surreal poetry. That led him to learn about the world of machine learning for audio, including some of the latest methods of noise reduction. At the same time he was trying to record some poetry for the book and battling against background noise in an apartment, including domestic appliances, birds and dogs barking next door. This sparked the idea to develop a noise reduction app and sent him down a rabbit hole of reading articles, doing research and teaching himself the basics of machine learning. After many months of work, Ian released Hush. It received a lot of interest from the audio post community, including requests for a version which works in Pro Tools. From this, Hush Pro was born.

  • Insight into how the machine learning process works. In essence, this starts with cleanly recorded dialogue which is completely free from noise. Various kinds of noise and reverb are then artificially added to a very large number of these recordings, creating pairs of clean and noisy recordings. A neural network is then trained, using the noisy version as the input, to produce the clean one as the output. After running several hundred thousand samples on a supercomputer, a machine learning model is eventually produced which can separate dialogue from noise and clean up the audio accurately and effectively.

  • The key factor which differentiates Hush Pro from other noise reduction plug-ins is the use of the GPU and Neural Engine on Apple Silicon. Hush Pro uses the accelerators on the chips, which are heavily optimised for the kinds of tasks that machine learning involves. This means that Hush Pro is able to run a model which is orders of magnitude bigger than would be possible on a CPU. A bigger model like this means a higher degree of nuance and accuracy, along with significantly fewer artefacts and cleaner resulting audio.

  • The ups and downs of being a solo developer. It takes a long time to get things done due to the multiple roles which Ian takes on. In addition to actually developing the plug-ins, he also responds to emails and does all of the other tasks which a larger team might normally do. One key benefit of being a solo developer, though, is the freedom. Very niche products can be developed which larger plug-in manufacturers might not ordinarily be willing to pursue. Hush Pro is a case in point: it is for post production, it runs only on Apple Silicon-based Macs and it’s AudioSuite only.

  • The potential for a real-time AAX version of Hush Pro in the future. Ian is working on this, and hopes that it will be ready for release at some point later this year. It should keep the existing quality and have a usable latency of around 200 ms.
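
To make the training-data idea from the machine learning summary more concrete, here is a minimal Python sketch of how one clean/noisy pair might be built by adding noise to a clean recording at a chosen signal-to-noise ratio. It is an illustration only and not Ian's actual pipeline: the function names (`rms`, `mix_at_snr`, `make_training_pair`), the 0 to 20 dB SNR range, the 16 kHz sample rate and the sine tone standing in for speech are all assumptions, and it covers additive noise only, not reverb.

```python
import math
import random


def rms(signal):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so the mix has the requested SNR in dB."""
    clean_rms = rms(clean)
    noise_rms = rms(noise)
    if clean_rms == 0 or noise_rms == 0:
        raise ValueError("clean and noise must both be non-silent")
    # Choose the gain so that clean_rms / (gain * noise_rms) == 10^(snr_db / 20).
    gain = clean_rms / (noise_rms * 10 ** (snr_db / 20))
    return [c + gain * n for c, n in zip(clean, noise)]


def make_training_pair(clean, noise, rng, snr_range_db=(0.0, 20.0)):
    """Build one (noisy_input, clean_target, snr_db) triple at a random SNR.

    The SNR range is a hypothetical choice for illustration.
    """
    snr_db = rng.uniform(*snr_range_db)
    noisy = mix_at_snr(clean, noise, snr_db)
    return noisy, clean, snr_db


if __name__ == "__main__":
    rng = random.Random(0)
    sample_rate = 16000  # assumed sample rate for the toy signals
    # Stand-in for clean dialogue: one second of a 220 Hz tone.
    clean = [math.sin(2 * math.pi * 220 * t / sample_rate)
             for t in range(sample_rate)]
    # Stand-in for background noise: white Gaussian noise.
    noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

    noisy, target, snr_db = make_training_pair(clean, noise, rng)

    # Check the mix: the residual (noisy - clean) is the scaled noise.
    residual = [n - c for n, c in zip(noisy, target)]
    measured = 20 * math.log10(rms(target) / rms(residual))
    print(f"requested {snr_db:.2f} dB, measured {measured:.2f} dB")
```

In a real training setup, the noisy signal would be fed to the network as input and the clean signal used as the target, with this pairing repeated across a very large number of recordings and noise types.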
