Production Expert


What Are Custom HRTFs And Why They Matter?

In this article Julian looks at how our own physiology determines how well binaural renders translate to the listener, and explores some potential solutions for listeners who don’t get the full effect.

I used to run a music tech course and every year I’d be faced with a room full of new students with more enthusiasm than experience. There are a few audio crowd-pleasers which I’ve found to be ideal material for grabbing the attention of people interested in audio who haven’t yet come across them: mid-side processing, time-aligning mics on a 4x12, removing a vocal from the centre channel using inverted polarity. There are quite a few to choose from. However, the best for getting an instant response was playing a binaural recording over headphones. Whoops of excitement would show me who was really “getting” it, but more interesting to me were the people who were left underwhelmed. Why did some people experience an utterly convincing, immersive experience while others just heard a slightly odd stereo recording? The answer is the effect of Head Related Transfer Functions.

The basic mechanism by which we locate sounds from left to right is well understood. Sounds coming from the right reach the right ear first and are a little louder, while the left ear is shadowed by the head sitting between it and the sound source, which makes the sound slightly different. However, this doesn’t explain how we can perceive sounds to be coming from behind or above us. For that we need to introduce the ways in which our unique physiology affects the sounds we hear.
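To put rough numbers on the left-right timing cue, a simple spherical-head model such as Woodworth’s formula estimates the interaural time difference from the source azimuth. A minimal sketch, where the head radius and speed of sound are typical assumed values rather than measurements:

```python
import math

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Approximate interaural time difference (in seconds) for a source at
    the given azimuth (0 degrees = straight ahead, 90 = directly to one side),
    using Woodworth's spherical-head model: ITD = (r / c) * (theta + sin(theta))."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / c) * (theta + math.sin(theta))

# A source directly to one side arrives roughly 0.66 ms earlier at the near ear.
print(round(woodworth_itd(90.0) * 1000, 2))  # -> 0.66
```

The model treats the head as a rigid sphere, which is crude, but it shows why the left-right cue alone carries no front/back or elevation information: every direction on a cone around the ear axis gives the same delay.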

Our ears are unique to us. The precise shape of the outer ear (the pinna), of the head and of the torso all colour the sound our ears receive, and our hearing systems become very finely tuned to this specific physiology. This filtering is known as a Head Related Transfer Function and can be captured and used to process audio in real time, much like a convolution reverb using an impulse response.
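In code, applying an HRTF works just like the convolution reverb analogy suggests: the mono source is convolved with a left-ear and a right-ear impulse response (the HRIR pair, the time-domain form of the HRTF). A minimal sketch using NumPy, with random placeholder HRIRs standing in for a real measured set:

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Render a mono signal to binaural stereo by convolving it with the
    left- and right-ear head-related impulse responses (HRIRs),
    exactly as a convolution reverb applies a room impulse response."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=-1)  # shape: (samples, 2)

# Toy example: 64-tap placeholder HRIRs (a real pair would come from a
# measured, ideally personalised, HRTF set).
rng = np.random.default_rng(0)
mono = rng.standard_normal(1000)
hrir_l = rng.standard_normal(64)
hrir_r = rng.standard_normal(64)
stereo = render_binaural(mono, hrir_l, hrir_r)
print(stereo.shape)  # -> (1063, 2)
```

A real renderer would do this per sound source, selecting the HRIR pair that matches each source’s direction, but the core operation is just this pair of convolutions.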

Neumann Dummy Head Microphone

The way most binaural recordings are made is with a dummy head microphone. These mics are the size and shape of a real head and have detailed, anatomically accurate pinnae which, through diffraction and reflection, colour the sound and capture the spatial cues necessary so that, when replayed over headphones, the recording creates an immersive audio experience which goes far beyond stereo. A good binaural recording can be unsettlingly realistic.

The thing I was seeing in that teaching space, with most people bowled over by the experience but some left unimpressed, was the issue of translation. If your physiology differs too much from the dummy mic’s, the effect is compromised. If the crucial summations and cancellations to which your hearing is so finely tuned don’t fall in the right places, you are left with indistinct localisation and the 3D quality doesn’t occur.

Designers of rubber ears for dummy head mics (I’m sure that’s not their proper job title but someone has to do it) must decide on an average ear, and if your ears differ too much from this then your experience will be compromised.

Binaural audio used to be a niche area, but as well as (slightly weird) ASMR videos on YouTube, binaural has come to the attention of a whole new type of user because of its use in gaming and AR applications. If you can capture a personalised HRTF which is specific to your ears, you will experience a more convincing effect. Plugins from companies such as Sparta, Noisemakers, SSA and Harpex all allow the use of personalised HRTFs.
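Under the hood, an HRTF set is a grid of impulse responses measured at discrete directions, and a renderer has to pick (or interpolate between) the measurements nearest to each source position. A minimal nearest-neighbour sketch, assuming a hypothetical grid of (azimuth, elevation) measurement directions in degrees:

```python
import math

def nearest_direction(measured, target_az, target_el):
    """Return the index of the measured HRTF direction closest to a target
    (azimuth, elevation) pair, by comparing unit vectors on the sphere.
    `measured` is a list of (azimuth_deg, elevation_deg) tuples."""
    def to_vec(az, el):
        az, el = math.radians(az), math.radians(el)
        return (math.cos(el) * math.cos(az),
                math.cos(el) * math.sin(az),
                math.sin(el))
    target = to_vec(target_az, target_el)
    # Largest dot product = smallest angle between the two directions.
    return max(range(len(measured)),
               key=lambda i: sum(a * b for a, b in zip(to_vec(*measured[i]), target)))

# Hypothetical measurement grid; a real set would have hundreds of points.
grid = [(0, 0), (30, 0), (90, 0), (0, 45), (180, 0)]
print(nearest_direction(grid, 80, 5))  # -> 2 (the 90-degree azimuth point)
```

Real renderers typically blend several neighbouring measurements rather than snapping to one, but the lookup problem is the same.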

HRTFs And Dolby Atmos

Then there is Dolby Atmos. While ‘proper’ Atmos involves listening on a speaker array of 7.1.4 channels or more, and much of the film and TV Atmos content out there will be consumed using some extremely clever soundbars, almost all Dolby Atmos Music content will be heard over headphones as a binaurally rendered version. This binaural rendering has developed over the last few years, and it is important to distinguish between Atmos and Apple’s Spatial Audio. The two are not the same, and there are some issues with binaural rendering which are specific to Spatial Audio and not to Atmos.

So a way to capture personalised HRTFs is clearly desirable. The techniques usually used to capture impulse responses of acoustic spaces aren’t really practical for most users, as they involve significant equipment and operator skill. A method for capturing the shape of the ears using cameras is much more practical. In the same way that your iPhone can capture facial ID data, it can just as easily capture the shape of a person’s ears.
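The ruler shots mentioned below are what fix the absolute scale in this kind of capture: once you know how many pixels a ruler of known length spans, every other distance in the image can be converted to millimetres. A minimal sketch with hypothetical numbers (the function name and values are illustrative, not part of any actual product):

```python
def pixels_to_mm(feature_px, ruler_px, ruler_mm=100.0):
    """Convert a length measured in image pixels to millimetres, using a
    ruler of known length photographed in the same plane as the feature.
    The scale factor is simply (real ruler length) / (ruler length in pixels)."""
    scale_mm_per_px = ruler_mm / ruler_px
    return feature_px * scale_mm_per_px

# Hypothetical numbers: a 100 mm ruler spans 800 px and the pinna spans 520 px.
print(pixels_to_mm(520, 800))  # -> 65.0 (mm)
```

A full photogrammetry pipeline solves for camera positions and 3D structure across many photos, but without a known reference length the resulting model has no absolute scale, which is why the ruler shots matter.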

Genelec Aural ID

Genelec, no strangers to immersive audio, released Aural ID to meet this need. Rather than a product, Aural ID is a service, but unlike getting custom moulds made for IEMs, where you have to visit a specialist for a fitting, everything can be done remotely using a smartphone. The capture system is based on photogrammetry, a technique for extracting 3D information from photographs. Using a smartphone camera to take pictures from every angle around the subject, including measurement shots with a ruler held up to the ear so the scale can be accurately established, a very accurate 3D model of the subject is built up. This information is then made available to the user as a SOFA file (Spatially Oriented Format for Acoustics), a format defined and standardised by the Audio Engineering Society (AES).

Below is a video presentation to the AES on HRTFs and Aural ID from Genelec. It’s quite long, but if you want the full detail, here it is.

Using a personalised HRTF adds to the realism and acuity of a binaural render, but one of the significant differences between listening to Atmos content on speakers and hearing a binaural render is that on headphones the sonic image moves with the listener’s head. This issue has been addressed by various implementations of head tracking, the system by which the binaural render changes in real time to keep sounds in a consistent position relative to the listener regardless of their head movements. Products such as Waves NX achieve this. I have yet to try that particular system so won’t comment on its effectiveness, but it is clear that the separate technologies of custom HRTFs and head tracking will continue to come together, as they already have in systems such as Apple’s Spatial Audio which, when used with video content and the correct AirPods, incorporates head tracking.
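The core of head tracking is simple in principle: when the head turns, the renderer shifts each source by the opposite amount so that it stays fixed in the room. A minimal sketch of the azimuth compensation, assuming yaw-only tracking (real systems also handle pitch and roll, and smooth the sensor data):

```python
def compensated_azimuth(source_az_deg, head_yaw_deg):
    """Where to render a source so it stays put in the room: subtract the
    listener's head yaw from the source's room azimuth, then wrap the
    result into the (-180, 180] degree range."""
    rel = source_az_deg - head_yaw_deg
    rel = (rel + 180.0) % 360.0 - 180.0
    return 180.0 if rel == -180.0 else rel

# A source fixed at 30 degrees to the right; the listener turns 90 degrees right,
# so the source must now be rendered 60 degrees to their left.
print(compensated_azimuth(30.0, 90.0))  # -> -60.0
```

Feeding this compensated direction into the HRTF lookup on every sensor update is what keeps the image stable; the lower the latency of that loop, the more solid the illusion.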

Hardware And HRTFs

A product I came across recently was the Smyth Research Realiser. This hardware unit suggests a possible future direction for this technology: it runs a Personalised Room Impulse Response (PRIR), which is a custom HRTF captured in a specific listening environment. A PRIR is captured in much the same way as an HRTF, except that HRTF capture is usually done in an anechoic space to give as neutral a response as possible. Tying the environment and the listener together makes it less flexible, but from what I’ve read it seems to work very well indeed. Implementing custom HRTFs and head tracking in hardware to ensure negligible latency seems a natural progression, and I’d be fascinated to see what might be possible if an AAX DSP plugin were available, or a UAD-2 plugin.

Below is a video from Smyth Research explaining their product.

If a zero-latency, properly individualised custom HRTF and head tracking solution were available, spending what is, by plugin standards, a considerable sum would be easy to justify. The Genelec Aural ID service is €500, which some might think expensive, but compared to the cost of even the cheapest Atmos monitoring system it is extremely cheap. As this area matures I look forward to seeing an accurate, convincing Atmos headphone experience which rivals loudspeaker monitoring.

What do you think?
