Production Expert

AES Update Loudness Recommendations For Audio Only Streaming

Back in 2015, the AES grasped the nettle and produced a pivotal document to try and provide some loudness recommendations for audio-only streaming, to complement the broadcast standards already in use. Six years on, they have published an updated set of recommendations to supersede the 2015 audio streaming recommendations. In this article, our own loudness guru takes a look at the new recommendations to see what has changed.

I agree with one of the authors, Rob Byers, who said in his LinkedIn post that this document “will have an impact on music streaming, podcasting, radio streaming, dynamic ad insertion, and even virtual assistants and it's intended to give all of us a better listening experience, no matter how or what we listen to.”

The Background

Back in 2015, there had been some discussion about what loudness to use for audio streaming content on platforms like iTunes Radio, Spotify and YouTube.

At the time there wasn’t a standard for audio streaming content, unlike broadcast workflows that settled on one standard, albeit with a number of delivery specs around the world based around -24 LKFS and -23LUFS and a maximum true peak of -1 or -2dBTP.

There has been a view for a long time that the broadcast standard of -23LUFS or -24LKFS would not be suitable for portable devices because there isn't enough gain in the headphone amps to deliver an acceptable volume.

As a result, a number of services including iTunes Radio settled on -16LUFS and here at Production Expert, we also settled on -16LUFS for our weekly podcast. However, YouTube went for -13LUFS and Spotify initially went for around -11LUFS using ReplayGain rather than BS.1770, although they subsequently moved to -14LUFS, and now use the ITU-R BS.1770 standard according to their Loudness Normalization article.

To provide some recommendations for audio streaming content in 2015, the AES published Technical Document - AES TD1004.1.15-10 - Recommendation for Loudness of Audio Streaming and Network File Playback in which they recommended a window between -16 and -20LUFS with a maximum true peak of -1dBTP.

At the time I suggested that the window was on the low side, and with too wide a tolerance, effectively ±2LU. My view was it should be -16LUFS ±1LU giving a window of -15 to -17LUFS.

When it comes to the maximum true peak, I was surprised about the -1dBTP limit. The EBU R128 distribution specs rightly recommend a maximum true peak of -3dBTP because lossy codecs, which are at the core of this type of content delivery, cannot handle peaks much above -3dBTP. If you have ever worked with tools like the Sonnox Pro Codec plug-in or Nugen Audio’s MasterCheck Pro, you will find that peak levels above -3dBTP can still distort these codecs, and so my advice is to always use a true peak limiter set to -3dBTP.
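To make the headroom point concrete, here is a minimal sketch of the arithmetic. It assumes a simple static gain change, which moves the integrated loudness and the true peak by the same number of dB. The function names and the example measurements are hypothetical, and in practice the true peak would be measured with a BS.1770-compliant meter rather than typed in by hand.

```python
def peak_after_normalisation(measured_lufs, measured_dbtp, target_lufs):
    """Return (gain_db, resulting_true_peak_dbtp) for a static gain change.

    Assumption: a static gain shifts loudness and true peak equally.
    """
    gain_db = target_lufs - measured_lufs
    return gain_db, measured_dbtp + gain_db


def exceeds_ceiling(measured_lufs, measured_dbtp, target_lufs, ceiling_dbtp=-3.0):
    """True if the normalised programme would peak above the chosen ceiling."""
    _, new_peak = peak_after_normalisation(measured_lufs, measured_dbtp, target_lufs)
    return new_peak > ceiling_dbtp


# Hypothetical programme: -20 LUFS integrated with a true peak of -5 dBTP,
# normalised to -16 LUFS.
gain, peak = peak_after_normalisation(-20.0, -5.0, -16.0)
print(gain, peak)                                            # 4.0 -1.0
print(exceeds_ceiling(-20.0, -5.0, -16.0))                   # True  (-3 dBTP ceiling)
print(exceeds_ceiling(-20.0, -5.0, -16.0, ceiling_dbtp=-1.0))  # False (-1 dBTP ceiling)
```

In this made-up case the programme just meets a -1dBTP limit but sits 2dB above a -3dBTP ceiling, which is where a true peak limiter would have to do some work.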

What Has Changed Between TD1004 (2015) And TD1008 (2021)

My first piece of advice is that if you are delivering audio content to audio-only streaming services, you should read the AES TD1008 Recommendations in full. You will probably need to read it more than once, as there is a lot of advice and explanation integrated within the latest recommendations. Hats off to the authors for not just producing a set of guidelines but also providing the background and explanations as to why they have come to these recommendations.

What we share here in this article is not intended to replace reading these latest recommendations, rather it should be used alongside the recommendations.

To start with the AES has made it clear what the aims of their recommendations are…

  • It is intended for use by distributors of Internet audio streams and on-demand audio files.

  • It is not intended for sound-with-picture content (Over-The-Top, or On-Demand Video). Guidelines for that material are covered in other industry recommendations and standards (e.g., AES71-2018).

You might be wondering why we are focusing on these recommendations if they are for distributors. The AES goes on to say that this document “does not provide recommendations for content production. However, content creators and producers will find it essential to their work”.

They also acknowledge that this is an evolutionary process “to accommodate the inadequate maximum gain and limited metadata capability of some current and older playback devices.”

The AES make it clear that the endpoint of this evolutionary process is a desire to bring the audio streaming loudness standards in line with the broadcast and OTT standards of -23/24LUFS as devices are designed and built with sufficient gain in the playback chain and also support extended metadata. However, we are not there yet. In the 2021 recommendations, the AES is now looking at a window between -14LUFS for the loudest track on an album and -18LUFS for content involving speech.

Speech And Music Are Not The Same Loudness

There is a detailed table that lays out different targets and tolerances for different types of audio-only content. One of the standouts, for me, in TD1008 is the section headed Speech Vs Music in section 5 on Loudness and Normalisation.

“Numerous independent tests and studies have concluded that adjusting the speech portions of audio content to a consistent loudness leads to greater listener satisfaction. However, formal tests with listening panels showed that speech normalized to the same BS.1770 Integrated Loudness as music is typically perceived 2 to 3 dB louder than the music. Therefore, if operationally workable, the listener experience can be improved by normalizing music 2 or 3 LU higher than speech.

…it is additionally recommended that music be normalized to an average of -16 LUFS in operations where music and speech are separately normalized and played out automatically. Music normalization can be implemented through either Album Normalization or Track Normalization.”
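As a rough illustration of what separate normalisation means in practice, the sketch below computes the static gain needed to bring a measured item to its target, using -18LUFS for speech and -16LUFS for music as mentioned in this article. The dictionary, the function name and the example measurements are all hypothetical, and the measured values are assumed to be BS.1770 Integrated Loudness figures from a meter.

```python
# Targets taken from the figures discussed in this article.
TARGETS_LUFS = {"speech": -18.0, "music": -16.0}


def normalisation_gain_db(measured_lufs, content_type):
    """Static gain (in dB) that moves an item to the target for its type."""
    return TARGETS_LUFS[content_type] - measured_lufs


# Hypothetical items: a quiet interview and a loud pop master.
print(normalisation_gain_db(-21.0, "speech"))  # 3.0
print(normalisation_gain_db(-9.0, "music"))    # -7.0

# The music target sits 2 LU above the speech target.
print(TARGETS_LUFS["music"] - TARGETS_LUFS["speech"])  # 2.0
```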

This idea that music can have a higher integrated loudness than speech is an interesting one. I have yet to do some research of my own, but the evidence they quote comes from two documents available from the AES Document Library…

When it comes to radio, which consists of a mix of music and speech there is a second table, which makes loudness recommendations for different genres…

  • News/Talk -18LUFS

  • Pop Music -16LUFS

  • Mixed Format -17LUFS

  • Sport -17LUFS

  • Drama -18LUFS

with a suggestion that providers can further refine these numbers based on the proportion of speech to music on their specific streams.
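One way a provider might refine a target from the speech-to-music proportion is a straight-line blend between the speech and music figures. To be clear, this is purely my own assumption for illustration and not a formula from TD1008. The function name is hypothetical, and the only sanity check available from the list above is that a 50/50 mix lands on the Mixed Format figure of -17LUFS.

```python
SPEECH_TARGET_LUFS = -18.0  # News/Talk figure from the list above
MUSIC_TARGET_LUFS = -16.0   # Pop Music figure from the list above


def blended_target_lufs(music_fraction):
    """Linear blend between the speech and music targets.

    Assumption: a simple linear interpolation, not taken from TD1008.
    music_fraction is the share of the stream that is music (0.0 to 1.0).
    """
    if not 0.0 <= music_fraction <= 1.0:
        raise ValueError("music_fraction must be between 0.0 and 1.0")
    span = MUSIC_TARGET_LUFS - SPEECH_TARGET_LUFS
    return SPEECH_TARGET_LUFS + music_fraction * span


print(blended_target_lufs(0.0))  # -18.0 (all speech)
print(blended_target_lufs(0.5))  # -17.0 (matches the Mixed Format figure)
print(blended_target_lufs(1.0))  # -16.0 (all music)
```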

I don’t know about you, but this seems to go against the concept behind ITU-R BS.1770. It suggests that the algorithm is somehow flawed: that speech and music measured at the same integrated loudness are not perceived as equally loud, and that a 2LU ‘fudge factor’ is needed to compensate for the apparent deficiencies of the BS.1770 algorithm.

Album Normalisation

The second standout from the TD1008 Recommendations is the use of Album Normalisation, as opposed to track normalisation. With track normalisation, all tracks are made equally loud. With album normalisation, just the loudest tracks of an album are made equally loud and the other tracks keep the relative loudness they had on their album. If one listens to an album, album normalisation makes the most sense. But when streaming, people do not just listen to albums as a whole but to randomly picked tracks in shuffled playlists. So the question is: does album normalisation work for a shuffled playlist too? The answer would seem to be yes.

For example, one piece of research was undertaken in cooperation with TIDAL, using a survey of 4.2 million albums from their catalogue. Track normalisation was compared with album normalisation, not by listening to a single album but by using a combination of tracks from a variety of albums in a shuffled playlist of 24 songs, with 38 subjects. It turned out that 80% of the subjects preferred album normalisation, even though the tracks used had significant differences in loudness, of up to 10LU.
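The difference between the two normalisation modes can be sketched in a few lines. This is a minimal illustration, assuming per-track BS.1770 Integrated Loudness values are already known. It uses -16LUFS as the track target and -14LUFS for the loudest track of an album, the figures mentioned in this article. The function names and the example album are hypothetical.

```python
def track_normalisation_gains(track_lufs, target_lufs=-16.0):
    """Each track gets its own gain so that all tracks end up equally loud."""
    return [target_lufs - loudness for loudness in track_lufs]


def album_normalisation_gains(track_lufs, loudest_target_lufs=-14.0):
    """One gain for the whole album, set by its loudest track.

    Relative loudness between the tracks of the album is preserved.
    """
    album_gain = loudest_target_lufs - max(track_lufs)
    return [album_gain] * len(track_lufs)


# Hypothetical album: a loud single, a mid-level track and a quiet ballad.
album = [-8.0, -11.0, -15.0]

print(track_normalisation_gains(album))   # [-8.0, -5.0, -1.0]
album_gains = album_normalisation_gains(album)
print(album_gains)                        # [-6.0, -6.0, -6.0]

# Resulting loudness under album normalisation: the 7 LU spread is kept.
print([l + g for l, g in zip(album, album_gains)])  # [-14.0, -17.0, -21.0]
```

With track normalisation the quiet ballad ends up exactly as loud as the single, whereas album normalisation leaves it 7LU below, as it was on the original album.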

Mono Compatibility

The last standout from these recommendations is the time and detail spent on single channel (mono) production, distribution and playback. Again I am not going to go into the detail of what is discussed and recommended in TD1008 other than to say that as someone who is very old-school, when it comes to mono compatibility, it is very gratifying to see the care and consideration given to this issue and the appropriate use of a 90 deg phase shift when combining a stereo signal into a mono signal for use with single speaker playback.

There Is Much More…

These are my three standouts from the latest TD1008 Recommendations for Loudness of Internet Audio Streaming and On-Demand Distribution, but the document goes into much more detail and covers a broad range of topics. It provides a high-quality education in the whys and wherefores of loudness and explains why the authors have chosen the standards offered in the recommendations. If you are producing content for audio streaming services then I do counsel you to read it in full.

Don’t Ignore It

I also hope that these recommendations from the AES make it clear that no one in the pro audio business who is creating content can ignore loudness and LUFS anymore. It matters, and whether you like it or not, it is the measurement that is being used now for virtually all methods of delivering creative content to the consumer.

This also goes for all the streaming service providers. Ultimately it’s down to you to implement these recommendations, as the AES cannot enforce them.
