Bad Sound On TV & Movies - A Response

December 6, 2022 Neil Hillman

In this article, Dr Neil Hillman discusses shocking new research showing that, in some age groups, 80% of viewers rely on subtitles to 'hear' their television programmes; and not because they are old or hearing impaired, quite the opposite. He also asks whether it has become the new fashion in television production to be completely ignorant of how sound works.

I daresay we have quite a lot to thank the Italian Vilfredo Federico Damaso Pareto for; but he is best remembered for presenting us with a simple but useful mathematical ratio that bears his name, one which went on to find widespread acceptance, and came to be applied in more ways than he might have ever envisaged.

Vilfredo Pareto (1848 - 1923) pictured in 1870

Born in Paris in 1848 to an Italian father, Pareto was a notable philosopher and economist at the University of Lausanne; but the seed for the Pareto principle is said to have been sown when he noticed that 20% of the pea plants in his garden generated 80% of the healthy pea pods. This statistic caused him to think about whether there was a constant factor at work in uneven distribution, and one of the first things that he found to agree with this proposition was that approximately 80% of the land in Italy was owned by 20% of the population; something he disclosed in his first published work, in 1896 (Pareto, V. (1896) Cours d'économie politique Vol. I, Lausanne: F. Rouge). To his surprise, he found this to be the case in a variety of other countries, too.
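For readers who like to see the arithmetic, the principle is easy to check against any set of figures. The short Python sketch below is an illustration only: the function name `top_share` and the pea-pod counts are invented for the example, and are not Pareto's own data.

```python
def top_share(values, top_fraction=0.2):
    """Return the share of the total contributed by the largest
    `top_fraction` of the items (hypothetical helper, for illustration)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values, reverse=True)
    # Size of the 'top' group; always at least one item.
    count = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:count]) / sum(ordered)


# Made-up pod counts for ten plants; NOT historical data.
pods_per_plant = [40, 40, 3, 3, 3, 3, 2, 2, 2, 2]
share = top_share(pods_per_plant)
print(f"Top 20% of plants produced {share:.0%} of the pods")
```

With these invented numbers, two plants out of ten account for 80 of the 100 pods. Real data will rarely land on exactly 80:20, which is why the principle is best read as a rule of thumb rather than a law.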

The 80 / 20 ratio is everywhere

From this, you may not be surprised to learn that roughly 20% of the world’s population enjoys 80% of the world’s income.

For some time, it has been an adage of business management that "80% of sales come from 20% of clients" – something I was taught in the early days of running my audio post-production facility, The Audio Suite, by my business coach David Holland. (A 20-year relationship so far, I note, which makes me wonder if by some abstract correlation to the Pareto principle, it means I’ll be working with him until I’m 80…)

In the 1980s, video rental stores found that 80% of their income came from 20% of the titles they stocked (Kleinfield, N. R. (1988), A Tight Squeeze at Video Stores, in The New York Times, 01-05-1988) and Microsoft have come to espouse two important ‘truths’: 20% of computer code has 80% of the errors, and 80% of software code can be written in 20% of the allocated time (Pressman, R. S. (2010), Software Engineering: A Practitioner's Approach (7th ed.), Boston, Mass: McGraw-Hill).

However, there is a recent demonstration of the Pareto principle that, as an audio professional who has just chalked up 40 years and reached 1,000 IMDb credits in film and television, I am shocked to the core by. In fact, I’ll go further, and say that I feel a deep, burning shame from learning of it.

80% of viewers ‘listen’ through subtitles

In his deeply disturbing and very well-researched article, published here on Production Expert, ‘TV Subtitle Usage Up To 80% - What Is Going Wrong with Dialogue Mixes?’, friend and respected fellow audio professional Mike Thornton shared some startling results. Although many of us working in audio on the production line of television drama were surprised, most of us would describe it as the inevitable outcome of the ignorance, apathy and greed of the television production industry today - in its attitude towards sound in general, and in its attitude towards dialogue intelligibility in particular.

The recent research quoted in Mike’s article shows that in some age groups, 80% of the television drama audience routinely reaches for the remote control to turn on the subtitles - and not because they are of ‘a certain mature age’, nor because they have an existing hearing problem; instead, it's simply because they cannot hear drama dialogue clearly enough and are now in the habit of switching on the captions. (Another Pareto alignment is that 80% of the issue arises because background music drowns out the actors’ dialogue.)

And so, because of this statistic, I want to suggest another term for this particular application of the Pareto principle: I propose that it should be known as ‘The Ratio of Deep Shame’.

But who is to blame for this disgraceful, and avoidable, situation of unintelligible dialogue? After all, what other industry delivers a product that is worse than it was 30 years ago, yet continues to charge its customers a premium price? Software? Computers? Automobiles? Mobile phones? Public transport?

It’s my opinion that there are two broad aspects to the answer: the first one concerns the wilfully unenlightened aesthetic of modern television drama production, the second is the use of labyrinthian technical standards, which would seem to be over-engineering what is, at its simplest, little more than a traditional TV sound challenge.

The inappropriate aesthetic of contemporary television drama

Here we are with some of the most marvellous tools we’ve ever had to create film and television sound with, both on location and in post-production; yet it’s not unreasonable to say that viewers are being served with soundtracks that are less intelligible now than they were three decades ago. But – and it’s this that really gets me – it appears to be increasingly fashionable to work in production and to be clueless about sound: as if the crass stupidity of not understanding even the very basics of the sound department’s considerations somehow elevates someone to a loftier artistic position; and I hate this mentality with a passion.

The prime example of this mindset is shown by movie director Christopher Nolan, who maintains that 'intelligibility is overrated'. Forgive me, but that's a pathetic standpoint to take for an audio-visual medium; and as evidence, I offer you Interstellar, Tenet or any of The Dark Knight trilogy as examples of how consistently dreadful Nolan soundtracks are (however, if you needed me to suggest the very worst, it would have to be the irredeemable The Dark Knight Rises). I can only assume that every student filmmaker and wannabe director has used this as their justification for being equally uninterested, ignorant and lazy as far as sound is concerned; and as they blindly mimic such mindless behaviour, it results in more productions with inaudible performances, or incoherent direction, or both; and on it goes.

It’s not always been this way, of course. Producer George Lucas has suggested that “Sound is where you get the most bang for your buck” (quoted in Blake, L. (2004) ‘George Lucas – Technology and the art of filmmaking’, at mixonline.com 11-01-2004); whilst Steven Spielberg maintains that “The eye sees better when the sound is great.” (Barsam, R. & Monahan, D. (2009), ‘Looking at Movies: An Introduction to Film (3rd Edition)’, New York: W. W. Norton & Company.)

Whilst it’s the OTT streamers who are often held up as the culprits for poor TV sound intelligibility, the traditional broadcasters in the UK, such as ITV and (as they would like us to believe, the last bastion of television excellence) the BBC, have proven themselves to be equally inept as far as dialogue intelligibility is concerned. Viewers have complained in their thousands, network news bulletins and national newspapers have covered numerous ‘Mumblegate’ stories, yet the audience at home is repeatedly left with no alternative but to turn on the on-screen captions.

When once asked by a Daily Telegraph journalist for my comment on a prime-time Christmas drama – where viewers had literally swamped the BBC switchboard with complaints over inaudible dialogue – I felt obliged to say that ‘For the avoidance of doubt, mumbled dialogue is not the audio equivalent of dark, moody pictures.’ (Hillman, N. quoted in The Daily Telegraph by Carpani, J. (2019).)

As one frustrated viewer recently commented on social media, having to rely on reading the dialogue on-screen meant that they “might just as well have read the book”; and surely, you would have thought, even the most picture-centric directors would have worked out by now that if you’re looking at the subtitles, you’re not looking at their pretty, shiny pictures…

In their desire to be more cinematic with television drama, to bring enhanced realism to our screens, by allowing actors to mumble and swallow lines of dialogue for naturalistic speech, or by directing music and effects to swamp speech for effect, directors and producers are losing sight of some long-established and fundamental sound considerations:

Sound is 50% of the moving-picture entertainment product.
When it’s done properly, sound can create up to 70% of an audience’s involvement with the story.
When it’s used thoughtfully, sound can create up to 80% of an audience’s emotional engagement.
When it’s planned properly, sound typically uses only 10% of a blockbuster budget.

To disregard the creative power of sound is therefore, you would think, ridiculous; yet a friend of mine – a very experienced Re-recording Mixer – contacted me only this week to say…

“I am mixing a drama series for [a broadcaster] at the moment. The original brief was that it was a dialogue-led show and to prioritise clarity. At the first episode’s dub they said they didn’t like the dialogue and wanted the dirtier off-mic sound, like they heard in the offline edit.”

[For picture editing purposes, a very rough guide mix of the sound is produced on location, and supplied solely for the editor to cut pictures to. The clean recorded audio is synchronized and substituted in the audio post-production process.] I really, really felt their pain.

The inappropriate reliance on technical standards

Here's how to compound the problem: put a group of half-a-dozen sound engineers together in a room and ask them to tackle an audio challenge. What you’ll get is six ever more convoluted solutions. Now consider the number of broadcast and streaming channels that there are, and you’ll begin to understand why today there are so many different delivery standards for television sound. Amazon Prime is different to Netflix, is different to Apple, is different to Disney, is different to YouTube, is different to the BBC… And so it goes on. Yet figuratively, their soundtracks all emanate from the same kind of television set, in the same kind of living room, but in millions of different homes. Any deficiency in the original sound is only exacerbated by the various automated algorithms that are used to squeeze the maximum number of channels down the smallest possible pipe - data compression and intelligibility make uncomfortable bedfellows.

Something else that may seem an obvious point also appears to be overlooked: we don’t consume television in the same way that we do movies – theatrical-release films are mixed in a room that matches, in size and acoustic, the one in which the audience will view the movie. But television seems to have a chip on its shoulder, because it also gets its sound mixed as if it were being watched in a cinema instead of from a saggy 3-piece suite.

It couldn't be more different: the background noise in a home is higher, attention is easily dragged away from the screen, television is not listened to as loudly as cinema, viewers are generally listening in a poor acoustic, they’re not all sitting in an audio ‘sweet spot’ and the reproduction equipment is often poor – at worst, the audience is relying on the built-in speakers on the back of the television! At best, it might be a ‘soundbar’ from the middle aisle of Aldi, or perhaps Argos, that gets plugged in.

For many years, when we mixed sound for television, we used big-ish speakers to determine the integrity and cleanliness of the audio signal, but constantly checked on little speakers to replicate the home listening experience. These speakers were horrible little things, of course, but they were just like the ones the tellies of the day were using.

Today, one of the biggest streamers of the moment claims to have the tightest control on dialogue intelligibility (and supplies reams of paperwork that includes plenty of other audio compliance measures, too) and they suggest that ‘near-field monitoring’ is the solution. But this involves monitoring and mixing at a specified, precise distance, on a specified professional loudspeaker, at a specified single point. Unfortunately, nothing in that arrangement represents anything found in the domestic listening environment. I mean, why not add monitoring on a soundbar to the specs?

In principle, what’s the way forward?

The sound-for-television technical standards committees worldwide worked really well to introduce a loudness standard (in Europe, EBU R128) that pretty much stopped commercials making you jump out of your seat (and your skin) because you’d turned the volume up to hear what was going on in the much quieter main programme.

Having done commendable work on the issue of consistent loudness, in my opinion, those committees now need to turn their attention to how dialogue is treated in audio post-production, perhaps adapting the established speech intelligibility index (SII) used elsewhere in audiology for television sound mixing; and in a further, important step, through this newly conceived standard, empower audio post-production professionals to override irresponsible production requests for unsuitable balances between dialogue, effects and music. Sometimes it’s necessary to save people from themselves, however creative they like to appear, for the benefit of everyone else.
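To make the idea of a ‘balance’ between dialogue and music a little more concrete, here is a deliberately crude Python sketch that reports how many decibels a dialogue stem sits above a music stem. It is emphatically not the SII, nor an EBU R128 loudness measurement (which applies frequency weighting and gating); it is plain RMS arithmetic. The function names and the synthetic sine-wave ‘stems’ are assumptions made purely for illustration.

```python
import math


def rms_dbfs(samples):
    """RMS level in dBFS of float samples, where full scale = 1.0."""
    if not samples:
        raise ValueError("samples must not be empty")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square)


def dialogue_margin_db(dialogue, music):
    """How far (in dB) the dialogue stem sits above the music stem.
    A raw level difference only; no weighting, no gating."""
    return rms_dbfs(dialogue) - rms_dbfs(music)


def sine(amplitude, n=4800, freq=100.0, rate=48000):
    """Synthetic stand-in for a stem: whole cycles of a sine wave."""
    return [amplitude * math.sin(2 * math.pi * freq * i / rate) for i in range(n)]


# Hypothetical stems: 'dialogue' at twice the amplitude of 'music'.
margin = dialogue_margin_db(sine(0.5), sine(0.25))
print(f"Dialogue sits {margin:.1f} dB above the music")
```

Doubling the amplitude gives a margin of roughly 6 dB. Any real intelligibility standard would need to account for frequency content, masking and the listening environment, which this sketch ignores entirely.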

Mike concludes his excellent article with an understandably heavy heart, saying…

“As an audio post-production editor and mixer, I feel we have failed if normal-hearing consumers have to turn on subtitles to follow the narrative. It is beholden on us, and those with influence and control of the budgets and creative choices, to understand the issues [for] the consumers we serve.”

Gain through Train(ing)

Training filmmaking creatives to have a better awareness and understanding of sound would also massively help the situation.

Just as audio post-production sits at the very end of the moving picture production process, it's been my experience from teaching in several different universities that it's a lonely module that deals meaningfully with sound, and it generally appears right at the end of most film and television production courses.

It’s as if sound is taught as an afterthought rather than a creative equal, which in turn means that students who know no better interpret this as being the correct order of importance for film and TV production. I would say that it's the edict that teaches ‘first comes the picture’ that has contributed so much to getting us in the mess we are in today. But where would we be, I wonder, if the teaching of filmmaking instead said, “In the beginning was the word”?

Put up or shout up

And finally, in the spirit of ‘If you’re not part of the solution, you’re part of the problem’, after publishing my 2021 book on sound design for film and television (Hillman, N. (2021) Sound for Moving Pictures – The Four Sound Areas, Abingdon: Routledge) I founded the Sound for Moving Pictures Academy to help non-sound professionals such as directors and picture editors to better understand sound; and to show how strategies to incorporate ‘sound thinking’ into their way of working could take advantage of the powerful creative force that a considered soundtrack can deliver to any production.

The Audio Suite has been creating and mixing soundtracks for film and television clients since 2002

We use DaVinci Resolve as our teaching platform (we're proud to be a Certified Blackmagic Design Training Partner), and the free version of DaVinci Resolve allows us to welcome students on to our short courses who might otherwise find the cost of funding new software prohibitive.

It’s not just me, by the way: there’s a close-knit family of sound and picture professionals also on hand at the Academy to answer any queries folk might have, in a supportive environment, where the only silly question is the one you chose not to ask... (Rest assured, egos are encouraged to be 'left outside'.)

Candidly, I’ll share my current ambition with you, and that’s for the Academy to reach 80% of the film and television, picture-centric, production population. I'd like it to be 100% of course; but apparently some chap, a long time ago in Italy, observed a natural order that sat at an 80:20 ratio...
