Clone A Voice With Just A 5 Second Sample - Could This Replace ADR?

November 14, 2019 Mike Thornton

This video has been posted by researchers who claim that they can clone a person’s voice using just a 5-second clip. They explain that this is possible because they have trained a neural network, what we often call artificial intelligence or machine learning, on many hours of speech from a wide variety of speakers, so that it learns how people speak in general. It can then take a 5-second clip of someone it has not heard before, clone that voice, and have it say things that were not in the clip.

The paper "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis" and audio samples are available here:

Will This Reach ADR?

Imagine what this could mean for ADR. You could analyse an actor’s voice and then have the AI speak the lines. Then you could use Dialogue Match from iZotope to set the result in context.

What Do You Think?

Does it scare you that anyone could be made to say anything, with all the ramifications for politicians, for anyone for whom ‘my word is my bond’, and even for evidence in court cases? Would it be possible to embed some kind of watermark in the synthesised audio that identifies it as AI-created? Or does this excite you because of the possible solutions in content production?
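
On the watermark question, here is a rough idea of how one long-established approach, a keyed spread-spectrum watermark, works in principle. This is only a toy sketch and not something the researchers describe: the function names, the key, the `strength` value and the sine-wave stand-in for synthesised speech are all assumptions made for illustration. A very quiet pseudo-random signal derived from a secret key is added to the audio, and anyone holding the key can later check for it by correlation.

```python
import math
import random

def watermark_sequence(key, n):
    """Pseudo-random +1/-1 sequence derived from a secret key (hypothetical helper)."""
    rng = random.Random(key)
    return [1.0 if rng.random() < 0.5 else -1.0 for _ in range(n)]

def embed(samples, key, strength=0.01):
    """Add the keyed sequence to the audio at a low level."""
    mark = watermark_sequence(key, len(samples))
    return [s + strength * m for s, m in zip(samples, mark)]

def detect(samples, key):
    """Correlate the audio with the keyed sequence and return the average product."""
    mark = watermark_sequence(key, len(samples))
    return sum(s * m for s, m in zip(samples, mark)) / len(samples)

# Stand-in for synthesised speech: ten seconds of a 220 Hz tone at 16 kHz.
rate = 16000
audio = [0.5 * math.sin(2 * math.pi * 220 * t / rate) for t in range(10 * rate)]
marked = embed(audio, key=1234)

score_marked = detect(marked, key=1234)     # close to the embedding strength
score_clean = detect(audio, key=1234)       # close to zero
score_wrong_key = detect(marked, key=9999)  # close to zero

threshold = 0.005  # assumed decision level, half the embedding strength
print("watermark found:", score_marked > threshold)
```

A real system would need to shape the watermark so that it is inaudible and survives compression, resampling and re-recording, none of which this sketch attempts.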
