Introducing AI Speakers

A major advancement in speed & quality of AI speech

With our new AI voice model, training only requires a few seconds of audio. Plus, our enhanced text-to-speech and Overdub generation make your AI speech sound more natural and convincing than ever.

Along with a fresh, user-friendly interface, the reimagined AI Speakers brings simplified speaker label management, an easier Write mode experience, and other exciting changes.

Dive into our comprehensive transition guide to swiftly master what's new and enhanced.

AI Speakers is also the first in a series of AI feature drops arriving over the coming weeks. Stay tuned for more!

Terminology changes

Speakers: We’ve renamed Speaker labels to just Speakers. Speakers now represent the labels in a project, which simplifies identifying speakers and managing voices in your projects.
AI Speakers: When a Speaker has speech generation enabled (by adding a Voice clone), it becomes an AI Speaker.
Text-to-speech: This term refers to the process of writing new text in your Composition with an AI Speaker selected.
Overdub: This term now refers only to the process of replacing pre-recorded speech audio using an AI Speaker.
AI Speakers tab: This tab in the Drive view was formerly known as the Voices tab.

Instant speaker creation

Creating a speaker no longer requires a training project of more than 10 minutes; around 30 seconds of audio is now enough

Verification now takes under a minute instead of up to 24 hours

New user experience for adding new speakers in the AI Speakers view

New ways to create or use an AI Speaker from inside projects

Speaker label dropdown revamp

Speaker management is now fully integrated into the speaker label dropdown, eliminating the need for a separate modal
You can create, select, and rename speakers directly from the speaker label dropdown

Overdub generation improvements

Overdub is now generated using the surrounding audio in the document to ensure that the new speech sounds exactly like the speaker in the recording
Enhanced verification to ensure the training statement voice matches the surrounding audio in the document for Overdub generation

Text-to-speech quality improvements

We’re now smarter about when to generate text-to-speech: generation starts sooner, and only when you want it to

We don’t autogenerate while you’re typing in Write mode, so no more time pressure! Edits that aren’t covered by a generation trigger (playback, exiting Write mode, or an AI Speaker change) are still caught, but on a slower interval
Paragraph-by-paragraph generation replaces sentence-by-sentence generation for more natural speech flow within a paragraph

Other notable changes

Write mode: We’ve simplified script modes down to a single Write mode, and AI speech is now generated primarily when exiting Write mode. You can now toggle in and out of Write mode with Cmd-E.
Auto-generation of speech now occurs every 10 seconds instead of every 5 seconds in Edit mode, with no auto-generation in Write mode.
Speech generation triggers on playback, exiting Write mode, or if the AI Speaker changes
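
The generation rules above can be summarized as a small decision function. This is only an illustrative sketch of the behavior described in these notes, not the actual implementation: the names Mode, Event, AUTO_INTERVAL_SECONDS, and should_generate are hypothetical, and the way the periodic check is modeled (a timer tick compared against the seconds since the last generation) is an assumption.

```python
from enum import Enum, auto


class Mode(Enum):
    EDIT = auto()
    WRITE = auto()


class Event(Enum):
    PLAYBACK = auto()
    EXIT_WRITE_MODE = auto()
    SPEAKER_CHANGED = auto()
    TIMER_TICK = auto()   # hypothetical periodic check
    TEXT_EDITED = auto()  # the user is typing


# From the notes: auto-generation runs every 10 seconds in Edit mode.
AUTO_INTERVAL_SECONDS = 10.0

# Events the notes list as triggering generation right away.
IMMEDIATE_TRIGGERS = {
    Event.PLAYBACK,
    Event.EXIT_WRITE_MODE,
    Event.SPEAKER_CHANGED,
}


def should_generate(event: Event, mode: Mode, seconds_since_last: float) -> bool:
    """Return True if speech generation should start for this event."""
    if event in IMMEDIATE_TRIGGERS:
        return True
    # The slower periodic pass applies only in Edit mode (assumed model).
    if event is Event.TIMER_TICK and mode is Mode.EDIT:
        return seconds_since_last >= AUTO_INTERVAL_SECONDS
    # Typing alone never triggers generation, and Write mode has no timer.
    return False
```

Under this sketch, should_generate(Event.TIMER_TICK, Mode.WRITE, 60.0) is False, while the same tick in Edit mode returns True once 10 seconds have passed.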