Add phonetic pronunciations support for Overdub

Jim McKeeth

Generally it does a pretty good job, but sometimes it gets the wrong pronunciation of a word. Yesterday I was having a heck of a time getting it to pronounce "Live" as in "alive" (long I) instead of as in "the place I live" (short I). Spelling phonetically isn't always enough. It needs to support the International Phonetic Alphabet (IPA) - perhaps with a wizard to walk you through creating the right pronunciation.

September 21, 2020

Canny AI

Merged in a post:

Pronunciations TTS

Zontos Xristozo

Why don't you support International Phonetic Alphabet (IPA) symbols, so that we could specify the actual vowels without all the trouble it takes now to try to get it to work?

June 12, 2024

Mathnasium Online

As described in a very similar feature request linked here (one that also proposes a convenient SSML tag-based solution) and here, enabling tuning and adjusting of Overdub’s output (for example, to read individual letters and numeric digits, insert pauses, adjust tone and emphasis, etc.) is essential for our use case: creating training materials for staff and students who teach and learn mathematics.

That requester’s proposal of using SSML tags to implement such easy, user-configurable tweaking of Overdub’s output would be truly awesome. By lacking this capability at present, Descript is forcing its customers to seek workarounds and alternate solutions, and is leaving a big opening for competitors to walk through.

Based on these requests’ continuing upvotes (110+ combined) and enthusiastic comments, after ~two years~ as a common feature request, it’s time to get these at least “Under Review”!

SEE RELATED REQUEST: https://feedback.descript.com/feature-requests/p/provide-additional-control-over-how-overdub-generates-speech-from-the-text-eg-us

Sharif

yes, hope so.

Pascal

WE NEED THIS implemented as described here ...please, everyone, up-vote it! Per the suggestion below from Frameworks, SSML sounds like a great approach to providing better control over how Overdub generates speech from text, so long as SSML tag visibility can be shown|hidden by way of a keyboard shortcut and/or menu option in Descript.
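
To illustrate the show|hide idea, here is a minimal Python sketch of how an editor could derive a "tags hidden" view from SSML-marked text. It is only an assumption about one possible approach, not how Descript works: the `hide_ssml` name and the regex-based stripping are hypothetical, and the sample text reuses the pecan example from Frameworks' comment.

```python
import re

# Hypothetical helper: matches any opening, closing, or self-closing tag.
SSML_TAG = re.compile(r"</?[A-Za-z][^<>]*>")

def hide_ssml(text: str) -> str:
    """Return the text with SSML tags removed, leaving only the words to be read."""
    return SSML_TAG.sub("", text)

marked_up = 'I say, <phoneme alphabet="ipa" ph="ˈpi.kæn">pecan</phoneme>.<break time="2s"/>'

print(marked_up)             # "tags shown" view
print(hide_ssml(marked_up))  # "tags hidden" view: I say, pecan.
```

The marked-up string stays the single source of truth; the hidden view is computed from it, so toggling visibility never loses the pronunciation hints.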

Frameworks

Hi, Jim. Perhaps you should merge this request for "phonetic pronunciations" under your other similarly entitled "Control of Overdub pauses and emphasis", as it seems to have gotten more visibility and most users would want both capabilities ...no?

BTW: Descript offers the beginnings of what you're requesting via its feature "Overdub Styles".

Concerning both of your requests, I've up-voted them both and, as noted under your other request, I am recommending the use of Speech Synthesis Markup Language (SSML) tags to provide the control you're seeking over how Overdub generates speech from the text.

For example, as described here and here, you can specify International Phonetic Alphabet (IPA) pronunciations using the SSML <phoneme> tag, as in:

You say, <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>.

I say, <phoneme alphabet="ipa" ph="ˈpi.kæn">pecan</phoneme>.

Similarly, to specify a delay of fixed duration (e.g., 2 seconds), you would simply insert the markup <break time="2s"/> at the desired pause point within the corresponding text.
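
As a rough illustration of how such markup could be generated rather than typed by hand, here is a small Python sketch that builds an SSML string with a pronunciation override and a pause. The helper names (`phoneme`, `pause`, `speak`) are made up for this example, the IPA strings for the two readings of "live" are my own transcriptions, and nothing here reflects an actual Descript or Overdub API.

```python
from xml.sax.saxutils import escape, quoteattr

def phoneme(text: str, ipa: str) -> str:
    """Wrap text in an SSML <phoneme> tag carrying an IPA pronunciation."""
    return f"<phoneme alphabet=\"ipa\" ph={quoteattr(ipa)}>{escape(text)}</phoneme>"

def pause(seconds: float) -> str:
    """Return an SSML <break> tag for a pause of the given length."""
    return f'<break time="{int(seconds * 1000)}ms"/>'

def speak(*parts: str) -> str:
    """Join already-formed fragments inside the SSML root element.

    Plain-text fragments are passed through as-is, so callers should
    escape() any text that may contain <, > or &.
    """
    return "<speak>" + "".join(parts) + "</speak>"

ssml = speak(
    "Saturday Night ",
    phoneme("Live", "laɪv"),   # long I, as in "alive"
    pause(2),                  # fixed 2-second pause
    " is not where I ",
    phoneme("live", "lɪv"),    # short I, as in "the place I live"
    ".",
)
print(ssml)
```

The same spelling gets two different `ph` values, which is exactly the kind of control over "Live" that this request is asking for.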

Please see (and up-vote :) my request entitled "Provide additional control over how Overdub generates speech from the text (e.g., using SSML tags)" here: https://feedback.descript.com/feature-requests/p/provide-additional-control-over-how-overdub-generates-speech-from-the-text-eg-us

SSML USE-CASES

Amazon's Polly: https://docs.aws.amazon.com/polly/latest/dg/supportedtags.html
Google's Assistant: https://cloud.google.com/text-to-speech/docs/ssml
Siri, et al: https://www.smashingmagazine.com/2019/03/sanity-portabletext-speech-synthesis/

BTW: Are you the "Jim McKeeth" of Embarcadero|Delphi fame?

Jim McKeeth

Frameworks: That is me. I voted up your suggestion too. Thanks!

Tau Lukos

I too was so frustrated trying to get an Overdub voice to pronounce LIVE correctly. I wanted it to sound like "Saturday Night Live" but nothing I did would stop it from pronouncing it like "Live and Let Die." I understand it is confusing to have two pronunciations for the same spelling. But that is not uncommon in English. So having a way to let Descript know I want pronunciation 2 instead of pronunciation 1 would be helpful.

I have also noticed that Overdub voices have problems with single-digit numbers.