Add phonetic pronunciations support for Overdub
Jim McKeeth
Generally it does a pretty good job, but sometimes it gets the wrong pronunciation of a word. Yesterday I was having a heck of a time getting it to pronounce "Live" as in "alive" (long I) instead of as "the place I live" (short I). Spelling phonetically isn't always enough. It needs to support International Phonetic Alphabet (IPA) - perhaps with a wizard to walk you through creating the right pronunciation.
Canny AI
Merged in a post:
Pronunciations TTS
Zontos Xristozo
Why don't you support International Phonetic Alphabet (IPA) symbols, so that we could specify the actual vowels without all the trouble it currently takes to get it to work?
Mathnasium Online
As described in a very similar feature request linked here (one that also proposes a convenient SSML tag-based solution) and here, the ability to tune and adjust Overdub's output (for example, to read individual letters and numeric digits, insert pauses, adjust tone and emphasis, etc.) is essential for our use case: creating training materials for staff and students teaching and learning mathematics.
That requester's proposal of using SSML tags to implement such easy, user-configurable tweaking of Overdub's output would be truly awesome. By lacking this capability, Descript is forcing its customers to seek workarounds and alternative solutions, and is leaving a big opening for competitors to walk through.
Based on these requests' continuing upvotes (110+ combined) and enthusiastic comments, after ~two years~ as a common feature request, it's time to get these at least "Under Review"!
- SEE RELATED REQUEST: https://feedback.descript.com/feature-requests/p/provide-additional-control-over-how-overdub-generates-speech-from-the-text-eg-us
Sharif
yes, hope so.
Pascal
WE NEED THIS implemented as described here ...please, everyone, up-vote it! Per the suggestion below from Frameworks, SSML sounds like a great approach to providing better control over how Overdub generates speech from text, so long as SSML tags can be shown or hidden by way of a keyboard shortcut and/or menu option in Descript.
Frameworks
Hi, Jim. Perhaps you should merge this request for "phonetic pronunciations" under your other similarly entitled "Control of Overdub pauses and emphasis", as it seems to have gotten more visibility and most users would want both capabilities ...no?
Concerning both of your requests, I've up-voted them both and, as noted under your other request, I am recommending the use of Speech Synthesis Markup Language (SSML) tags to provide the control you're seeking over how Overdub generates speech from the text.
For example, as described here and here, you can specify International Phonetic Alphabet (IPA) pronunciations using the SSML <phoneme alphabet=...> tag, as in:
You say, <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>.
I say, <phoneme alphabet="ipa" ph="ˈpi.kæn">pecan</phoneme>.
Similarly, to specify a delay of fixed duration (e.g., 2 seconds), you would simply insert the markup <break time="2s"/> at the desired pause point within the corresponding text.
Please see (and up-vote :) my request entitled "Provide additional control over how Overdub generates speech from the text (e.g., using SSML tags)" here: https://feedback.descript.com/feature-requests/p/provide-additional-control-over-how-overdub-generates-speech-from-the-text-eg-us
SSML USE-CASES
- Amazon's Polly: https://docs.aws.amazon.com/polly/latest/dg/supportedtags.html
- Google's Assistant: https://cloud.google.com/text-to-speech/docs/ssml
- Siri, et al: https://www.smashingmagazine.com/2019/03/sanity-portabletext-speech-synthesis/
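To make the two tags above concrete, here is a minimal Python sketch that assembles the pecan example with a 2-second pause between the sentences. The helper names (`phoneme`, `pause`, `speak`) are made up for illustration, and nothing here reflects a Descript or Overdub API, since Overdub does not accept SSML today. The `<speak>` wrapper is the standard SSML root element.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme(text: str, ipa: str) -> str:
    """Wrap text in an SSML <phoneme> tag carrying an IPA pronunciation."""
    # quoteattr adds the surrounding quotes and escapes the attribute value.
    return f"<phoneme alphabet=\"ipa\" ph={quoteattr(ipa)}>{escape(text)}</phoneme>"


def pause(seconds: float) -> str:
    """Return an SSML <break> tag for a fixed-length pause."""
    return f'<break time="{seconds:g}s"/>'


def speak(*parts: str) -> str:
    """Join already-escaped fragments under the SSML <speak> root element."""
    return "<speak>" + "".join(parts) + "</speak>"


# Hypothetical usage: the same word, two pronunciations, one pause.
ssml = speak(
    escape("You say, "), phoneme("pecan", "pɪˈkɑːn"), ".",
    pause(2),
    escape(" I say, "), phoneme("pecan", "ˈpi.kæn"), ".",
)
print(ssml)
```

The resulting string could be pasted into any engine that accepts SSML, such as the services listed above, to hear the difference.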
BTW: Are you the "Jim McKeeth" of Embarcadero|Delphi fame?
Jim McKeeth
Frameworks: That is me. I voted up your suggestion too. Thanks!
Tau Lukos
I too was so frustrated trying to get an Overdub voice to pronounce LIVE correctly. I wanted it to sound like "Saturday Night Live" but nothing I did would stop it from pronouncing it like "Live or Let Die." I understand it is confusing to have two pronunciations for the same spelling. But that is not uncommon in English. So having a way to let Descript know I want pronunciation 2 instead of pronunciation 1 would be helpful.
I have also noticed that Overdub voices have problems with single-digit numbers.
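One way to picture the "pronunciation 2 instead of pronunciation 1" idea is a small lookup of numbered variants that emits the SSML `<phoneme>` tag suggested earlier in the thread. This is only a sketch: the `HETERONYMS` table and the `pronounce` helper are hypothetical names, not anything Descript provides, and the table holds just the one word discussed here.

```python
# Hypothetical table: each heteronym maps to its numbered IPA variants.
HETERONYMS = {
    # variant 1: "the place I live" (short I); variant 2: "Saturday Night Live" (long I)
    "live": ["lɪv", "laɪv"],
}


def pronounce(word: str, variant: int) -> str:
    """Return an SSML <phoneme> tag for the chosen 1-based variant of a word."""
    variants = HETERONYMS[word.lower()]
    if not 1 <= variant <= len(variants):
        raise ValueError(f"{word!r} has {len(variants)} variants, got {variant}")
    ipa = variants[variant - 1]
    return f'<phoneme alphabet="ipa" ph="{ipa}">{word}</phoneme>'


print("Saturday Night " + pronounce("Live", 2))
```

A picker like this would spare users from typing IPA by hand, which is close to the "wizard" idea in the original request.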