Shorten silences in a Composition

shipped

Khaki

Equivalent to 'smart speed' in podcast players, offer a non-destructive process to shorten detected silences.
Speakers sound more confident, and dialogue flows more smoothly for inexperienced speakers.
By analysing the waveform to determine the difference in volume between 'speech' and 'silence', you could offer users the ability to shorten any silence of at least, say, 300ms down to 50ms.
Adobe Audition offers this destructively with Detect Silence; Ferrite for iOS non-destructively with Tighten.
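
A minimal Python sketch of the idea in this request, offered as an illustration only; it is not how Descript, Audition, or Ferrite implement it. It assumes the audio is a plain list of float samples in the range -1 to 1. The function names and the 0.02 amplitude threshold are assumptions made for the example; the 300ms and 50ms values are the ones suggested above.

```python
from typing import List, Tuple


def find_silences(samples: List[float], sample_rate: int,
                  threshold: float = 0.02, min_ms: int = 300) -> List[Tuple[int, int]]:
    """Return (start, end) sample-index ranges where every sample stays
    below `threshold` in absolute value for at least `min_ms` milliseconds."""
    min_len = sample_rate * min_ms // 1000
    runs = []
    start = None  # index where the current quiet run began, if any
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    # A quiet run may extend to the very end of the audio.
    if start is not None and len(samples) - start >= min_len:
        runs.append((start, len(samples)))
    return runs


def shorten_silences(samples: List[float], sample_rate: int,
                     threshold: float = 0.02, min_ms: int = 300,
                     target_ms: int = 50) -> List[float]:
    """Return a new sample list with each detected silence cut down to
    `target_ms`. The input list is not modified (non-destructive)."""
    target_len = sample_rate * target_ms // 1000
    out: List[float] = []
    pos = 0
    for start, end in find_silences(samples, sample_rate, threshold, min_ms):
        keep = min(target_len, end - start)  # never keep more than exists
        out.extend(samples[pos:start + keep])
        pos = end  # skip the remainder of the silence
    out.extend(samples[pos:])
    return out
```

A per-sample threshold is the crudest possible detector. Real tools more likely measure level over short windows, and a hard cut like this can click unless the edit points are crossfaded.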

July 24, 2019

Alex Newton

I am still struggling with the fact that Descript cuts out too much, simply because the transcript is inaccurate. The workaround of first cleaning up the transcript is a no-go (who needs a perfect transcript?). I suppose most users simply want to clean up the audio. Therefore, it would do no harm for word gaps/silence (whatever you call it) to be identified by AUDIO, and NOT BY TRANSCRIPT.

This function would save hundreds of work hours per month if it worked based on audio. Right now, due to natural inaccuracies of the transcript, at least 10-15% of the suggested word gaps delete too much (e.g. plural s-sounds and the like). It looks like a simple fix to me. It would be so great if you changed that. But right now, with too much being cut, the 10-15% of "overdeleted" word gaps worsen the overall result to such a degree that the function is almost unusable (unless you manually check every suggestion).

Andrew Mason

marked this post as shipped

Now available! https://blog.descript.com/descript-37-new-features-remove-silence/

Clay Blackiston

Andrew Mason: I love this feature but am experiencing one gap in its current form and was curious what you think: Descript considers any audio segment without an associated transcription to be "silence". 
So in this example I screenshotted, you can see the word "yeah" was not transcribed by the AI, so the word gap search considers it a silence and would remove it if I were to hit "Apply" or "Apply All".
I imagine there are plenty of cases where this is actually the desired behavior, like when there's background noise, coughing, etc. that isn't actual words and you'd prefer to be removed.
One workaround is to edit the transcription to make it perfect before doing this type of batch function, but that might ultimately take more time than the word gap feature would save... 
I'm wondering if it might make sense to have two options for the word gap search: one that searches for gaps between known/transcribed words (current behavior) and another that searches only for truly silent (e.g. 0 dB) gaps.
With decent microphones I reckon the latter option on the feature would capture almost all the instances of silences, since there shouldn't be much background noise at all.
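
To illustrate the difference between the two options, here is a hedged Python sketch that finds gaps from transcript timestamps and then checks each one against the audio. It is not Descript's behavior. The `(text, start, end)` word format, the function names, and the 0.02 peak threshold are all assumptions made for the example.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (text, start_seconds, end_seconds)


def word_gaps(words: List[Word], min_gap: float) -> List[Tuple[float, float]]:
    """Gaps between consecutive transcribed words that last at least `min_gap` seconds."""
    return [(a[2], b[1]) for a, b in zip(words, words[1:]) if b[1] - a[2] >= min_gap]


def peak_level(samples: List[float], sample_rate: int,
               start_s: float, end_s: float) -> float:
    """Largest absolute sample value between two times (0.0 if the span is empty)."""
    chunk = samples[int(start_s * sample_rate):int(end_s * sample_rate)]
    return max((abs(x) for x in chunk), default=0.0)


def classify_gaps(words: List[Word], samples: List[float], sample_rate: int,
                  min_gap: float = 0.5, quiet_peak: float = 0.02):
    """Return (start, end, is_quiet) for every transcript gap.
    is_quiet is False when the audio inside the gap is louder than `quiet_peak`,
    which is what an untranscribed word such as "yeah" would look like."""
    return [
        (start, end, peak_level(samples, sample_rate, start, end) < quiet_peak)
        for start, end in word_gaps(words, min_gap)
    ]
```

A tool built this way could apply the quiet gaps automatically and hold the others for review. A peak check cannot tell an untranscribed word from a cough or a click, so it only flags that something audible is there.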

Paul Greenberg

I use descript mostly for video editing. I need a way to cut those long pauses between a question and answer. Ideally it would cut down anything longer than 2 seconds.

C.K. Lin

or just speed up the speaking pattern. 115% =)

Andrew Mason

Question for all of you waiting on this feature: We're thinking of making it more of a "shorten word gaps" - so it would look for gaps in the transcription that are longer than a certain length, and shrink them down to whatever you define.
It's a little different than remove silence... both have situations where they'd return bizarre results, but the hope is that this'll be better because you won't get weirdness caused by noise jumps, etc.
Anyway, this is your opportunity to tell us we're missing something before it's too late!
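
A rough Python sketch of the "shorten word gaps" idea as described in this comment, working purely on transcript timestamps. It is an assumption-laden illustration, not the shipped feature: the `(text, start, end)` word format, the function name, and the 2 second / 1 second defaults are hypothetical.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (text, start_seconds, end_seconds)


def shorten_word_gaps(words: List[Word], min_gap: float = 2.0,
                      target_gap: float = 1.0) -> List[Word]:
    """Return retimed words where every gap longer than `min_gap`
    has been shrunk to `target_gap`. The input list is left unchanged."""
    if target_gap > min_gap:
        raise ValueError("target_gap must not exceed min_gap")
    out: List[Word] = []
    removed = 0.0    # total time cut so far; shifts all later words earlier
    prev_end = None  # end time of the previous word on the original timeline
    for text, start, end in words:
        if prev_end is not None:
            gap = start - prev_end
            if gap > min_gap:
                removed += gap - target_gap
        out.append((text, start - removed, end - removed))
        prev_end = end
    return out
```

Because only the timestamps drive the edit, anything audible inside a long gap (an untranscribed word, a laugh) would be cut along with the silence.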

J Metz

Andrew Mason: I think that makes sense. From time to time speakers look for materials to show (slides, etc.) and there's quite a bit of waiting time, or 'gaps', between the time they're talking and when they resume. If I understand you correctly, this is kind of what you mean: if there's a 10 second gap between spoken words, then 8 seconds (or so) will be removed, so that a 1 second buffer remains on either side of the gap?

Rick Mohr

Andrew Mason: Pauses in a natural conversation have different lengths, and reducing them all to the same amount isn't always what you'd want. Consider shortening to a given percentage of the pause length, with a minimum. So for example one could say "shorten silences to 40% of their length, but no shorter than .5 seconds." If somebody wanted to shorten them all to the same amount, they could specify, for example, 0% and .5 seconds.
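
The rule suggested here is small enough to write down. A minimal Python sketch, reading the percentage as "shorten to this fraction of the original length"; the function and parameter names are made up for the example.

```python
def shortened_length(pause: float, ratio: float = 0.4, minimum: float = 0.5) -> float:
    """New length for a pause of `pause` seconds: `ratio` times the original,
    but never shorter than `minimum` and never longer than the original."""
    return min(pause, max(pause * ratio, minimum))


# A few pauses of different lengths, in seconds.
for pause in (0.25, 1.0, 5.0, 10.0):
    print(pause, "->", shortened_length(pause))
```

Under this reading, a 5 second pause becomes about 2 seconds, a 1 second pause is held at the 0.5 second minimum, and a pause already shorter than the minimum is left alone. A ratio of 0 collapses every longer pause to the minimum, which gives the fixed-length behavior.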

Mark Bramhill

Andrew Mason: This sounds great to me! My one note is that, rather than "all gaps longer than 2 seconds are shortened to 1 second," have it be something like "all gaps longer than 2 seconds are shortened by some percentage/formula." Not sure what Overcast & other apps do to calculate "shorten silences" but pauses still have relative lengths. I think if you do an absolute length shortening it'll wind up sounding a bit mechanical. [Edit: I see Rick Mohr posted the same idea while I was typing this up!]

Andrew Mason

Mark Bramhill: does something like that exist in any other app you've seen?

Nick Robinson

Andrew Mason: This seems like a smart approach! Like you say, I'm sure both "remove silence" and "shorten word gaps" would have imperfect results - but personally, I think my ideal version of this feature would be more on the conservative side with its edits and only edit the "obvious" gaps, if that makes sense. 
I think it's okay if the feature shows a little restraint and isn't super aggressive, at least at first. It seems to me like the biggest risk in automating this feature would be in false positives resulting in 'bad' / awkward-sounding cuts! 
In other words: given the choice between spending time manually shortening the gaps that the tool skipped over, or spending time undoing bad, overly aggressive edits the tool made - I'd definitely prefer the former!

Charles Huang

Andrew Mason: Sounds awesome, perhaps include both fixed length and percentage-wise reduction? For my purposes, fixed length is more useful.

Khaki

Andrew Mason: (butting in here) yes; the Shorten Silences functions in Adobe Audition and Ferrite for iOS work like this. After optionally setting a 'silence threshold', you select the minimum pause duration to shorten and the target pause duration in milliseconds.

Khaki

Andrew Mason: My experience with silence-shortening in other applications (Adobe Audition, Ferrite for iOS and, on the player side of things, Overcast) is good enough that I'm skeptical about a different approach yielding more natural results. In particular, I'd be concerned about weird edits in noisy word gaps (paper rustling, breath and laughter).

Jeremy Au

Andrew Mason: I'm open to setting a certain sound threshold and the length of the optimal pause, letting it run, and then manually unwinding some of the changes (similar to how Remove All Filler Words works). The other way is to do it like a Word "Replace" function, where you can Replace All or Replace Next / Ignore Next, so you can do them all or review them one by one. Either way, it would be much faster than trying to eyeball which ones exist.

Mark Bramhill

Khaki, Andrew Mason: these apps were the ones I was thinking of. I don't know what algorithm they use, but a long pause remains longer than a short pause. I think it's non-linear and that a longer pause is a smaller percentage of its original length, but I don't know for certain.
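
For illustration only, here is one curve with the two properties described in this comment: longer pauses stay longer, but keep a smaller share of their original length. It is a power-law sketch in Python and is NOT the algorithm used by Overcast, Audition, or Ferrite, which is unknown here. The name `compress_pause`, the 0.5 second knee, and the 0.5 exponent are assumptions.

```python
def compress_pause(pause: float, knee: float = 0.5, exponent: float = 0.5) -> float:
    """Leave pauses up to `knee` seconds alone; above that, compress with a
    power law. With 0 < exponent < 1 the result still grows with the input,
    but more slowly, so long pauses lose a larger share of their length."""
    if pause <= knee:
        return pause
    return knee * (pause / knee) ** exponent


# Compare the kept fraction for a medium and a long pause.
for pause in (0.25, 2.0, 8.0):
    new = compress_pause(pause)
    print(f"{pause}s -> {new:.2f}s ({new / pause:.0%} of original)")
```

With these assumed defaults, a 2 second pause keeps about half its length and an 8 second pause about a quarter, while the 8 second pause still ends up the longer of the two.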

Andrew Mason

marked this post as in development

Ruben Martinez

Andrew Mason: YES!!!!!!!!! THANK YOU!!!!

Kim Døfler

Was thinking a “remove silence” function could also be really valuable for the video editing feature

hnivi

Great idea, love to have silences eliminated. Currently, I have to export the audio file to Audacity, fix the pause and silence there and import it back. It is very clumsy. Please consider this feature soon. Thanks

Joshua Copperman

love this idea! someone else mentioned um-detection and if you can have both you have a fan for life

Jared Mecham

This would be the greatest thing ever!!! 90% of my editing is removing filler words and reducing dead space and pauses. Would love this feature so much!
