Fix the constant missed words during transcription
shipped
Joe
I love Descript's tools for editing transcriptions; they make it easy to add punctuation, fix capitalization, etc.
And the transcription works fine, EXCEPT for the fact that it misses a word (or two) every 30 seconds or so. This happens in every single video I do, and with similar frequency.
In fact it's so cumbersome to add the missing words, I've resorted to using Premiere Pro's transcription feature, then importing it into Descript using the 'replace transcript' function.
I've attached some screenshots from just a single video that is about 7 minutes long, and these aren't even all of them, just ones after the ~2.5 minute mark.
If I'm not mistaken, Descript uses Google Cloud's transcription function. I've used this in the past, and as I recall, it requires splitting longer sound clips into chunks of shorter length. I suspect the missed words are the result of poorly chosen split-points, but I could be wrong.
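To illustrate the split-point hypothesis above, here is a minimal sketch of the difference between cutting at fixed intervals and nudging each cut toward a nearby pause. It is not Descript's or Google's actual method, which this thread does not describe. The function name `choose_split_points` and its inputs (a list of per-frame loudness values, a target chunk length, and a search window) are hypothetical and exist only for this example.

```python
from typing import List


def choose_split_points(energy: List[float], target_len: int, search: int) -> List[int]:
    """Return frame indices at which to cut the audio into chunks.

    energy:     per-frame loudness values (hypothetical input, e.g. one value per 10 ms)
    target_len: desired chunk length in frames (must be >= 1)
    search:     how many frames on either side of the target cut to inspect
    """
    splits: List[int] = []
    start = 0
    n = len(energy)
    while start + target_len < n:
        target = start + target_len
        lo = max(start + 1, target - search)
        hi = min(n - 1, target + search)
        # Pick the quietest frame in the window so the cut lands in a pause
        # rather than in the middle of a word.
        best = min(range(lo, hi + 1), key=lambda i: energy[i])
        splits.append(best)
        start = best
    return splits


# Toy signal: 1.0 = speech, 0.0 = pause (frames 4 and 10 are pauses).
energy = [1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1]
fixed_cuts = choose_split_points(energy, target_len=5, search=0)
pause_cuts = choose_split_points(energy, target_len=5, search=2)
```

With `search=0` every cut falls exactly at the target length, whether or not someone is speaking at that frame. With a small search window, the cuts move to the pauses when one is in range. If fixed-length cuts were the cause, a word straddling a cut could plausibly be dropped or truncated, but that remains a guess.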
Canny AI
Merged in a post:
A better transcription software
Damien Benveniste
Descript is really unable to understand what I am saying, and because of that I cannot use Descript to its full extent. Filler words or word gaps only work 80% of the time. For any long video, it is unmanageable! Any chance you are going to use a transcription AI that actually works anytime soon?
Sunny Rochiramani
shipped
Michael
Adding to this: these screenshots are from the initial 2:30 of an audio track with a fully isolated speaker in a quiet room, no noise, no music. If I want to remove all silences in one pass, I have to hand-fix every incorrectly-sized word, usually about 2-3 per minute, during 20 to 50 minutes of audio.
Michael
And by the way, that's 2-3 mistakes a minute for just one person. I often work with multiple speakers on separate tracks. Multitrack crosstalk, even for truly isolated multitracks, is likewise abysmal, but that's for another request thread.
At any rate, this isn't new, as Descript transcription started setting incorrect word boundaries at sentence ends roughly 12-18 months ago, and it has never stopped since. I know programming is really hard! But "word tags covering the whole word" is a core piece of transcription functionality, and a dependency for accurate silence removal to boot.
Murray Robinson
Sunny Rochiramani: Joe says that Descript misses a word or two every 30 seconds, but it's really much worse than that, because not only does it miss quite a few words, it also gets quite a few words wrong. Most of your editing features depend on high-quality transcription, so this is a serious problem for the whole product. For my podcasts, the transcription quality is much better in Otter.ai than it is in Descript. I see that Otter.ai licenses its technology for a low fee. It would be great if you used the superior Otter.ai engine instead of the Google one.
Sunny Rochiramani
Murray Robinson: We're just in the process of rolling out a new transcription engine that is more accurate than before. You can either try it out on our beta app or it should be available to everyone in 1-2 weeks. Would love to know if you continue to see similar issues.
Murray Robinson
Sunny Rochiramani: OK, trying the beta.
Sunny Rochiramani
in development
Cristian Cotovan
If you still have issues, this is how I fix them: https://youtu.be/pFLfAM5LblU
Sunny Rochiramani
shipped
We shipped an updated transcription engine this morning which should've fixed this issue.
Cristian Cotovan
Sunny Rochiramani: Does this require an update? Or is it just a behind-the-scenes change?
Sunny Rochiramani
Cristian Cotovan: No client updates required. It was a behind-the-scenes (server-side) change.
Joe
Sunny Rochiramani: I've tried with the latest version, 35.1.2 (20220322.12), but unfortunately it still seems to be happening just as before. I've attached a few examples, but it does still happen consistently every ~20-30 seconds. These examples are also from a video I hadn't transcribed before, so it shouldn't be a caching issue.
The transcription accuracy does otherwise seem to be quite improved with the newer version though. Which ends up just underscoring the annoyance because mostly the only time I have to stop and make an edit is when it missed a word completely.
Sunny Rochiramani
Joe: You're right. Unfortunately, we ran into some issues with our new model, but we're planning to roll it out slowly starting next week. I'll post an update here as we do. Thanks for your patience!
Cristian Cotovan
Sunny Rochiramani: So has the new transcription engine been rolled out now, or is it still not there? I know you rolled it back a while ago; it would have fixed some of these issues and also made transcription far more accurate. Is that coming back soon? Because the transcription quality could be improved a lot.
Sunny Rochiramani
Cristian Cotovan: Not yet but internal tests are going well and we should roll it out in a week or two.
Cristian Cotovan
Sunny Rochiramani: It's been two weeks, how is this coming?
Sunny Rochiramani
Cristian Cotovan: We've rolled out the new transcription engine to Beta, and if things look good over the next 2-3 weeks, we'll roll it out to everyone. Let us know if you notice improvements using the Beta app.
Sunny Rochiramani
Hey folks, we've rolled out the new transcription engine to everyone. Let us know if you still notice these issues.
Joe
Sunny Rochiramani: The transcription accuracy overall seems to have been greatly improved in the past months, but it does still seem to miss words. I think it might be not quite as frequent as before but still very apparent. I wonder if I talk faster than typical people or something so it can't find split points easily?
The image below is a possible clue from my latest video I'm working on. What I spoke was "across many different countries." But the transcription says "across many different C." (The period after the C is in the transcript).
Not sure if this suggests the split point is wrong and it cut off the rest of the word before transcribing, or rather somehow the transcription itself is being cut somewhere. I'm also wondering if instead of it cutting off the second half of the word, perhaps the split point is actually before "countries", but the beginning of the word's audio gets cut off, and the "C" comes from it transcribing the "countrEES" sound, if that makes sense. Just some random thoughts.
Regardless, I've also seen this type of thing happen on numerous occasions before.
Sunny Rochiramani
exploring
We've noticed this bug in our internal testing and are looking into the issue.
Joe
Sunny Rochiramani: Awesome, appreciate it 👍
Michelle Werts
Sunny Rochiramani: Will this fix also address the issue of the transcript not recognizing manual fixes? For example, when I try to fill in a missing word, sometimes it accepts and recognizes the word when I correct the transcript, but sometimes it just rejects my attempted change. If this is a separate issue, I will create a new thread.
Shannon Wedge
Sunny Rochiramani: Is the bug missing the last word(s) in a sentence? Because that's where I'm seeing almost all the missed words in my transcripts.
Sunny Rochiramani
Shannon Wedge: Roughly yes. That's what we're trying to fix along with a brand new transcription engine which will also increase accuracy. We're internally testing it now and we expect it to roll out in a week or two.