AI vs Human Transcription: the Nitty-Gritty

In a recent post, we discussed what we call “Forensic Transcription”, a term we use not to indicate transcription work relating to crime investigation, but to refer to our method here at ATC of approaching each project we undertake with a meticulous, detail-oriented attitude. This approach, with its focus on accuracy above all, has earned us our reputation as a top transcription service. We have never strayed from our guarantee of at least 99% accuracy or no charge, and we don’t ever intend to.

But what really is “forensic transcription”, and why isn’t AI capable of it? After all, we don’t deny that AI technology has come a long way, even in just the past year. AI vs human transcription is a hot topic right now. AI transcription services abound––a simple Google search pulls up thousands of them, many of which boast low prices, incredibly fast turnaround, and even free trials. So what’s the missing link? Why hasn’t AI taken over the transcription market completely?

[Image: a robot hand and a human hand reach towards each other, not quite touching. Photo by Tara Winstead on Pexels.com]

The answer is simple: accuracy. While most of the AI transcription services you might pull up in a search boast about cost and speed, accuracy is a term far less often bandied about. AI has grown more accurate as it has continued to develop, particularly if you’re working with broadcast-quality audio, crystal-clear speech, and simple terms.

But rarely are recordings so cut-and-dried. The moment you add in, say, accents, foreign language excerpts, false starts, overlapping dialogue, technical jargon, or lower quality audio (all things that we can confidently say, after over 50 years of transcription, are pretty commonplace), AI struggles. As the tech currently stands, it still takes human brainpower to work through the complexities and nuances of most audio, and how much this kind of meticulous accuracy matters depends on the project being transcribed.

AI transcription may work for a funny YouTube video about adding Mentos to Pepsi, where a lower level of accuracy is acceptable and the main focus of the content is in the visuals. It does not work well for a serious oral history recording from decades ago pertaining to a culturally significant topic, where foreign language excerpts, accents, audio quality, and specific terminology will all cause AI to falter. Projects of an academic, historical, or culturally important nature require the sensitivity and care of humans. It is this truth that has guided us in our “forensic” approach to transcription, and it will continue to guide us through projects to come, no matter the challenges.