AI vs Human Transcription: the Nitty-Gritty

In a recent post, we discussed what we call “Forensic Transcription”––a term we use not to indicate transcription work relating to crime investigation, but to refer to our specific method here at ATC of approaching each project we undertake with a meticulous, detail-oriented attitude. This approach has earned us our reputation as a top transcription service, with a focus on accuracy above all that we continue to stake our reputation on. We have never strayed from our guarantee of at least 99% accuracy or no charge, and we don’t ever intend to. 

But what really is “forensic transcription”, and why isn’t AI capable of it? After all, we don’t deny that AI technology has come a long way, even in just the past year. AI vs human transcription is a hot topic right now. AI transcription services abound––a simple Google search pulls up thousands of them, many of which boast low prices, incredibly fast turnaround, and even free trials. So what’s the missing link? Why hasn’t AI taken over the transcription market completely?

[Image: a robot hand and a human hand reach towards each other, not quite touching. Photo by Tara Winstead on Pexels.com]

The answer is simple: accuracy. While the majority of the AI transcription services you might pull up in a search will boast about cost and speed, accuracy is not a term so often bandied about. AI has grown more accurate as it has continued to develop, particularly if you’re working with broadcast-quality audio, crystal-clear speech, and simple terms.

But rarely are recordings so cut-and-dried. The moment you add in, say, accents, foreign language excerpts, false starts, overlapping dialogue, technical jargon, or lower quality audio––all things that we can confidently say after over 50 years of transcription are pretty commonplace––AI struggles. As the tech currently stands in the struggle of AI vs human transcription, it still takes human brainpower to work through the complexities and nuances of most audio, and this kind of meticulous accuracy becomes particularly important depending on the project being transcribed.

Where AI transcription may work for a funny YouTube video about adding Mentos to Pepsi, where a lower level of accuracy is acceptable and the main focus of the content is in the visuals, it does not work well for a serious oral history recording from decades ago pertaining to a culturally significant topic, where foreign language excerpts, accents, audio quality, and specific terminology will all cause AI to falter. Projects of an academic, historical, or culturally important nature require the sensitivity and care of humans––and it is this truth that has guided us in our “forensic” approach to transcription, and will continue to guide us through projects to come, no matter the challenges. 

Accents, Artificial Intelligence and Humans.

Accents. There are an estimated 30 accents that span the landscape of the United States. Tell me, if we as humans have a hard enough time parsing out the dropped “Rs” in words from a Bostonian (please note we’re a bunch of Bostonians here at ATC), how is Artificial Intelligence (AI) ready and able to do so? It isn’t!

There’s a reason we continue to be used as a human test-case against AI.

Can the deep learning models behind artificial intelligence adapt to all of these accents, colloquialisms, and dialects as well as a human team of transcriptionists can? We think not. Who better to transcribe that Bostonian than fellow Bostonians? Who better to comprehend the words and colloquialisms from recordings of oral histories from folks in New Orleans (for instance) than people from New Orleans? We’ve been custom-matching client content to every human transcriptionist for 55 years, and we’ll keep doing so. We guarantee it!

Lastly, I know that when I ask the AI on my phone one question or another, it inevitably gets something wrong. And mind you, I’m somehow one of those Bostonians who no one ever believes is actually a Bostonian. Yet, it still has a hard time understanding me. Go figure.

At the Audio Transcription Center, nothing about our intelligence is artificial!

SUPERIOR TRANSCRIPTS REQUIRE MORE THAN ARTIFICIAL INTELLIGENCE!

Time and again we’ve tested, measured, and evaluated voice recognition software to determine its highest possible accuracy level. We have determined that with broadcast-quality audio of two well-spoken people, AI can presently reach 96% accuracy, at best. Translated, that means approximately 10 errors per page, versus our greater-than-99% accuracy rate, which results in an average of 2.5 errors per page.
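
The per-page figures follow from simple arithmetic. Here is a minimal sketch, assuming a page of about 250 words. That page length is not stated anywhere in this post; it is simply the figure that makes both numbers work out. The function name is ours, for illustration only.

```python
def errors_per_page(accuracy_percent, words_per_page=250):
    """Expected number of wrong words on one page at a given accuracy.

    words_per_page=250 is an assumption, not a figure from the post.
    """
    return (100 - accuracy_percent) * words_per_page / 100


print(errors_per_page(96))  # AI at its best: 10.0
print(errors_per_page(99))  # our guaranteed minimum: 2.5
```

Put differently, under that assumption every percentage point of accuracy is worth two and a half wrong words on each page.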

So what does AI continue to have difficulty with?

  • less than broadcast-quality audio
  • multiple voices (interviews with 2 or more people, multi-person focus groups, etc.)
  • accents
  • ambient noise
  • special vocabulary
  • grammar
  • punctuation
  • spelling

Our production team custom-matches our clients’ subject matter to each transcriptionist’s particular strengths, knowledge, and interests. Our goal is always to make sure that what is said is what is heard, and what is heard is what is transcribed, capturing each recording in a transcript that follows the client’s directions in intricate detail.

We have the most selective hiring standards in our industry. Aside from a minimum typing speed of 80 wpm, we only select people who are well-educated, culturally diverse, intellectually curious, and possess excellent grammar skills.

We are able to quickly mobilize a dedicated team for your time-sensitive projects as well as highly confidential work, and we also have a nationwide team available for projects of any size and subject matter. 

AT THE AUDIO TRANSCRIPTION CENTER
NOTHING ABOUT OUR INTELLIGENCE IS ARTIFICIAL! 

The Hidden Truths of Voice Recognition Software

  • Q: Why can’t the Audio Transcription Center use Voice Recognition Software?
    • A: Because Voice Recognition Software is not yet capable of producing to our strict standards.
  • Q: What strict standards?
    • A: Let us count the ways:
      • VRS has difficulty in recognizing, simultaneously or not, two or more voices. Of course, two or more voices are intrinsic to oral histories.
      • VRS has difficulty with accents.
      • VRS has difficulty in dealing with less than broadcast quality sound.
      • VRS has difficulty with overlapping dialogue, idioms, colloquialisms, and especially ambient sound.
      • VRS – Formatting? Fuggedaboutit!
      • VRS developer IBM reached a 94.5% accuracy milestone, which they are very proud of, in an evaluation “using the SWITCHBOARD corpus, a collection of telephone conversations that’s been used for decades.” “SWITCHBOARD is not the industry standard for measuring human parity, however, which makes breakthroughs harder to achieve.”
      • Finally, an important factor of VRS accuracy is the need for “training” the software to recognize the speech patterns and idiosyncrasies of the speakers. Imagine asking your narrators to train the software that will be transcribing the session before each of your interviews. Oy!
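
For the curious: accuracy figures like the 94.5% above are commonly reported as one minus the word error rate, which counts the words a transcript got wrong, left out, or added, and divides by the number of words actually spoken. The sketch below is a generic illustration of that calculation, not a description of how IBM scored its system. The function name and the sample sentences are ours.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Made-up example: what was said versus what the software heard.
reference = "we parked the car in harvard yard"
hypothesis = "we packed the car in the harvard yard"
wer = word_error_rate(reference, hypothesis)
print(f"word error rate {wer:.1%}, accuracy {1 - wer:.1%}")
```

Note that one misheard word and one extra word in a seven-word sentence already counts as two errors, which is why short samples swing so wildly and long test recordings are needed for a fair comparison.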

BUT IT AIN’T ALL BAD

There are many projects where a very rough transcript is used as a quick reference source, and an actual verbatim transcript isn’t even required. In those cases, perfect transcripts are not needed, and VRS fits the bill, while also lowering your initial budget.

In summary, if you don’t need a near-perfect transcript, VRS is a wonderful tool at a reduced cost. If you’re looking for an accurate transcript that is also 100% guaranteed, then the only option is to call your transcription vendor of choice. You might want to try us. Call us at (617) 423-2151, or click on the GET A QUOTE link in red.

 

7 Digital Recording Devices for Oral History Interviews

In theory, research interviews could be recorded with any device — a phone, a laptop, or even a camcorder. But if you want to save big in the long run, it’s better to invest in a good digital voice recorder. These devices are specifically designed for recording long interviews at high quality, which makes the subsequent transcription process faster and more cost-effective.

5 Affordable Voice Recording Devices

Whether you need transcripts for lectures, conversations, dictations for articles or books, or just your own personal thoughts, nothing is more convenient than a portable recording device. In today’s plugged-in world, make sure your recorder has a USB port or an external memory card slot so the audio file can be easily exported and shared with your transcription provider. With that in mind, here are 5 affordable voice recording devices available for purchase on Amazon.

Computer transcription misleads even as it impresses

With speech-to-text transcription, what are you really saving?

[Patrick Emond contributed to this post]

Last week, IBM trumpeted their latest achievement in automated speech-to-text: a record-low error rate of 5.5 percent. But always, especially with regard to saving money on transcription, you have to read the fine print.

Analog vs. Digital: Pay Now or Pay More Later

Some of you have asked me why we still have information on our website about “going digital,” but clearly the fact that we still receive newly recorded audio on “old-fashioned” cassette tapes  tells me that some people just don’t understand the importance of upgrading technology (on a lot of levels).  After 44 years in business, we finally took the “tape” out of our name, because it’s all about the audio!

Today I’m writing about more than “going digital,” but I will also touch upon recording habits in general.  Remember, just because you’re recording digitally does NOT mean that you will automatically have broadcast quality audio.  (WHAT?! You’re thinking, ‘it’s digital, so it has to be better quality.’)  There’s a lot involved in recording, and as the person conducting the recording, you need to stop and think about the details of recording for more than a couple of seconds.  That’s right, we know that some of you already know these things, but do you truly take the time to learn your device before using it?  I know that’s a very personal question, so think about it for a moment.  You don’t have to share.

The quick points to remember:

First and foremost, it’s now 2011, so use a digital recorder!  You can walk into any electronics store, or jump online and find one.  Just do some research first.  Remember, in 2004, 90% of our clients used analog equipment to record their interviews.  Now in 2011, 95% of our clients use digital equipment to record their interviews.  You’ll have immediate access to your audio recording.  Volume too low? There’s software for you to give the file a quick boost to increase the sound quality.  Is your transcriptionist next door or across the country?  It doesn’t matter where they are located, because you can upload your audio to them, and still have access to listen to your audio.  Imagine never having to spend shipping dollars again!!

Clearly the facts demonstrate there’s been a near-total reversal in the analog vs. digital battle.  Remember, your transcripts are only as good as the audio your transcriptionist receives, and better quality audio will save time and save those all-important dollars in your budget.  Again though, just remember, it’s more than just “going digital”!

You’ve purchased that device, but you really don’t want to delve into the box with the paperwork and all sorts of wires that are tucked neatly inside.  Read the paperwork, and use the wires.  Of all the wires in the box, use an A/C power-supply – it might be 2011, but batteries die quickly, so plug in when you can.  For those times that you forgot it at home, bring plenty of backup batteries!!   Seriously, go buy stock in the major brands, because you will always want to have an ample supply of batteries quickly within reach!  You never know when you’ll have to record those unexpected longer interviews.  Think of it as practicing “safe recording”!

Now you’re sitting there ready to hit the record button, but stop and check recording volume regularly.  I can’t tell you how many interviews we get where the recording levels are so low you can barely hear the person, so don’t forget to check those recording levels beforehand.  If your recording device has meters, refer to them, but also be sure to listen to the audio levels with headphones at the start of the interview session.

Another important piece of equipment to use is an external microphone.  Different situations require different types of microphones, so you’ll need to do a little studying up on what your recording environment needs.  If you’re able, try more than one external microphone among the group, to be sure you have properly mic’d all of your speakers.  This is especially important for any group larger than 3 individuals, and be sure to place these microphones as close as possible to the people who are speaking.  Sitting at a long table with people at both ends of the table? Think about how the person at the end of the table will sound if there is only one microphone in the middle of the table.  Murphy’s law also says that person will be your most verbal in the group.  Conducting a one-on-one interview?   Drop into Radio Shack beforehand, and grab a lapel mic.  The difference in recording quality is remarkable, and you’ll thank yourself later (as will your transcriptionist).

Don’t forget about the longevity of your recording for your archives!  Your transcriptionists do not require large archival files for transcribing, they just require some good audio to hear those words clearly.  On that note, if you’re going to be storing these recordings for archival posterity, make sure you do your research on the latest technological advances in formats for saving your audio files.  .wav? b-.wav? .mp3? Spend the time, do your research, and know the facts on digital audio longevity.  (See our previous blog on thinking beyond the shoebox.)

For a more detailed read, look over our recording tips page, and check out some of the other service providers we recommend as well.

Always remember your ultimate goals when you’re recording.  If you’re going to have your audio transcribed, you want the best recording possible, so give your transcriptionists audio that they can transcribe both quickly and accurately!  If you can believe it, we’re telling you to spend a little more up front, which will save you money on a service we provide.  Go figure…

Reality Check: Transcription Vs. Speech Recognition Software – The Showdown

If anyone reading is a fan of the game show Jeopardy!, you already know that this week, IBM super-computer Watson is taking on legendary past Jeopardy! champions (and human beings) Ken Jennings and Brad Rutter in a Human vs. Human vs. Machine grudge match, and we now know Machine has won!
Congratulations to Watson.
We don’t have a super-computer, or a fancy game-show soundstage, but we are bringing you the results of our Human vs. Machine faceoff. Can human transcriptionists from the Audio Transcription Center (ATC) slay the Dragon? Read on and find out!
(Full disclosure: we’re a transcription company that has been in business since 1966. Successful speech recognition software could put us out of business. Just so you know.)
Championships have been won in Boston: the Red Sox have won World Series, the Celtics NBA Championships, and the Bruins Stanley Cups, all just five minutes from our very offices. So it is fitting that our office be the site of this titanic Human vs. Machine bout!
First of all, I will introduce the Machine… wearing a green cardboard box, from Nuance Software, Dragon Naturally Speaking 10, Home Edition, or as we prefer to call it “Team Dragon”. (Version 11 has been released since we began testing; and we will put it to the test at a later date.)
And in the other corner, wearing headphones, torn jeans and flexing their fingers… the human transcriptionists of the Audio Transcription Center (ATC), specifically four randomly-selected competitors from our staff of dozens of versatile, multi-talented transcriptionists. All four, collectively known as “Team ATC”, were eager to take on the challenge.
“But wait,” you exclaim! “Dragon only works with one voice at a time, this is an unfair fight!” Correct. But rather than automatically claim victory, we decided to level the playing field by having both competitors work with only one voice, who would be speaking on a variety of subjects.
Dragon Naturally Speaking (or “Team Dragon”), as well as our team of terrific transcriptionists (or “Team ATC”), would be transcribing the voice of… me. Your humble blogger, formerly heard on college radio and occasionally behind a karaoke machine, would be the voice that would take both competitors to their limits!
Let’s begin the match, shall we?
First of all: speed of delivery
Team Dragon: walk to the store, purchase the software, come back to the office.
Team ATC: walk to the subway, purchase subway ticket, come to the office.
Advantage: We’ll call this one a tie.
Speed of installation
Team Dragon: 32 minutes for “complete installation”. The DVD-ROM was a very bright shade of orange.
Team ATC: less than 10 minutes for installation, and that includes pouring themselves a cup of coffee while the computer boots up. Occasionally wears bright colors as well.
Advantage: Team ATC.
Speed of training for first-time use
Team Dragon: 39 minutes, from first launch until the program was ready for prime-time, including entering the serial number at least 4 times.
Team ATC: About two hours, including filling out at least 4 pieces of paperwork. We’re thorough that way.
Advantage: Team Dragon.
So far, before we’ve introduced actual transcription into the contest, we’re tied at 1-1. It’s a close match in the early going…
Now, let’s bring in some actual audio. Specifically, about 1,135 words, spoken over about 7 minutes, on a variety of subjects, by yours truly.
“But wait,” you exclaim. Again. “’Team Dragon’ has to be trained to recognize your voice! It’s designed to improve as you use it more!” Correct. Whereas ‘Team ATC’, none of whom have ever heard my voice on a recording, can hit the ground running immediately. Advantage: Team ATC.
Back to the audio: our four transcriptionists each took one pass at it, transcribing it verbatim (with ums and ahs). Once done, the audio was given a real-time review, and time needed to perform corrections was noted.
Transcription time for “Team ATC” for seven minutes of audio, spoken in a quiet room, clearly and methodically: averaged out to 20 minutes.
But how did it look, you ask? There was an average of two errors in the 7 minute file. Out of 1,135 words, that’s over 99.8% accuracy before review. Review time averaged out to eight minutes, for a total score of 28 minutes.
Now, for the first round with “Team Dragon”. For the first round, I once again spoke slow-ly and meth-od-ic-al-ly. I also spoke punctuation and carriage returns in their appropriate places, as per instructions.
Dictation time for “Team Dragon”, first round? 16 minutes. Which sounds fast, until you realize that reading the audio into a recorder at ‘normal’ pace took less than half that time.
But how did it look, you ask? Not so good. Review time took 18 minutes; with over 60 errors (versus two!), for a total score of 34 minutes, and around 94% accuracy or roughly 15 errors per page. Which sounds good, until you remember that this is one voice, speaking slow-ly and meth-od-ic-al-ly. Which most of us don’t do in our daily lives.
 
Advantage for round one: “Team ATC”.
Before the competition, and in between rounds, while “Team ATC” was eating lunch or going for walks, “Team Dragon” was in training, as I read and corrected material from various sources into the software. Song lyrics, blurbs from dust jackets, chocolate bar wrappers… “Team Dragon” was being further trained to recognize my dulcet tones.
For round two with “Team Dragon”, I changed a setting to speed up the process; Dragon has a setting which inserts commas and periods in logical places. That indeed shaved a few minutes from the dictation time: dictation now took 11 minutes.
But how did it look, you ask?  Still not so good. There were over 40 errors; review time took 13 minutes (which was, again, longer than the dictation itself), so over 96% accuracy or roughly 10 errors per page. Which, again, sounds impressive, until you compare it to 99% accuracy.
Total time for round 2, including review time: 24 minutes. Which means…
Advantage for round two: “Team Dragon”.
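For anyone who wants to check the scorekeeping, here is a minimal sketch of the arithmetic behind both rounds. The times and word count are the ones quoted above. The error counts for “Team Dragon” are lower bounds, since we only reported “over 60” and “over 40”, so the accuracy it prints for those rounds is a best case. The “Team ATC” numbers are averages across four transcriptionists. The function and variable names are ours.

```python
WORD_COUNT = 1135  # words in the test recording

# (label, work minutes, review minutes, errors found in review)
# Dragon error counts are lower bounds ("over 60", "over 40").
rounds = [
    ("Team ATC", 20, 8, 2),
    ("Team Dragon, round one", 16, 18, 60),
    ("Team Dragon, round two", 11, 13, 40),
]


def score(work_minutes, review_minutes, errors, word_count=WORD_COUNT):
    """Return (total minutes, accuracy in percent) for one attempt."""
    total_minutes = work_minutes + review_minutes
    accuracy = 100 * (1 - errors / word_count)
    return total_minutes, accuracy


for label, work, review, errors in rounds:
    total, accuracy = score(work, review, errors)
    print(f"{label}: {total} min total, {accuracy:.1f}% accuracy")
```

The totals come to 28, 34, and 24 minutes, which is how round two goes to “Team Dragon” on speed even though its accuracy still trails.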
So what have we learned? That speech recognition software can, with repeated training, be accurate enough that your dictation time, plus your review time, can be faster than a human transcriptionist.
So “Team Dragon” wins? The robots are taking over?
Uh, no.
If your audio input consists of one voice, and only one voice, and you have enough access to that one voice to allow Dragon to become further accustomed to that one voice, then by all means, stop reading now, and become a proud supporter of “Team Dragon”.
For everyone else, “Team ATC” is still miles ahead. “Team ATC” can transcribe your all-hands meeting, with its 27 participants from the CEO to the intern. “Team Dragon” can’t.
“Team ATC” can transcribe your interview with your Nana where she talks about the old country; and because the Audio Transcription Center (ATC) can match your interview subject matter up with the right member of “Team ATC”, you can get a transcript with 99% accuracy or higher, even though we’ve never heard your voice.
 
“Team Dragon” can transcribe you or your Nana, at lower than 99% accuracy, and only knows what it’s been programmed to know about the old country.
And most importantly, the human beings at the Audio Transcription Center (ATC) can consult with you before your project even begins, and work with you to help you get the most out of your limited transcription budget.
When and if “Team Dragon” catches up to us, and is able to transcribe the material our talented, smart human beings are able to transcribe, quickly and accurately, we will be the first to jump on the bandwagon. Until “Team Dragon” puts us out of business.
But for now, if you call the Audio Transcription Center (ATC), there are no machines to train, no dragons to slay, just friendly, helpful customer service, a second-to-none transcription staff and a 100% satisfaction guarantee.
Next in line for us is a white paper that will help you find your best transcription solution, even if it is (gasp) not us!
by Patrick Emond