If you’re a fan of the game show Jeopardy!, you already know that this week IBM’s supercomputer Watson is taking on legendary past Jeopardy! champions (and human beings) Ken Jennings and Brad Rutter in a Human vs. Human vs. Machine grudge match, and we now know that Machine has won!
Congratulations to Watson.
We don’t have a supercomputer or a fancy game-show soundstage, but we are bringing you the results of our own Human vs. Machine faceoff. Can human transcriptionists from the Audio Transcription Center (ATC) slay the Dragon? Read on and find out!
(Full disclosure: we’re a transcription company that has been in business since 1966. Successful speech recognition software could put us out of business. Just so you know.)
Championships have been won in Boston: the Red Sox have won World Series, the Celtics NBA Championships, and the Bruins Stanley Cups, all just five minutes from our very offices. So it is fitting that our office be the site of this titanic Human vs. Machine bout!
First, allow me to introduce the Machine… wearing a green cardboard box, from Nuance: Dragon NaturallySpeaking 10, Home Edition, or, as we prefer to call it, “Team Dragon”. (Version 11 has been released since we began testing, and we will put it to the test at a later date.)
And in the other corner, wearing headphones, torn jeans and flexing their fingers… the human transcriptionists of the Audio Transcription Center (ATC), specifically four randomly selected competitors from our staff of dozens of versatile, multi-talented transcriptionists. All four, collectively known as “Team ATC”, were eager to take on the challenge.
“But wait,” you exclaim. “Dragon only works with one voice at a time; this is an unfair fight!” Correct. But rather than automatically claim victory, we decided to level the playing field by having both competitors work with a single voice speaking on a variety of subjects.
Dragon NaturallySpeaking (“Team Dragon”) and our team of terrific transcriptionists (“Team ATC”) would both be transcribing the voice of… me. Your humble blogger, formerly heard on college radio and occasionally behind a karaoke machine, would be the voice that pushed both competitors to their limits!
Let’s begin the match, shall we?
First of all: speed of delivery
Team Dragon: walk to the store, purchase the software, come back to the office.
Team ATC: walk to the subway, purchase a subway ticket, come to the office.
Advantage: We’ll call this one a tie.
Speed of installation
Team Dragon: 32 minutes for “complete installation”. The DVD-ROM was a very bright shade of orange.
Team ATC: less than 10 minutes for installation, and that includes pouring themselves a cup of coffee while the computer boots up. Occasionally wear bright colors as well.
Advantage: Team ATC.
Speed of training for first-time use
Team Dragon: 39 minutes, from first launch until the program was ready for prime-time, including entering the serial number at least 4 times.
Team ATC: About two hours, including filling out at least 4 pieces of paperwork. We’re thorough that way.
Advantage: Team Dragon.
So far, before we’ve introduced actual transcription into the contest, we’re tied at 1-1. It’s a close match in the early going…
Now, let’s bring in some actual audio. Specifically, about 1,135 words, spoken over about 7 minutes, on a variety of subjects, by yours truly.
“But wait,” you exclaim. Again. “‘Team Dragon’ has to be trained to recognize your voice! It’s designed to improve the more you use it!” Correct. The members of “Team ATC”, by contrast, had never heard my voice on a recording, and they could still hit the ground running immediately. Advantage: Team ATC.
Back to the audio: our four transcriptionists each took one pass at it, transcribing it verbatim (ums and ahs included). Once they were done, each transcript was reviewed against the audio in real time, and the time needed to make corrections was noted.
Transcription time for “Team ATC” on seven minutes of audio, spoken clearly and methodically in a quiet room, averaged 20 minutes.
But how did it look, you ask? There was an average of two errors in the 7-minute file. Out of 1,135 words, that’s over 99.8% accuracy before review. Review time averaged eight minutes, for a total score of 28 minutes.
Now for round one with “Team Dragon”. Once again, I spoke slow-ly and meth-od-ic-al-ly. As per the instructions, I also spoke punctuation and carriage returns in their appropriate places.
Dictation time for “Team Dragon”, round one? 16 minutes. Which sounds fast, until you realize that simply reading the same material into a recorder at a ‘normal’ pace took less than half that time.
But how did it look, you ask? Not so good. There were over 60 errors (versus two!), and review took 18 minutes, for a total score of 34 minutes at around 94% accuracy, or roughly 15 errors per page. Which sounds good, until you remember that this is one voice, speaking slow-ly and meth-od-ic-al-ly. Which most of us don’t do in our daily lives.
Advantage for round one: “Team ATC”.
Before the competition, and in between rounds, while “Team ATC” was eating lunch or going for walks, “Team Dragon” was in training, as I read material from various sources into the software and corrected its mistakes. Song lyrics, blurbs from dust jackets, chocolate bar wrappers… “Team Dragon” was being further trained to recognize my dulcet tones.
For round two with “Team Dragon”, I turned on a Dragon setting that automatically inserts commas and periods in logical places, to speed up the process. That did indeed shave a few minutes off the dictation time: dictation now took 11 minutes.
But how did it look, you ask? Still not so good. There were over 40 errors, and review took 13 minutes (which was, again, longer than the dictation itself), for over 96% accuracy, or roughly 10 errors per page. Which, again, sounds impressive, until you compare it to the 99.8% accuracy of “Team ATC”.
Total time for round 2, including review time: 24 minutes. Which means…
Advantage for round two: “Team Dragon”.
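For anyone who wants to check the scorecard, here’s a minimal Python sketch that recomputes each contestant’s total time and accuracy from the figures above. It assumes accuracy is simply (words − errors) ÷ words, and it treats “over 60” and “over 40” errors as exactly 60 and 40, so Dragon’s real accuracy was slightly lower than what it prints. The names `accuracy` and `rounds` are just illustrative.

```python
# Illustrative only: accuracy assumed to be (words - errors) / words.
TOTAL_WORDS = 1135  # length of the test recording, per the post

def accuracy(errors: int, words: int = TOTAL_WORDS) -> float:
    """Percentage of words transcribed correctly."""
    return 100.0 * (words - errors) / words

# (contestant, minutes to produce the draft, minutes to review, errors)
# "Over 60" and "over 40" errors are treated as exactly 60 and 40.
rounds = [
    ("Team ATC (average)", 20, 8, 2),
    ("Team Dragon, round 1", 16, 18, 60),
    ("Team Dragon, round 2", 11, 13, 40),
]

for name, draft, review, errors in rounds:
    total = draft + review  # the "total score" is draft time plus review time
    print(f"{name}: {total} min total, {accuracy(errors):.1f}% accurate")
```

Running it prints 28 minutes at 99.8% for Team ATC, 34 minutes at 94.7% for Dragon’s first round, and 24 minutes at 96.5% for its second: faster overall in round two, but still well short of human accuracy.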
So what have we learned? That speech recognition software can, with repeated training, become accurate enough that your dictation time plus your review time is shorter than a human transcriptionist’s total time (24 minutes versus 28, in our case).
So “Team Dragon” wins? The robots are taking over?
If your audio input consists of one voice, and only one voice, and you have enough access to that voice to let Dragon grow further accustomed to it, then by all means, stop reading now and become a proud supporter of “Team Dragon”.
For everyone else, “Team ATC” is still miles ahead. “Team ATC” can transcribe your all-hands meeting, with its 27 participants from the CEO to the intern. “Team Dragon” can’t.
“Team ATC” can transcribe your interview with your Nana where she talks about the old country, and because the Audio Transcription Center (ATC) can match your interview subject matter with the right member of “Team ATC”, you can get a transcript with 99% accuracy or higher, even though we’ve never heard your voice.
“Team Dragon” can transcribe you or your Nana at lower than 99% accuracy, and it knows only what it has been programmed to know about the old country.
And most importantly, the human beings at the Audio Transcription Center (ATC) can consult with you before your project even begins, and work with you to help you get the most out of your limited transcription budget.
If and when “Team Dragon” catches up to us, and can transcribe the material our talented, smart human beings transcribe, just as quickly and accurately, we will be the first to jump on the bandwagon. Until, that is, “Team Dragon” puts us out of business.
But for now, if you call the Audio Transcription Center (ATC), there are no machines to train, no dragons to slay, just friendly, helpful customer service, a second-to-none transcription staff and a 100% satisfaction guarantee.
Next in line for us is a white paper that will help you find your best transcription solution, even if it is (gasp) not us!
by Patrick Emond
One thought on “Reality Check: Transcription Vs. Speech Recognition Software – The Showdown”
Well said! I enjoyed reading about your competition…and as a seasoned transcriptionist myself, who is circling the drain due to voice recognition, I hope you're right — I hope there's still a place for us in the future. 🙂