Nobody likes being replaced by a computer, or a robot, and we are no exception. Our short answer to that question is this: “we are more accurate and more versatile than the software available today.”
Still don’t believe us? Well, we’re going to introduce you to our competition.
Speech recognition has been around since 1952: that early device could recognize single spoken digits. (We, on the other hand, have been around since 1966, and were able to recognize whole spoken sentences immediately.)
The next large leap forward came in 1982: Dragon Software, who still release speech recognition software today, released software for industrial use. By 1985, that software had a vocabulary of 1,000 words – spoken one at a time. (That is comparable to a four-year-old child. We don’t recommend having a four-year-old, even a precocious one, transcribe your audio.)
Dragon itself even admits this today: “Most of us develop the ability to recognize speech when we’re very young. We’re already experts at speech recognition by the age of three or so.” Our college-educated transcriptionists had vocabularies in the 17,000-word (and up) range. Even in 1985. And they still do.
By 1993, a computer could recognize over 20,000 spoken words, which put it on a par with human beings. Except for the accuracy, which was only 10% in 1993. By 1995, the error rate had dropped to 50%, which is quite a leap in a short time. (Our transcriptionists test at 98% accuracy.)
We know, we know…
“That was back then. How about now?”
We’re glad you asked.
There are a lot of data points up there, so let me highlight the important features:
- Take a look at the error rates (WER means Word Error Rate) for Conversational Speech (in red) and Meeting Speech (in pink). They aren’t even close to what human beings can deliver.
- That 2% to 4% range is human error. As in, the accuracy rate you would get from our human beings. And we aim for even lower than that.
- The only tests that match up with human accuracy are air travel planning kiosk tests (bright green). Also known as “People Who Speak Very Deliberately and Slowly in Airports.”
- Very few people speak deliberately and slowly in real life.
- The error rate for broadcast news readers (blue), ie: people who are very well-paid to speak clearly, is around 10%.
A 98% accuracy rate means you will spend much less time reviewing your audio, correcting errors and inaccuracies, and much more time growing your business.
The bottom line is this: computers are getting smaller, and more powerful, all the time. They can do many things better than human beings can.
But not, as you can see, transcription. And looking at the graph, they won’t catch up anytime soon.
Your audio wasn’t recorded in a lab, it was recorded in the real world, where we live. We transcribe conversations and meetings every day, from all over the world. Not to mention webcasts, dictation, presentations, and conferences.
Again, Dragon says it themselves: “People can filter out noise fairly easily, which lets us talk to each other almost anywhere. We have conversations in busy train stations, across the dance floor, and in crowded restaurants. It would be very dull if we had to sit in a quiet room every time we wanted to talk to each other! Unlike people, computers need help separating speech sounds from other sounds.”
We like computers, and we think we can co-exist. So, by all means, speak your destination into your cell phone’s GPS, or say “tech support” to speak to technical support. Those are two versions of speech-recognition software that many of us use almost every day.
But if your audio is any more complicated than that, call us. We’re versatile, we’re accurate, and if you pour us enough coffee, we won’t crash.
We have run full tests on the entire Dragon experience, from opening the box all the way to the proof of the pudding, which is in the crust… er, the transcript. We will publish those results on or before February 17, so keep an eye on your inbox and this blog for the results!