Reality Check: Transcription vs. Speech Recognition Software

Here at ATC, we occasionally get the tough questions. One in particular that briefly stops us in our tracks: “Why can’t I just use speech recognition software?”

Nobody likes being replaced by a computer, or a robot, and we are no exception. Our short answer to that question is this: “we are more accurate and more versatile than the software available today.”

Still don’t believe us? Well, we’re going to introduce you to our competition.

Speech recognition has been around since 1952: that early device could recognize single spoken digits. (We, on the other hand, have been around since 1966, and were able to recognize whole spoken sentences immediately.)

The next large leap forward came in 1982: Dragon Software, who still release speech recognition software today, released software for industrial use. By 1985, that software had a vocabulary of 1,000 words – spoken one at a time. (That is comparable to a four-year-old child. We don’t recommend having a four-year-old, even a precocious one, transcribe your audio.)

Dragon itself even admits this today: “Most of us develop the ability to recognize speech when we’re very young. We’re already experts at speech recognition by the age of three or so.” Our college-educated transcriptionists had vocabularies in the 17,000-word (and up) range. Even in 1985. And they still do.

By 1993, a computer could recognize over 20,000 spoken words, which put it on a par with human beings. Except for the accuracy, which was only 10% in 1993 (in other words, a 90% error rate). By 1995, the error rate had dropped to 50%, which is quite a leap in a short time. (Our transcriptionists test at 98% accuracy.)

In 1997, Dragon released “Naturally Speaking”, its first consumer speech-recognition product. By 1997, we already had a 31-year head start on transcription for consumers at large.

We know, we know…

“That was back then. How about now?”

We’re glad you asked. 

Since 1985, the National Institute of Standards and Technology has been benchmarking speech recognition software. The graph below illustrates key data points from several of its relevant benchmark tests.
 
(source: National Institute of Standards and Technology, http://www.itl.nist.gov/iad/mig/publications/ASRhistory/index.html)

There are a lot of data points up there, so let me highlight the important features:

    • Take a look at the error rates (WER means Word Error Rate) for Conversational Speech (in red) and Meeting Speech (in pink). They aren’t even close to what human beings can deliver.
    • That 2% to 4% range is human error. As in, the error rate you would get from our human beings. And we aim for even lower than that.
    • The only tests that match up with human accuracy are air travel planning kiosk tests (bright green). Also known as “People Who Speak Very Deliberately and Slowly in Airports.”
    • Very few people speak deliberately and slowly in real life.
    • The error rate for broadcast news readers (blue), i.e., people who are very well-paid to speak clearly, is around 10%.

Software has to be trained to recognize your voice. And re-trained to recognize anyone else’s. Our transcriptionists can handle a meeting full of speakers and accurately differentiate them.

A 98% accuracy rate means you will spend much less time reviewing your audio, correcting errors and inaccuracies, and much more time growing your business.
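
For the curious, here is roughly how a Word Error Rate gets computed: count the substituted, deleted, and inserted words needed to turn the software's output into the correct transcript, then divide by the number of words in the correct transcript. The Python below is only a minimal sketch of that idea. It assumes simple whitespace word-splitting and ignores capitalization, and the function name `word_error_rate` is ours for illustration; it is not NIST's scoring tool.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: words are separated by whitespace and case does not matter.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic dynamic-programming table,
    # kept one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of four: WER of 0.25.
print(word_error_rate("pour us enough coffee", "pour us enough toffee"))
```

By this measure, one wrong word in every fifty is a 2% WER, which is what we loosely mean by 98% accuracy. (Loosely, because inserted words can push WER above 100%, so "accuracy = 100% minus WER" is only a rule of thumb.)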

The bottom line is this: computers are getting smaller, and more powerful, all the time. They can do many things better than human beings can.

But not, as you can see, transcription. And looking at the graph, they won’t catch up anytime soon.

Your audio wasn’t recorded in a lab; it was recorded in the real world, where we live. We transcribe conversations and meetings every day, from all over the world. Not to mention webcasts, dictation, presentations, and conferences.

Again, Dragon says it themselves: “People can filter out noise fairly easily, which lets us talk to each other almost anywhere. We have conversations in busy train stations, across the dance floor, and in crowded restaurants. It would be very dull if we had to sit in a quiet room every time we wanted to talk to each other! Unlike people, computers need help separating speech sounds from other sounds.”

Our transcriptionists and production staff are highly educated, well-trained, and constantly learning, whether that means going to graduate school, reading magazines, or watching the newest viral videos.

We like computers, and we think we can co-exist. So, by all means, speak your destination into your cell phone’s GPS, or say “tech support” to speak to technical support. Those are two versions of speech-recognition software that many of us use almost every day.

But if your audio is any more complicated than that, call us. We’re versatile, we’re accurate, and if you pour us enough coffee, we won’t crash.

We have run full tests on the entire Dragon experience, from opening the box all the way to the proof of the pudding, which is in the crust… er, the transcript. We will publish those results on or before February 17, so keep an eye on your inbox and this blog for the results!

Archiving – Thinking Beyond the Shoebox!


In our inimitable fashion here at ATC (www.audiotranscriptioncenter.com), we’re constantly reading through the emails we receive from different listservs about any number of things. The latest one that caught our eye was about how rapidly technology is changing, and it got us thinking on many levels. WWOCD? What Would Our Clients Do? The article, from the latest issue of ComputerWorld.com, is Lamont Wood’s “Fending off the digital dark ages: The archival storage issue.” This is where transcription of those audio/video collections becomes key to the longevity of your archives.

When was the last time you tried to play a 33 rpm record? When did you last find an old floppy disk with information that you couldn’t access? How about that interview of Aunt Lucy and Uncle Joe in the shoebox, recorded in 1972 on media that is now outdated? Point being, anything you record today will be outdated in 5 years, 10 years, 20 years. Do you have a plan? Does your customer have a plan? We don’t have a plan either, but hey, we got you thinking about it.

As far as I know, no company is currently transcribing on sheepskin, but almost everyone who receives transcripts stores them digitally. These digital transcripts are searchable documents, and they are usually printed and stored for archival purposes as needed.

The question again is, how often is digital media changing? 

Plainly, your audio archives will someday be obsolete, and you’ll have to look at ways to convert these collections to a new, usable format. (How many of you are already doing this every 5, 10, 15 years or so?) Transcripts of the media content provide the essence of what researchers need!

What will you do to make sure this scenario doesn’t happen to you or your client?  Or will you be retired by that point, and leave the “legacy” to someone else?

New Name. New Website. TTC is now ATC.

The times they have a-changed…

And so have our clients’ needs…

And so have we…

The Tape Transcription Center is now…

here it comes…

.
..
….
…..
….
..
.

the AUDIO TRANSCRIPTION CENTER! 

For over 40 years (32 of them in this lovely building across from Boston Common), we’ve done business under the Tape Transcription banner, a name that aptly described what was once our basic service. But the fact is, in 2010, with the world almost completely digitally revolutionized, we now work with audio (and even video!) formats no one had even dreamed of back when Sandy Poritzky set up shop in 1966.

Tapes?  What the hell happened to tapes?

MP3?  WAV?  DSS?  AVI?  As technology has evolved and grown exponentially over the last decade, so have audio formats, the way we receive and transcribe them, and how we return completed work to our clients.

Just five years ago, roughly 80% of our work came in on analog tapes.  Flash forward to the present, and we have nearly 96% of the audio we transcribe coming to us in digital formats, most of it uploaded to us directly.  No postage fees, no waiting for delivery to start a job, no chance of loss or damage during shipment.  In turn, our already supernatural ability to meet or beat tight turnaround times has been upped by a full 40%.

But still, sometimes we almost kind of miss the holiday-like excitement of opening all of those big, clunky packages stuffed with reels or cassettes.  Almost. In fact, Sandy still buys lunch for the office to commemorate the anniversary of the day we received over 400 oral history cassettes in one delivery.

Audio instead of Tapes.  Got it.  

But how will I find you on the vast World Wide Web?

So glad you asked!  In conjunction with the change to a new name, we are excited to announce the launch of our snazzily revamped website at a brand new URL: http://www.audiotranscriptioncenter.com/

But what about all the good stuff TTC always offered?

Sure, our name has changed, but the way we approach our business hasn’t.
  • Our stock-in-trade is still the largest team of highly educated, culturally diverse and intellectually curious transcriptionists of any service, anywhere.
  • We still specialize in beating unreasonable deadlines.
  • We still never charge extra for rush jobs.
  • And of course, we still transcribe from tapes.

What hasn’t changed is probably best summed up by this email Patrick received from a client just last week:
 

“Thank you for being so great and accommodating.  We refer people to you all the time, particularly since you’ve been so good about dealing with our craziness.”

  • We still thrive on craziness.