Speech Recognition

Speech Recognition: a reader asks…

I’d like to explore how I can write on my PC. I’m recently retired and over my career there was never a need to learn to type, so I didn’t. I always had a secretary who took shorthand and not only transcribed what I dictated, but also cleaned up what I said so it looked good on paper. Has computer technology advanced to that point yet? I’m considering writing a book, so I looked into taking touch-typing courses, but am wondering if I really need to. Any advice?

Speech recognition has come a long way over the last 20 years or so. I remember using Dragon NaturallySpeaking software back in the late 90’s. I think it got me about 50-60% of the way there. It was OK at recognizing common words but terrible at specialized words, and haphazard at best with punctuation. Fast-forward to 2022, and basic speech recognition is built into every smartphone and most computers as well. And it works much better!

I’m actually writing this article using Windows 10’s built-in Speech Recognition applet now. I would say that it’s about 80-90% accurate. I still have to go back in and correct a few things here and there, but it’s definitely faster than typing. That said, it took me a fair amount of time to develop this skill, and you should also consider that dictating to a machine is going to be different from dictating to a secretary.
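
To put that accuracy figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes "accuracy" means the share of words recognized correctly, which is my simplification rather than anything Windows reports, and the 1,000-word article length and the function name are made up for illustration.

```python
def words_to_correct(total_words: int, accuracy: float) -> int:
    """Estimate how many words need fixing, assuming accuracy is measured per word."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return round(total_words * (1.0 - accuracy))


if __name__ == "__main__":
    article_words = 1000  # hypothetical article length
    for accuracy in (0.80, 0.90):
        fixes = words_to_correct(article_words, accuracy)
        print(f"At {accuracy:.0%} accuracy: about {fixes} of {article_words} words to fix")
```

Under those assumptions, a 1,000-word piece needs somewhere between 100 and 200 words touched up, which is why the clean-up pass still matters.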

I think for most people, the trick to making it work well is to change the way you speak when you’re speaking to a computer.

First, you need to think about what you’re going to say before you say it. This is really the only way that you can speak without the inordinate pauses, “uhh’s”, “umm’s” and other vocal devices we use all the time while we speak. This may take a little practice. Thinking about what you’re saying also reduces our common ‘mis-speaking’, where we don’t say something the right way. Professional speakers spend a lot of time working on this skill so that they can give both prepared speeches and extemporaneous comments with a decent economy of words, and without causing the audience to drift. You want the economy of words; holding an audience’s attention isn’t important here, because your computer doesn’t care and doesn’t lose attention.
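
If a few fillers still slip into your dictated text while you’re practicing, they can be stripped out afterward. For readers comfortable with a little scripting, here is a minimal Python sketch. It assumes you’ve saved your dictation as plain text; the filler list and the function name are my own hypothetical choices, not anything built into Windows, and it does not fix capitalization.

```python
import re

# Hypothetical filler list; extend it with whatever shows up in your own transcripts.
FILLERS = ("uh", "uhh", "um", "umm", "er", "hmm")

# Match a filler only when it stands alone as a whole word,
# plus an optional trailing comma and any following whitespace.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(FILLERS) + r")\b,?\s*",
    flags=re.IGNORECASE,
)


def strip_fillers(text: str) -> str:
    """Remove stand-alone filler words and tidy up leftover spacing."""
    cleaned = _FILLER_RE.sub("", text)
    cleaned = re.sub(r"\s{2,}", " ", cleaned)  # collapse any doubled spaces
    return cleaned.strip()


if __name__ == "__main__":
    print(strip_fillers("Um I think uh we should start with umm the first chapter."))
```

Because the pattern only matches whole words, ordinary words that happen to contain those letters (such as “number” or “umbrella”) are left alone. This is a clean-up aid, not a substitute for thinking out each sentence before you say it.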

My method is to think out what I’m going to say before I say it, a sentence or two at a time. Then I say the sentence or two and pause. I think about the next sentence or two, and then say that, again pausing afterwards. Rinse and repeat till you finish, and you can start and stop anytime you want. It helps if you have a rough outline of what you’re going to say, either in your head or already written somewhere. For book-writing, there are a lot of good methods to plan out your book, although the standard outline works just fine.

I put the speaking skill first in this article because I think it’s really the key to making speech recognition work well enough for you to replace typing, regardless of what technology you use. Since you spent your career dictating, I think you have a leg up on most folks getting into speech recognition, so let’s move on to the technology.

Your Windows 10 PC has basic speech recognition built right into the operating system, no additional software needed. You’ll first need to make sure you have the right hardware, meaning a good microphone. You can always get a great prosumer microphone like the Blue Yeti line, or do fine with a simple and cheap lavalier microphone (such as those sold on amazon.com). There are tons of other choices, and if you’re using a laptop you can certainly use its built-in microphone, but an external one will be better in most cases. If you have a webcam perched on top of your monitor, its microphone will work also.

You want to position the microphone as close to your mouth as practicable. In the case of a standard microphone, having it 6-10″ from your mouth is best. For the lavalier, clip it to the front of your shirt close to your neck. What a good microphone will do is pick up your voice clearly and reject any ambient noise farther away. This is one important factor in maximizing the accuracy of speech recognition.

Next, you’ll run the Speech Recognition applet, which is located in your Start menu under Windows Ease of Access. Or you can just type “Speech Recognition” into the search bar to the right of the Start button (bottom-left of your screen), and click on the applet that appears in the search results.

Please note that there is a substantial one-time setup routine the first time you run the Speech Recognition applet. Here are the nine screens you’ll go through:

  1. This first screen is basic. Read it and click Next.
  2. Since you’ve already connected your microphone to your PC, the 2nd screen likely already has it selected. Click Next.
  3. The 3rd screen introduces your next task. Click Next.
  4. On this 4th screen, you will read out loud a number of sentences. As you read a sentence, the sentence will change once the applet has recognized what you said. Pause after you’ve read the sentence, and try to speak clearly but naturally. There are quite a few sentences in this training module. If the sentence doesn’t change after you read it, that means the applet didn’t recognize what you said as matching what was displayed. Repeat the sentence again. When you’ve read all the sentences, click Next.
  5. The 5th screen gives you a choice, and I recommend you sacrifice a little privacy for accuracy by selecting the “Enable document review” option, then click Next.
  6. The 6th screen has a link you can click to open a reference document for giving non-dictation commands to your computer. Click Next.
  7. The 7th screen lets you run the Speech Recognition applet when you start the computer. I leave this unchecked as I don’t use speech recognition to control my computer, and only run the applet when I want to dictate.
  8. The 8th screen lets you either run or skip a tutorial. If you have time, I recommend you go through the tutorial.
  9. The 9th screen shows the tutorial running. It’s a good 30-minute exercise.

When you’ve finished the tutorial and closed the window, you should see the Speech Recognition applet at the top of your screen. Anytime you want to run through the tutorial again, you can get to it through the Windows Control Panel.

While that is a fairly long list of steps, it shouldn’t take you more than 45 minutes to complete, and most of that time is the 30-minute tutorial. Completing this process will help improve the accuracy of your dictated results.

OK, if you’ve gotten this far, you’re ready to start dictating! Speech Recognition will work in most any app on your PC, such as Microsoft Word or an email in Outlook, with one huge exception: if you try to dictate into a web browser (such as when composing a Gmail message), you may find it doesn’t work until you adjust a setting. Before you make this change, consider the privacy implications:

  1. This is “online” speech recognition, versus device-based. This means everything you say when it’s active will be sent to Microsoft servers to be transcribed.
  2. Microsoft says it will not store or otherwise use your recorded voice, but I’m a bit hesitant about this, from a privacy perspective.
  3. The Speech Recognition applet doesn’t use cloud-based speech recognition, which is why it won’t work on a website.

If you still want to enable this capability, click Start > Settings > Privacy. Click the Speech menu item on the left side, then turn the online speech recognition switch on.

Here’s a workable alternative: I open either a Word or a Notepad text document and do my dictating into that, then copy and paste that into the web browser. It’s an extra step, but lets me keep that Online Speech Recognition switch turned off.

For my Mac readers, you can enable on-device speech recognition by opening System Preferences > Accessibility, and checking the “Enable Voice Control” box. That will put a small icon on your desktop which you can move around to wherever you like.

This icon will be labeled “Wake Up” when it’s not listening, or “Sleep” when it is listening.

Note that the Mac’s speech recognition works even in web browsers. That said, in my experience it isn’t nearly as good as what I’ve seen on Windows. It’s fine for common words and punctuation, but falls short for just about everything else.

And smartphone users, you may already be familiar with both the dictation button at the lower-right of the on-screen keyboard and Siri. Speech recognition works pretty similarly on all these devices, so my first section about speaking clearly and concisely is germane across all platforms.

I look forward to reading your book!


One Comment

  1. There is an extension you can add to the Google Chrome web browser to enable speech to text from within Chrome. It’s at https://chrome.google.com/webstore/detail/speech-to-text-voice-reco/kcgloaobfaiejoiahlhnfaolfcifjjho?. I have not reviewed this or similar add-ins for Chrome or other browsers, so this is a use-at-your-own-risk option.
