Beginner's guide to voice control

An increasing number of devices have voice control built in, and third-party apps are also available. This guide covers the built-in features: what they are and why they're useful.

What is voice control?

Voice control, also called speech recognition, is where you work a device by speaking to it rather than using a keyboard, mouse or touch gestures. It can be broken down into three areas:

  • Dictation - what you speak is turned into text, for example, a letter.
  • Device control - you issue commands to open apps, choose menu or ribbon options, activate links and so on.
  • Searching - what you say is turned into an Internet search.

To be useful, a voice control system must be able to recognise a wide variety of accents, voice types and vocabularies, even in a noisy environment. Some voice control systems require an Internet connection, because what you say is sent to another computer where it is processed. You should therefore be wary of using such systems with confidential information.

Why use voice control?

On computers, device control is aimed primarily at people who have difficulty using the mouse or keyboard. In other words, it's an assistive technology. Dictation started out in the same way but is now marketed more as a productivity tool that could be useful to all users, because most people can speak faster than they can type.

Touch screen devices are good for lots of things, but their small flat screens mean typing isn't one of them. This has led to a general interest in alternatives to typing, which is heightened among blind and partially sighted users because of the increased difficulty of using the on-screen keyboard with magnification or speech output.

Carrying out tasks on a touch screen device also requires one or both hands, and there are times when this is impractical - for instance if it's cold and you're wearing gloves that you don't want to remove. So again, using voice control for carrying out tasks and Internet searching is useful for lots of people.

On the other hand, mobile devices are often used in public places, where you may not want to be overheard speaking to your device!

How do I start voice control?

The ability to do dictation and device control has been built into computer operating systems for some years, with Internet searching being treated as a combination of the two.

  • In Windows, search for "Windows speech recognition".
  • In OS X, go to System Preferences and find "Dictation & Speech".

Smartphones and tablet computers can do Internet searching by voice as well as dictation and device control.

  • In iOS and Android, whenever the on-screen keyboard is displayed, the key to the left of the spacebar is the dictate key, with an icon of a microphone. On BlackBerry devices, the microphone key may be on either side of the spacebar; press and hold it to start dictation. On Windows phones, the microphone key is below the right hand end of the spacebar.
  • In iOS, press and hold the Home button to start Siri, which is used for device control and Internet searching.
  • In Android, device control and Internet searching use the Voice Search app, found at the top right of the home screen or in the Apps folder.
  • BlackBerry devices have an app called Voice Control, found on the home screen. It can also be started by holding down the Mute button.
  • Windows phones' speech recognition, once turned on, is started by holding down the Start button.

How does voice control work?

Essentially, voice control works by recognising what you say, and taking some action depending on the context.

Mobile devices have microphones built into them, which may be good enough for voice control. For computers you will need a headset, preferably with a noise-cancelling microphone. You can pick one up for as little as £15, or pay as much as £150.

You should speak in phrases so that the system can identify what you are saying more accurately, for instance, to distinguish between "there" and "their".

When dictating text, you must vocalise punctuation and also remember to indicate "new line" or "new paragraph" and so on. Automatic capitalisation will depend on the settings of the app you are dictating into, but to be on the safe side you can say things like "cap this is a test" to get "This is a test", or "all caps rnib" to get "RNIB".
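If you are curious how spoken formatting words might be turned into text, the short Python sketch below illustrates the idea. It is only an illustration, not how any real product works: the function name `render_dictation` and the small `PUNCTUATION` table are made up for this example, and real systems recognise many more commands.

```python
# Illustrative sketch only: names and the command list are assumptions,
# not taken from any real dictation product.

# Spoken phrases (one or two words) mapped to the symbol they produce.
PUNCTUATION = {
    "full stop": ".",
    "comma": ",",
    "question mark": "?",
    "exclamation mark": "!",
}


def render_dictation(spoken: str) -> str:
    """Turn a string of recognised words into formatted text."""
    words = spoken.lower().split()
    parts = []          # pieces of the finished text, joined at the end
    cap_next = False    # set by "cap": capitalise the next word
    i = 0

    def add_word(word: str) -> None:
        # Put a space before a word unless it starts the text or a line.
        if parts and not parts[-1].endswith("\n"):
            parts.append(" ")
        parts.append(word)

    while i < len(words):
        pair = " ".join(words[i:i + 2])
        if pair == "all caps" and i + 2 < len(words):
            add_word(words[i + 2].upper())      # "all caps rnib" -> "RNIB"
            i += 3
        elif pair == "new line":
            parts.append("\n")
            i += 2
        elif pair == "new paragraph":
            parts.append("\n\n")
            i += 2
        elif pair in PUNCTUATION:               # e.g. "full stop"
            parts.append(PUNCTUATION[pair])
            i += 2
        elif words[i] in PUNCTUATION:           # e.g. "comma"
            parts.append(PUNCTUATION[words[i]])
            i += 1
        elif words[i] == "cap":
            cap_next = True
            i += 1
        else:
            add_word(words[i].capitalize() if cap_next else words[i])
            cap_next = False
            i += 1
    return "".join(parts)


print(render_dictation("cap this is a test"))
print(render_dictation("all caps rnib"))
```

One side effect of this simple approach is that you could not dictate the word "comma" itself, which is one reason real systems use more sophisticated rules.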

Increasingly you can use natural language rather than remembering specific ways to get the system to understand you. For instance, you may find you can say "do I need an umbrella?" instead of "what is the weather forecast for today?"

For best results, talk slowly and clearly directly into the microphone in a quiet area with a good Internet connection. Using earphones for the output may also improve results by keeping spoken output separate from the voice commands you give.

Most systems will have help available, which you can get by saying "what can I say?".

Different people may get different results from the same app, because recognition can be affected by the amount of surrounding noise, the way you talk and even the way you hold your device.

Windows computers

The first time you start voice control, you may be required to "train" it to recognise your voice. Training requires you to read out long passages of text that the computer then analyses.

On Windows XP and older computers, training was difficult or impossible for users of screen magnification or screen reader software. The passages of text were too long to memorise and assistive technology gave no indication whether you had to repeat a passage or go on to the next.

Windows 7 and 8 have improved the training process. The passages of text are much shorter, and there are audible and visual indicators when one has been recognised.

Once you have finished reading the training text, the computer processes it and builds a profile of your voice using that microphone. If you change your microphone, you will need to redo the training. If another person uses the same computer, it would be sensible for them to create their own profile.

You should speak in complete phrases, with pauses to allow the last phrase to be recognised before speaking the next.

Mac OS X computers

Voice control on a Mac is divided into dictation and speakable items.

Dictation is available wherever you can type.

  • Tap the Fn key twice to start dictation, speak your text, then tap Fn twice again to start the recognition phase.
  • You have to be online to use dictation, because recognition involves sending your speech to Apple for processing. For the same reason, you may not want to use it with sensitive information.
  • You can only dictate about 30 seconds' worth of text in one go.
  • In OS X 10.9 (Mavericks), the "Use Enhanced Dictation" option downloads a large file once, after which you can use dictation offline and with no 30-second limit.
  • Dictation options are found in System Preferences, Dictation & Speech.

Speakable items is for issuing commands like "reply to sender" while you're reading a mail message.

  • Speakable items is turned on in System Preferences, Accessibility.
  • You can also change the "Listening key" - the key which indicates that you are about to issue a command. Some VoiceOver users suggest changing the listening key from Escape to something like F10.
  • You have to calibrate your microphone the first time you use Speakable items, by reading a small number of phrases like "what time is it". Successful completion of each phrase is indicated with a noise and a visible flash.
  • Use headphones when calibrating, so that VoiceOver output does not interfere with your speech.

Smartphones and tablets

Mobile devices have a microphone built in, and this is good enough for voice control. You may prefer to use a headset or earphone with a separate microphone, especially if you use a screen reader, in order that your speech and the screen reader output are kept separate.

Training is not needed for the simpler voice recognition used on a mobile device.

By default, you need an Internet connection because the intensive computing needed for voice recognition is not done on your device. What you say is sent to a more powerful computer online where the recognition is carried out, and the resulting text is sent back to your device. More recent versions of the software on your device may include an option for offline dictation, but be warned that switching this option on will require a large one-time download.


On iOS and newer Android devices, the on-screen keyboard has a key with a microphone symbol to the left of the Space key. Tapping this key starts dictation mode. You simply say what you want and then tap the screen to indicate you've finished.

On iOS devices, a double tap with two fingers in an edit area also starts and stops dictation.

On BlackBerry devices the microphone key may be on either side of the Space key. Press and hold it until you hear a beep, then dictate your text.

On Windows Phone devices, the microphone key is beneath the right hand end of the Space key. Tap it to start dictation mode, wait for the beep and then dictate your text. Windows Phone devices allow dictation in only a few apps: Messaging, Mail and OneNote, but not Calendar, Internet, People or Office applications.

In each case, your device will try to work out what you've said and enter it into the active app.

For iOS and Android, you have to vocalise capitalisation and punctuation, so you would say "cap yes exclamation mark" to get "Yes!". The first word of a sentence and proper names may be capitalised automatically. BlackBerry devices have an "auto-punctuation" setting which is on by default. Windows phones always add basic capitalisation and punctuation.

Device control and Internet searching

Siri comes with all recent iOS devices, and allows for device control and Internet searching. You start it by holding down the Home button.

Voice Search is the Android equivalent of Siri. Voice Search appears by default near the top of the home screen of many Android devices, to the right of the Search button.

Anecdotal feedback is that Siri is better for voice control - you can even use it to start VoiceOver on iOS 7 - and Voice Search is better for Internet searching.

Windows phones' speech recognition has to be turned on by going to Settings, Speech and checking the "Enable speech recognition service" option. You can then initiate speech recognition by holding down the Start button. Audio confirmation of device control operations is provided. For anything that is not recognised as a control operation, an Internet search is conducted. The results are listed on screen but not read out.
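For readers who like to see the logic spelled out, the "command first, otherwise search" behaviour described above can be pictured with a few lines of Python. This is a rough sketch under our own assumptions: the `COMMANDS` table, the example phrases and the `handle_utterance` function are hypothetical and are not taken from any phone's software.

```python
# Illustrative sketch only: the commands and names below are assumptions.

# A tiny table of phrases the device treats as control operations.
COMMANDS = {
    "open calendar": lambda: "Opening Calendar",
    "call home": lambda: "Calling home",
}


def handle_utterance(utterance: str) -> str:
    """Run a known command, or fall back to an Internet search."""
    phrase = utterance.strip().lower()
    action = COMMANDS.get(phrase)
    if action is not None:
        return action()                      # recognised control operation
    return f"Searching the Internet for: {phrase}"


print(handle_utterance("Open Calendar"))
print(handle_utterance("do I need an umbrella?"))
```

Anything that does not exactly match a known command is passed on as a search, which is why an unclear or misheard command can produce a page of search results instead of the action you wanted.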

The voice control facility in BlackBerry 10 is switched on by holding down the Mute button. Basic audio confirmation is given for Internet searches and device control, but it is not enough to use the facility if you are blind.


Voice control is a fast-evolving technology, but here are a few online resources: