The voice used by a computer or smartphone screen reader plays a big part in deciding whether you can use that speech system. How human it sounds is far from the only factor. For learners who use speech systems a lot, other criteria can be just as important. Steve Griffiths explains.

Synthetic voices can be created in two ways. One is to take a recording of a person speaking and break it down into the smallest units that make up the language. These are called “phonemes”. The second way is to make synthesised versions of the phonemes. In both cases, these sounds are then joined together to create words and phrases.
 
Voices based on samples of real speech sound more human than synthesised voices. There are some other important differences between the two types of voice:
  • Human-sampled speech files can be up to 100 times larger. This can be a real issue on a mobile device with limited storage.
  • Smaller synthesised voice files are more responsive, allowing you to work more quickly.
  • As you get used to it, you can speed up a voice and still understand it. Synthesised voices can be run much faster without losing intelligibility, whereas human-sampled voices either can’t be speeded up so much, or lose intelligibility when they are.
 

Is it right for the job?

The material you’re reading can influence the type of voice you want to use. For a novel you may prefer a voice with inflection – Stephen Fry’s reading of the Harry Potter series is often mentioned as a favourite.
 
Reading a manual or technical document, however, is a different matter. I once worked with a lawyer who needed every bit of punctuation to be spoken, because the difference between a comma and a full stop could be crucial. Interestingly, when I asked if she used a male or female voice, she had to think for a while because she just didn’t notice it that way!
 

A voice like your own?

Some people want a voice that is similar to theirs in gender, age or accent, or at least a voice that they can relate to or aren’t irritated by. Almost all synthesisers offer adult male and female options; child or older voices are less common. You can also get different variants of English, such as US, UK, Irish, Scottish, South African, Caribbean, Indian or Australian – even accents from the West Midlands or northern England. There are also Welsh voices, but I’m not aware of Gaelic or Cornish ones!
 

What will it work with?

Voices are generally created for a particular platform, such as a Windows computer or an Android smartphone, although an increasing number of voices are now available on more than one platform. Some voices are licensed for use only with a specific product, while others can be used with any product on a given platform.
 

How “intelligently” can it interpret context?

A voice has to interpret what’s on the page, taking context into account, in order to pronounce correctly a sentence such as “I will read again what I read yesterday”. The amount of such “grammatical parsing” differs, as does how well a voice deals with non-standard spelling like “Loughborough”.
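To give a feel for what even the simplest context rule involves, here is a toy sketch in Python. It is not how any particular synthesiser works. The rule itself (that “read” straight after a word like “will” or “to” sounds like “reed”, and otherwise sounds like “red”) and the names PRESENT_TENSE_CUES and respell_read are assumptions made up for this illustration.

```python
# Toy rule, assumed for illustration only: "read" directly after one of
# these words is treated as present tense (sounds like "reed"); anywhere
# else it is treated as past tense (sounds like "red").
PRESENT_TENSE_CUES = {"will", "to", "can", "shall", "must"}

def respell_read(sentence):
    """Respell each "read" so the intended pronunciation is visible."""
    words = sentence.split()
    respelled = []
    for index, word in enumerate(words):
        if word.lower() == "read":
            previous = words[index - 1].lower() if index > 0 else ""
            respelled.append("reed" if previous in PRESENT_TENSE_CUES else "red")
        else:
            respelled.append(word)
    return " ".join(respelled)

print(respell_read("I will read again what I read yesterday"))
```

A rule this crude gets the example sentence right but would mispronounce “I read every day”, which is exactly why the amount of parsing a voice does makes a difference.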
 

To punctuate or not to punctuate

By default, most voices pause at the end of a phrase or sentence rather than speaking the punctuation. But abbreviations are also indicated by a full stop, which can be confusing, and headings may not include punctuation at all. With dialogue, you have to choose between hearing the quotation marks, which disrupt the dialogue, or not hearing them and not knowing where one speaker finishes and another starts. Being able to decide what punctuation is spoken is the crucial factor. For example, it might be important for a child or young person to be able to switch spoken punctuation on for a text with lots of dialogue, or when reading back and checking their own creative writing.
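The idea of choosing which punctuation is spoken can be sketched in a few lines of Python. This is only an illustration, not the settings of any real screen reader: the names PUNCTUATION_NAMES and prepare_for_speech, and the spoken names given to each mark, are assumptions for the example.

```python
# Hypothetical spoken names for a few punctuation marks.
PUNCTUATION_NAMES = {
    ",": "comma",
    ".": "full stop",
    "?": "question mark",
    "!": "exclamation mark",
    "“": "open quote",
    "”": "close quote",
}

def prepare_for_speech(text, spoken_marks):
    """Replace only the chosen marks with their spoken names."""
    pieces = []
    for character in text:
        if character in spoken_marks and character in PUNCTUATION_NAMES:
            pieces.append(f" {PUNCTUATION_NAMES[character]} ")
        else:
            pieces.append(character)
    # Collapse the extra spaces introduced around the spoken names.
    return " ".join("".join(pieces).split())

line = "“Stop,” she said."
print(prepare_for_speech(line, {"“", "”"}))   # quotation marks only
print(prepare_for_speech(line, set(PUNCTUATION_NAMES)))  # everything
```

Passing only the quotation marks would suit a reader following dialogue, while passing every mark is closer to what the lawyer mentioned earlier needed.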
 

Numbers and dates

Numbers are a challenge, as they can be read as single digits, pairs or full numbers, with currency being particularly tricky. Try this: “If you ring ext. 12345, you’ll win 12,345 apples, or £123.45”. Some synthesisers are better than others at recognising these constructs.
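As a rough sketch of what a voice has to decide for that test sentence, the Python below treats three patterns differently: a number after “ext.” is read digit by digit, a number with thousands commas is read as a full number, and a pound amount is read as pounds and pence. The patterns, the wording of the output and the function names are all assumptions for illustration; real synthesisers use their own, more thorough rules.

```python
import re

ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

def in_words(n):
    """Spell out a whole number from 0 to 999,999 in UK style."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + (" and " + in_words(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    spoken = in_words(thousands) + " thousand"
    if rest >= 100:
        spoken += " " + in_words(rest)
    elif rest:
        spoken += " and " + in_words(rest)
    return spoken

def digit_by_digit(digits):
    """Read a run of digits one at a time, as for a phone extension."""
    return " ".join(ONES[int(d)] for d in digits)

def speak_numbers(text):
    # Currency first, so the amount is not mistaken for a plain number.
    text = re.sub(r"£(\d+)\.(\d\d)",
                  lambda m: f"{in_words(int(m[1]))} pounds and {in_words(int(m[2]))} pence",
                  text)
    # Numbers written with thousands commas are read as full numbers.
    text = re.sub(r"\b\d{1,3}(?:,\d{3})+\b",
                  lambda m: in_words(int(m[0].replace(",", ""))),
                  text)
    # Assumed rule: digits straight after "ext." are read one by one.
    text = re.sub(r"\bext\. (\d+)",
                  lambda m: "extension " + digit_by_digit(m[1]),
                  text)
    return text

print(speak_numbers("If you ring ext. 12345, you’ll win 12,345 apples, or £123.45"))
```

The same five digits come out three different ways, which is why it is worth trying a sentence like this on any voice you are assessing.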
 
Dates are another area where voices can differ, with US voices tending to read 9/3 as “September third”, whereas a UK voice would say “ninth of March”.
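The difference comes down to which number the voice takes as the month. A minimal Python sketch, assuming a made-up function speak_date and a simple “US” or “UK” setting, might look like this:

```python
MONTHS = ("January February March April May June July "
          "August September October November December").split()

ORDINALS = ("first second third fourth fifth sixth seventh eighth ninth tenth "
            "eleventh twelfth thirteenth fourteenth fifteenth sixteenth "
            "seventeenth eighteenth nineteenth twentieth twenty-first "
            "twenty-second twenty-third twenty-fourth twenty-fifth "
            "twenty-sixth twenty-seventh twenty-eighth twenty-ninth "
            "thirtieth thirty-first").split()

def speak_date(text, locale):
    """Read a short date such as "9/3" the way a US or UK voice might."""
    first, second = (int(part) for part in text.split("/"))
    if locale == "US":
        # US order: month, then day.
        return f"{MONTHS[first - 1]} {ORDINALS[second - 1]}"
    if locale == "UK":
        # UK order: day, then month.
        return f"{ORDINALS[first - 1]} of {MONTHS[second - 1]}"
    raise ValueError(f"unknown locale: {locale}")

print(speak_date("9/3", "US"))
print(speak_date("9/3", "UK"))
```

The sketch does no checking of its own, so a date like 25/12 would fail under the US setting; a real voice has to decide what to do in such cases.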
 
All of these factors influence whether a young person will find one voice better or worse than another, and how much help they might need to get used to its little ways. Depending on the age and learning style of the young person, the “humanness” of the voice may just not be of much consequence, or be considered more desirable than essential.
 

Commonly used voices

  • Eloquence comes with the JAWS screen reader for Windows PCs, and has recently become available for Android smartphones and tablets. For more information visit the Google Play website. Unfortunately Eloquence cannot be used with NVDA, a free, open source screen reader for Microsoft Windows. Visit the NVDA website for more information.
  • eSpeak is a free voice, used by NVDA and Window-Eyes for Office, and you can use it with others. Visit the eSpeak website for more information.
  • VoiceOver Alex, the voice that comes with Mac OS X, is apparently going to be available on iPhones, iPads, and iPod touch when iOS 8 appears.
  • IVONA offers a good range of voices, including Welsh ones, and its voices are used in 2013 Amazon Kindle Fires. Visit the IVONA website for more information.
  • Microsoft Windows voices come with Windows 8 and Windows Phone 8.1. You can get them for older Windows computers from Microsoft, but if you use a screen reader already, a more accessible route is to visit the Windows Voices website.
  • CereProc do a range of voices for multiple platforms, and make a free voice for Scottish public sector organisations. See the CereProc website for more information.
  • Acapela do a broad range of voices, including a Queen Elizabeth one! See the Acapela website for more information.
 
Steve Griffiths is RNIB’s Digital Accessibility Executive
 

Resources

 
Many of the websites mentioned above include demonstrations of the voices. You can also find several voice samples all in one place on RNIB’s website.
 
Download RNIB’s guide to carrying out a technology assessment for children and young people in education from the Technology in Education page. It is packed with information and advice for anyone involved in deciding what equipment to provide for a young person with sight loss.