CAPTCHA - if your name’s not down you’re not coming in.

CAPTCHAs and accessibility have been hotly debated in newsgroups and the press, and were recently brought up on this blog. Many people feel marginalised by their use and frustrated at not being able to access the online services they want. Facebook, for example, has in the past made heavy use of CAPTCHA even once you’re logged in (and it is rumoured to be reinstated). Google also uses CAPTCHA, although they have looked into providing audio alternatives to the visual CAPTCHA (at the time of writing, however, two people have reported that the audio has not been working). These are two of the fastest growing Internet companies on the web today, and they are setting precedents for how web pages are delivered.

This article looks at what CAPTCHA is, what it is for, the problems it causes and some possible solutions.

CAPTCHA stands for “Completely Automated Public Turing test to tell Computers and Humans Apart”, which to the rest of us means those images of text that you have to read and enter into a text field in order to identify yourself as a human, and not a spam bot, when registering for a website.

The problem with CAPTCHAs is that they are normally a mishmash of letters and numbers, often visually unclear and hard to read. This is in fact the point of CAPTCHAs: they have to be visually distorted and can’t have an alt attribute, which is relied on by screen readers and text-based browser users, otherwise they become machine readable and therefore accessible to spammers. In essence CAPTCHAs are designed to prevent spammers and bots getting into the system while proving that you are human, except they don’t always make you feel that way because they’re notoriously difficult to use.

So what we have here is a security solution that succeeds in locking out not just spam but also many of us who struggle to get any sense out of the image. It is a classic example of accessibility and security meeting head on.

The problems

The CAPTCHA below is a standard example you’ll find on the web today. While it may be accessible for some, it invariably knocks screen reader users and people with poor eyesight off the guest list, not to mention people with reading or cognitive problems. In fact I don’t know anyone who hasn’t struggled at one time or another to figure out the text written on an image used for a CAPTCHA. As the saying goes here in the UK when you’re queuing to get into a nightclub, “If your name’s not down you’re not coming in”. CAPTCHAs are the online equivalent of the unfriendly bouncer working the doors.

A basic CAPTCHA will consist of an image of text and a form field to write the text into, followed by a submit button.

That said there are two alternatives provided in the above example designed to accommodate people who have problems accessing the images as they are:

1. A reload button: To the top right of the input field is a button to reload the image with a new image of text which is useful if you can’t fathom out the original (is it “slept” and “small” or “slept” and “sinall”?). It’s no guarantee that the next image of text that is presented is going to be any more readable however. Even less so if you have sight problems.

2. Audio CAPTCHA: There is a button you can click so that you can hear a recording of the letters presented in the image. While it is good to see efforts made to provide an audio alternative to the visual CAPTCHA, audio versions do tend to come with their own set of problems. Many audio CAPTCHAs can be difficult to hear, even for those with good hearing, due to background noise and distorted pronunciation (distorted audio, like the distorted text seen on a visual CAPTCHA, is necessary in order to prevent it being understood by automated agents). Furthermore, if you are deafblind or do not have audio enabled in your hardware you won’t be able to access the audio.

As an aside, a couple of additional problems I encountered with this particular example of a CAPTCHA were that the reload and audio buttons are not keyboard accessible, meaning that a screen reader user or any other keyboard-only user can’t reach these options. And to wrap it up, there are no alternatives if JavaScript is switched off or not supported.

Ways around the inaccessibility of CAPTCHA for humans are debatable. On the one hand sites want to use CAPTCHAs to guard against security breaches, but on the other they do tend to lock out legitimate registrants. The World Wide Web Consortium (W3C) has written a paper called “Inaccessibility of CAPTCHA” and Google has also researched audio CAPTCHAs, but opinion in accessibility circles seems to be that they remain problematic even with audio alternatives. In fact the W3C states in its paper that, on top of it being a viable option for spammers to hire humans to hack through CAPTCHAs, “many of the systems can be defeated by computers with between 88% and 100% accuracy, using optical character recognition”.

The use of audio CAPTCHAs as an alternative to visual CAPTCHAs is just one possible solution, however. Others include:

1. Logic questions and trivia: Adding in a simple question for the user to answer, such as “Is ice hot or cold?”, can provide an alternative to visual and audio CAPTCHAs. There is a danger, however, that users with cognitive problems or dyslexia, or those browsing in a foreign language, can become marginalised. In addition, a website would need a vast bank of questions to ask at random if it is truly going to outwit spammers. Furthermore, humans can still be used to hack the system.

2. Restricting use on an account: Websites can have it so that if input details are entered incorrectly more than, say, three times, the account is locked down for a period of time. This would prevent repeat attacks from spammers trying to work out ways in. The problem here is that many of us may input details incorrectly when logging into accounts, especially if we can’t see the page or have trouble reading or understanding it.
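The account restriction described in option 2 can be sketched roughly as follows. This is only a minimal illustration in Python, not how any particular site does it: the class name `LoginThrottle`, the 15 minute lock period and the in-memory dictionaries are all assumptions made for the example. The threshold of three follows the “more than, say, three times” wording above, so the lock is applied on the fourth consecutive failure.

```python
import time
from dataclasses import dataclass, field


@dataclass
class LoginThrottle:
    """Hypothetical helper: locks an account after too many failed logins."""

    max_attempts: int = 3            # failures tolerated before locking
    lockout_seconds: float = 900.0   # assumed lock period (15 minutes)
    _failures: dict = field(default_factory=dict)      # account -> consecutive failures
    _locked_until: dict = field(default_factory=dict)  # account -> unlock timestamp

    def is_locked(self, account, now=None):
        """Return True while the account is still inside its lock period."""
        now = time.time() if now is None else now
        return now < self._locked_until.get(account, 0.0)

    def record_failure(self, account, now=None):
        """Count a failed login and lock the account once the limit is exceeded."""
        now = time.time() if now is None else now
        count = self._failures.get(account, 0) + 1
        if count > self.max_attempts:
            self._locked_until[account] = now + self.lockout_seconds
            count = 0  # start counting afresh once the lock expires
        self._failures[account] = count

    def record_success(self, account):
        """A successful login clears the failure count."""
        self._failures.pop(account, None)
```

A login handler would call `is_locked` before checking the password, then `record_failure` or `record_success` depending on the result. Note that the sketch treats a person who mistypes their details exactly as it treats a bot, which is the accessibility problem described above.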

Possible solutions

So what are the alternatives, given that there is an undeniable need for large sites to protect themselves against attacks from malicious spammers?

1. Non-interactive checks: These are essentially checks that the website itself carries out, thereby removing the responsibility from the user. This can include spam filters that filter out blacklisted words. While spam filters may experience false negatives from time to time, they can be as effective as CAPTCHA and create no accessibility barriers for users.

In addition to spam filters, heuristics can also be used to predict the behaviour of spam bots and lock them out. Robotic users can be detected based on the volume of data the user requests, series of common pages visited, IP addresses, data entry methods, or other signature data.

2. Federated Network Systems: This is where a user can create an account, set his or her preferences, payment data and so on, and have that data recognised across all sites that use the same service. This means that a user need only sign in once and will have access to all affiliated services within the network, a little bit like Microsoft Passport. The issue here is that, while that is all well and good, a user will still need to register that one first time, so CAPTCHAs need to be avoided there. Similar to Passport are certificates that are sent to the site identifying you as you. This does come with a drawback, as you may be forced to flag to the site that you have a disability, which raises other concerns about privacy.

3. Phone numbers: A non-technical way around CAPTCHAs is to offer a number for people to call if they can’t get past the login. While this acts as a solution for some, it again is not a failsafe way to ensure access. Not everyone will have a phone to hand (if you’re on dial-up, for example) and it isn’t really fair to make users pay for the privilege. Additionally this can create a lot of extra costs for the website owner. Ideally the number should be freephone, and it will need to be staffed 24 hours a day by a team trained in how to process queries.
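To make the non-interactive checks from option 1 a little more concrete, here is a rough Python sketch of the two ideas mentioned there: a blacklisted-word filter and a simple heuristic based on request volume per IP address. Everything named in it is an assumption for illustration only: the word list, the limit of 20 requests in 10 seconds, and the names `contains_blacklisted_word`, `RequestRateMonitor` and `should_reject`. A real spam filter would be considerably more sophisticated.

```python
import re
from collections import defaultdict, deque

# Hypothetical word list; a real filter would be maintained and tuned over time.
BLACKLIST = {"viagra", "casino"}


def contains_blacklisted_word(text, blacklist=BLACKLIST):
    """Return True if any whole word in the submission is on the blacklist."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return any(word in blacklist for word in words)


class RequestRateMonitor:
    """Flags an IP address that sends too many requests in a short window."""

    def __init__(self, max_requests=20, window_seconds=10.0):
        self.max_requests = max_requests      # assumed threshold
        self.window_seconds = window_seconds  # assumed sliding window length
        self._hits = defaultdict(deque)       # ip -> timestamps of recent requests

    def looks_automated(self, ip, now):
        """Record one request at time `now` and report whether the rate is suspicious."""
        hits = self._hits[ip]
        hits.append(now)
        # Forget requests that have fallen out of the sliding window.
        while hits and now - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests


def should_reject(text, ip, now, monitor):
    """Combine both checks; the user is never asked to do anything."""
    too_fast = monitor.looks_automated(ip, now)  # always record the request
    return too_fast or contains_blacklisted_word(text)
```

The point of the sketch is that both checks run on the server using data the site already has, so a legitimate visitor sees no extra step at all. The thresholds would need tuning, since setting them too strictly risks locking out real people just as a CAPTCHA does.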

Conclusion

It really seems to me that there is no catch-all accessible alternative to CAPTCHA that can be secured from spammers. As we’ve seen, some sites make efforts to incorporate an audio CAPTCHA, but this isn’t sufficient even if a logic question were thrown into the mix (putting aside the fact that this places a lot of development work on the website owner to provide all three options). In addition, it seems that it is possible to break CAPTCHAs using either automated techniques or humans. Despite all that, however, it certainly seems that website owners are choosing security over accessibility.

Security is an essential part of any website, especially large sites holding valuable data, and it is not something that can be compromised. However, I personally don’t think the burden of protecting the website should be put on the user in the way that CAPTCHA does. It is up to the website owner to find an alternative way of protecting their web presence rather than doing it at the cost of the very people who are trying to use the site. As such, non-interactive measures such as spam filters and heuristics should be investigated. Surely, in light of the W3C’s estimation that computers can hack CAPTCHAs with 88% to 100% accuracy, there is more sense in using unobtrusive methods to combat spam?

There’s nothing more off-putting than deciding to sign up for something and then finding that, after all the friendly marketing speak, the doors are shut. After all, for every airline, ticketing or retail site that has CAPTCHA there’ll be another that doesn’t, which will be a far more attractive option.