PDFs get a rough press when it comes to accessibility, and understandably so, as most PDFs on the web today are not accessible. I thought I’d turn the spotlight on the much-maligned thorn in many a website owner’s side and look at some of the reasons why PDFs are inaccessible. What follows is a list of some of the reasons why PDFs suck that are not about the technology itself but about how we (the web designer, the content author, the content commissioner, the manager, the policy maker) use it, and what we can do to start changing PDFs on the web.
1. Content authors
There’s really no excuse not to create accessible PDF, as Adobe have built in tools to support the creation of accessible content and given us guidance on how to make the most of those tools (see the resources at the end of this post). There are other hindrances (as listed below), and arguably the overall responsibility may lie with the organisation you work for, but this doesn’t absolve us of responsibility for the work we produce. If you are part of the chain that produces PDF in your organisation then you have a responsibility to do what you can to make the PDF accessible. Some ways of doing this include:
- Follow Adobe’s accessible PDF guidelines (a link to which is in the resources below).
- Ensure third party content providers are made aware that they must create accessible content and write this into their contract.
- Ensure your organisation supports you in your efforts (read on to point 2).
2. Organisational support
First there is the issue of having the skills to make accessible PDFs, but often, before you even get that far, it’s all too common to find that the systems and processes within an organisation are simply not in place to support people creating accessible PDF. People may not know that they have to do it or how to do it, and they may lack the appropriate training and support, or have no time built into their workflow.
It’s no good saying to content authors "Now go forth and make accessible PDFs" and leaving it at that. Systems and processes need to be put in place to ensure that what is required can be achieved. This can be done by:
- Training content authors; investing in them is ultimately an investment in your site. Ensure they have access to up to date resources and support. Join forums, attend workshops and most importantly pool knowledge.
- Document what needs to be done to create an accessible PDF. Create checklists and guidelines in-house if necessary.
- Ensure people are clear about what is expected of them: write an Accessible PDF Policy (see point 3).
- Monitor content and prevent anything being uploaded to the site that has not been quality assured.
- Have creating accessible PDFs written into people’s objectives. It sounds harsh but it works: it gives people a motivation to do it.
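The monitoring step in the list above could be partly automated. What follows is only a rough sketch in Python, not an established tool: the function names `looks_tagged` and `may_publish` and the idea of a recorded sign-off name are assumptions made for illustration. It searches the raw file for the markers a tagged PDF normally declares (`/StructTreeRoot` and `/Marked true`). This is a crude heuristic: a PDF that stores its catalogue in a compressed object stream can hide these markers, and finding them does not prove the document is accessible, so a person still needs to check it.

```python
import re
from typing import Optional, Tuple


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Crude heuristic: a tagged PDF normally declares a structure tree
    and sets /Marked true in its MarkInfo dictionary."""
    has_struct_tree = b"/StructTreeRoot" in pdf_bytes
    is_marked = re.search(rb"/Marked\s+true", pdf_bytes) is not None
    return has_struct_tree and is_marked


def may_publish(pdf_bytes: bytes, signed_off_by: Optional[str]) -> Tuple[bool, str]:
    """Hypothetical upload gate: refuse files that are not PDFs, that show
    no tagging markers, or that nobody has quality assured."""
    if not pdf_bytes.startswith(b"%PDF-"):
        return False, "not a PDF file"
    if not looks_tagged(pdf_bytes):
        return False, "no tagging markers found; needs manual review"
    if not signed_off_by:
        return False, "no quality-assurance sign-off recorded"
    return True, "ok"
```

A gate like this only catches the obvious failures (untagged files, files nobody has checked); it supports the human quality-assurance step rather than replacing it.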
3. PDF policy
We talk a lot about how a website should have an Accessibility Policy to help guide people who maintain the website when making decisions on what technologies to use, maintenance, testing, compliance and so on. A key part of an Accessibility Policy is a section on using PDF (not to mention other areas that are also overlooked such as software accessibility, e-document design and procurement policies). If there is no clear steer to PDF content authors as to what is expected of them then there is less of a chance that they will fulfil the requirement.
So what sort of information should go in an Accessible PDF Policy?
- Outline of what version of Adobe Acrobat should be used. The latest version is recommended but note that all versions before version 7 lack real accessibility support.
- Outline a level of compliance or goal. This could be based on WCAG 2.0 which is designed to be technology agnostic and therefore relevant to non-W3C technologies.
- Outline what must be done if a PDF can’t be made accessible, for example provide a Word or HTML alternative and consider removing the PDF altogether.
- Outline what type of content is permissible in PDF - it’s not appropriate to put some types of content in PDF (see point 4).
- Outline a testing plan: who checks the PDF before it is published.
- Outline a requirements document for third party providers of PDF.
4. Content
People seem to use PDF as a fallback format for content that really shouldn’t be in PDF. There are types of content that arguably can be in PDFs and others that a PDF shouldn’t even come near.
Maps and forms are a particular bugbear of mine. Maps are totally inaccessible in PDF and impossible to make accessible unless you write a text description. That being the case, why not instead have an HTML page that includes, for example, the address of the place and basic directions on how to get there by car, train, tube, bus and so on? You can then always link to the map as a backup or visual aid.
The second example, forms, should really be avoided unless it is a form that is intended to be printed and signed. Forms are hard to mark up accessibly in PDF, and extremely difficult to navigate, understand and fill in if you are a screen reader user. Ideally forms should always be presented in HTML. If it is a form that needs signing then have it generated as a PDF after the user has filled it in on the web page and hit "save", i.e. an HTML form that converts to PDF once completed and can then be printed and signed.
Ways of counteracting unnecessary PDF content:
- Ask yourself when creating new PDFs "Should this content really be in HTML or Word instead?"
- Avoid putting forms, complex images and key content in PDF.
- Only put content in a PDF if it is already there in the website. For example, a downloadable PDF brochure makes sense in a brochureware website.
- Don’t scan text and put it into a PDF. Use OCR software to translate it to readable text instead.
5. Third party content
This is an issue that affects accessible web development as well as PDFs. If commissioning content from a third party you need to be clear about what you want. This includes not just content but also how the content is structured and the requirement that it meets the accessibility standards of your organisation. Just because content is not generated by you or your organisation doesn’t mean you are not responsible for it; if it is published on your site then as far as the user is concerned it is your content.
Ways to ensure accessible PDFs from third parties include:
- Have accessibility written into the contract. Don’t just cite WCAG and a level of compliance but list those checkpoints you want the PDFs to meet and stipulate that the accessibility must be verified by third party consultants if need be (also a great clause to have in a web design contract).
- Share your Accessible PDF policy with the third party. It’s there to guide after all so make them as aware as possible of what you expect.
6. Source documents
PDFs are often generated from other source documents. The document source can have an enormous effect on what the accessibility will be like for the finished product. As the old adage says: rubbish in, rubbish out. Make sure that you use a format that can support accessibility and can have structure added.
- Create tagged PDF from a structured Word document (structured using Word Styles and heading levels).
- Ensure tables in the source Word document are created using the proper table grid formatting option and that no merged cells are used.
- Ensure images have concise Alt text assigned to them.
- Ensure all text is typed in directly from the keyboard and no text is contained within Text Boxes or images.
Things to not do are:
- Create PDF from Word documents using software that cannot produce tagged PDF (at time of writing this is most software other than Adobe’s).
- Create PDFs using QuarkXPress or other page layout software without then taking the PDF into Acrobat and tagging it manually.
- Create PDF from scanned images without taking the file into Acrobat, running the OCR process and then tagging it manually.
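Some of the source-document checks listed above can be roughed out in code for Word files saved in the newer .docx format, which is a zip archive containing XML. The Python sketch below is illustrative only: `audit_docx` is a hypothetical helper, it assumes the English built-in style names ("Heading1", "Heading2" and so on; localised copies of Word use different names), and it treats every drawing object as an image. It is a prompt for manual review, not a replacement for it.

```python
import re
import zipfile


def audit_docx(path_or_file):
    """Return a list of warnings about a .docx source document.

    Looks only at word/document.xml and only at the patterns named in
    the checklist: heading styles, text boxes, merged cells, alt text.
    """
    with zipfile.ZipFile(path_or_file) as docx:
        xml = docx.read("word/document.xml").decode("utf-8")

    warnings = []
    # Structure: at least one paragraph should use a built-in heading style.
    if not re.search(r'<w:pStyle w:val="Heading\d"', xml):
        warnings.append("no built-in heading styles used")
    # Text typed into text boxes is stored inside w:txbxContent elements.
    if "<w:txbxContent" in xml:
        warnings.append("text box found")
    # Merged table cells appear as gridSpan (horizontal) or vMerge (vertical).
    if "<w:gridSpan" in xml or "<w:vMerge" in xml:
        warnings.append("merged table cells found")
    # Alt text for a drawing is held in the descr attribute of wp:docPr.
    for tag in re.findall(r"<wp:docPr\b[^>]*>", xml):
        if not re.search(r'\bdescr="[^"]+"', tag):
            warnings.append("image without alt text")
    return warnings
```

Calling `audit_docx("report.docx")` (the file name is just an example) returns an empty list when none of these patterns are found, which means only that the obvious problems are absent.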
7. Legacy PDF
All of the above is well and good, but what about all the PDFs out there today that don’t even give a nod to accessibility? There are huge numbers of PDFs on the web; some websites make heavy use of PDF to the extent where they represent a core format for the site (think about government websites for example).
It would be unreasonable, and impossible, to go back and retrofit every PDF on the web. So how do we deal with legacy PDF?
- Look at fixing key PDFs only by reviewing your web stats and seeing what the most popular PDFs are.
- Look at fixing PDFs that have been published in the last year or two.
- Look at fixing PDFs that you know are important for people and key to your site, for example annual reports.
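The three criteria above can be combined into a simple fix queue. The sketch below, in Python, is one possible way of doing it and nothing more: the `LegacyPdf` record, the `fix_queue` function, the cut-off of the ten most downloaded files and the two-year window are all assumptions chosen for illustration, and the download figures would have to come from your own web stats.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class LegacyPdf:
    url: str
    downloads: int      # taken from your web stats
    published: date
    key_document: bool  # e.g. an annual report


def fix_queue(pdfs, today, top_n=10, recent_days=730):
    """Return (pdf, reasons) pairs for PDFs worth fixing, most urgent first."""
    by_downloads = sorted(pdfs, key=lambda p: p.downloads, reverse=True)
    popular = {p.url for p in by_downloads[:top_n]}

    def reasons(pdf):
        found = []
        if pdf.key_document:
            found.append("key document")
        if pdf.url in popular:
            found.append("popular")
        if (today - pdf.published).days <= recent_days:
            found.append("recent")
        return found

    # Keep only PDFs that meet at least one criterion.
    queue = [(pdf, reasons(pdf)) for pdf in pdfs]
    queue = [(pdf, why) for pdf, why in queue if why]
    # PDFs meeting more criteria come first; ties go to the most downloaded.
    queue.sort(key=lambda item: (-len(item[1]), -item[0].downloads))
    return queue
```

Anything that meets none of the criteria drops out of the queue altogether, which matches the point that retrofitting every legacy PDF is not realistic.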
Conclusion
So this is really a call to action. It’s time to stop shaking the proverbial fist at PDF the technology and start taking matters into our own hands. Adobe have worked hard to get accessibility support built into Acrobat (and very soon there may be more resources to help us publish accessible PDF - watch this space), so it’s down to us to start implementing the processes to support the creation of accessible PDF.
Resources
- Defining PDF accessibility: recently updated resources from WebAIM, and very good they are too.
- Accessing PDFs using Jaws, a screen reader’s guide: a practical guide to navigating PDF if you are a screen reader user, from RNIB’s Hugh Huddy. It also outlines how to recognise when problems are to do with PDFs and not the screen reader.
- Creating accessible PDF using Adobe Acrobat 7.0 (10.2MB, PDF) : Adobe’s ‘how to’ guide on creating accessible PDF.
- Making and generating accessible PDF: from Bruce Lawson.
- Use of PDF in accessible documents: An outline from the WCAG Samurai Errata.
- Preparing Microsoft Word documents to create accessible PDFs (PDF, 1.23 MB): tips on creating accessible PDF using Word from Adobe.
- Accessing PDF Documents with Assistive Technology: A Screen Reader User’s Guide: a screen reader user guide from Adobe.
Other things that suck
- Standards suck!: Standards Suck hosts podcasts made by Anne van Kesteren, Marcos Caceres, and Lachlan Hunt about Web standards.
- Captioning sucks!: Looks at just how lousy captioning really is and alerts people to the fact that there is a solution on the horizon – the Open & Closed Project.
Andy Mabbett | 30/06/2008 at 12:51
This is old, but still pertinent:
Jakob Nielsen’s Alertbox, July 14, 2003:
PDF: Unfit for Human Consumption
http://www.useit.com/alertbox/20030714.html
simon gray | 30/06/2008 at 15:09
It’s all very well suggesting people read the Adobe guidelines on creating accessible .pdfs - but when that document itself weighs in at 115 pages, who is expected to bother? I’m quite sure there isn’t really a whole 115 pages’ worth of necessary information to digest in order to do the job properly…
Joe Clark | 30/06/2008 at 18:58
I really don’t think it’s accurate to state, rather baldly, that “most PDFs… are inaccessible.” You have to separate the PDF from the viewing application. Recent versions of Acrobat are more or less capable of making untagged text-only PDFs accessible as soon as you open and start reading them. This example should not be minimized, as it is a favourite tactic of e.g. press-release writers to issue a few hundred words of barely-marked-up text in PDF.
In other words, the software can overcome the inaccessibility of many documents. It isn’t going to help you with images, which require intelligently-written alt texts, but even in some cases a document can be functionally accessible without alt texts (e.g., a press release with a company logo at the top that also mentions the company name in the press release).
I think you should expand your list of links a bit. Among other things, on the WCAG Samurai site there is now a very solid list of PDF formats that should never be posted on the Web as PDF. You need to be a bit less wishy-washy on this point. It isn’t true that people should ask themselves if a document should be PDF; for online distribution, many documents should never be PDF (or, more accurately, should never be available solely as PDF).
Also: How come nobody from RNIB is ever on the PDF/Universal Accessibility phone calls? We’re almost done writing V1.0 of our spec, you know. And it’s pretty good, if I do say so myself.
Hugh Huddy | 01/07/2008 at 10:12
There are two reasons why RNIB chose not to attend the PDF UA discussions:
1. As a not-for-profit we focus resources on areas where they’re really needed, and the PDF UA working group is already a strong mix of experts.
2. Related to the point above about the 115-page-long Adobe guidelines, we have fed back to the relevant people on several occasions that these guidelines are swollen by workarounds to compensate for what information producers tell us are inadequacies in the PDF authoring tools, so we have invested the time that PDF UA has been running in understanding these issues at ground level.
I believe the ‘PDFs suck’ situation, whether you fully or partly agree with this statement, cannot be recovered as long as the actual people out there in the field are faced with extensive guidelines and confusion over using authoring tools, no matter how well PDF UA specifies and defines accessibility.
I do think we as accessibility experts have a primary duty to understand what it’s like at ground level because that’s where accessibility is failing.
I am sure Henny’s posting here will help content authors understand their responsibilities and increase demand for technologies that actually work!
Henny | 03/07/2008 at 16:30
Thanks for your comments folks, all useful stuff. I’ve added in a couple of additional links to the resources section that provide further guidance on creating accessible PDF.
Joe Clark | 05/07/2008 at 15:14
Simple documents are easy to make accessible in PDF *up front*. So are several kinds of complex documents if you use modern software like InDesign or FrameMaker (yes, from Adobe) that can produce tagged PDFs automatically. I have had no trouble creating semantic multicolumn PDFs – with images, no less – that fly through the Acrobat accessibility checker without problems.
Now, if you’re creating a document with complex tables, that’s another story, but PDF opponents need to be honest about the fact that complex tables are always difficult to make accessible. Footnotes and so on? Well, at least there are actual defined tags for them in PDF, unlike in HTML. Mathematics? Tricky right now, but MathML will be required in PDF/UA.
What isn’t being honestly discussed here is the fact that most of the “problems” people face with PDF come about during retrofits. Somebody hands you a crap PDF and you’re expected to fix it. Well, of course that’s going to be difficult. It isn’t like HTML, where you can engage an orbital-nuke scenario and just remove all the markup and start over. PDFs are databases of nearly anything you want, and trying to untangle a database of objects isn’t going to be easy.
If people want a larger number of accessible PDFs, then they’re going to have to learn to use stylesheets in MS Word and, finally, for the love of God, get with the program and quit using QuarkXPress. Do those two things and you solve a large part of the new-PDF problem. The retrofit-PDF problem is rather more difficult to solve. Maybe RNIB should work harder on the relatively easy win of making new PDFs accessible.
Henny | 09/07/2008 at 17:29
I agree with you, Joe, that the quality of what goes in, and the tool used, has a huge impact on the quality of the accessible PDF that is produced. There will always be times when retrofitting a PDF isn’t the way to go, especially with complex ones. As with HTML, an informed decision needs to be taken to establish whether a PDF redesign is in fact needed. When it comes to creating accessible PDF, using MS Word style sheets is definitely a technique that we advise, whilst avoiding coming anywhere near Quark, which is really not our friend here. As I’m sure you yourself know, this in itself is a challenge when the publishing industry has invested heavily in software like Quark and designers are only familiar with it. This is why I believe that making PDFs more accessible is as much a culture and process change as it is an understanding of what must be done to create an accessible PDF.
We already do a lot of work both with Adobe and with various groups here in the UK on promoting the creation of accessible PDFs. Hugh Huddy especially has represented many groups with an interest in accessible PDF to Adobe. He’s worked to engage with disabled people, content authors and the tool makers to gain a holistic understanding of the challenge, and this work continues.
Andrew Kirkpatrick | 11/07/2008 at 13:52
Great post Henny - I’d like to chime in with a couple of things:
1) We have a 2-page reference card for creating accessible PDF from Microsoft Word. This doesn’t cover all topics on accessibility for a PDF document, but highlights 5-6 very important things that anyone can easily do in a Word document that will have immediate impact on the accessibility of the PDF file produced. If you follow the advice on this sheet most documents that most people produce in Word will be highly accessible. The reference card is at http://blogs.adobe.com/accessibility/assets/WordToPDFReferenceCard_v1.pdf
2) We also produced a screen reader user’s guide for accessing PDF files. It focuses on what users need to do when encountering different types of PDF files (untagged, scanned, a form, and well-tagged). This is available at http://www.adobe.com/accessibility/products/reader/ where you can also download Adobe Reader 9.
AWK
Andrew Kirkpatrick | 11/07/2008 at 13:57
I’ll also add that our screen reader user’s guide is currently available as an accessible PDF file, but it is also being produced in HTML (doesn’t seem too fair to teach people how to use PDF files in a PDF file…).
Henny | 11/07/2008 at 14:27
That’s great, thanks Andrew. I’ve also updated the resources list in the post as well.
Texx Smith | 28/07/2008 at 20:13
One thing that is hardly ever mentioned in “PDFs suck” discussions is that PDFs crash lots of older computers and bog down even newer, modern computers.
There is almost NEVER a need for a pdf!