Why PDFs suck!

PDFs get a rough press when it comes to accessibility, and understandably so, as most PDFs on the web today are not accessible. I thought I’d turn the spotlight on the much-maligned thorn in many a website owner’s side and look at some of the reasons why PDFs are inaccessible. What follows is a list of some of the reasons why PDFs suck that are not about the technology itself but about how we (the web designer, the content author, the content commissioner, the manager, the policy maker) use it, and what we can do to start changing PDFs on the web.

1. Content authors

There’s really no excuse not to create accessible PDFs, as Adobe have built in tools to support the creation of accessible content and given us guidance on how to make the most of those tools (see the resources at the end of this post). There are other hindrances (as listed below), and arguably the overall responsibility may lie with the organisation you work for, but this doesn’t absolve us of responsibility for the work we produce. If you are part of the chain that produces PDFs in your organisation then you have a responsibility to do what you can to make those PDFs accessible. Some ways of doing this include:

  • Follow Adobe’s accessible PDF guidelines (a link to which is in the resources below).
  • Ensure third party content providers are made aware that they must create accessible content and write this into their contract.
  • Ensure your organisation supports you in your efforts (read on to point 2).

2. Organisational support

First there is the issue of having the skills to make accessible PDFs but often, before you even get that far, it’s all too common to find that the systems and processes within an organisation are simply not in place to support people creating accessible PDFs. People may not know a) that they have to do it or b) how to do it, and they may not have c) the appropriate training and support or d) time built into their workflow.

It’s no good saying to content authors "Now go forth and make accessible PDFs" and leaving it at that. Systems and processes need to be put in place to ensure that what is required can be achieved. This can be done by:

  • Training content authors; investing in them is ultimately an investment in your site. Ensure they have access to up-to-date resources and support. Join forums, attend workshops and, most importantly, pool knowledge.
  • Document what needs to be done to create an accessible PDF. Create checklists and guidelines in-house if necessary.
  • Ensure people are clear about what is expected of them: write an Accessible PDF Policy (see number 3).
  • Monitor content and prevent anything being uploaded to the site that has not been quality assured.
  • Have creating accessible PDFs written into objectives. It sounds harsh but it works: it gives people a motivation to do it.

3. PDF policy

We talk a lot about how a website should have an Accessibility Policy to help guide people who maintain the website when making decisions on what technologies to use, maintenance, testing, compliance and so on. A key part of an Accessibility Policy is a section on using PDF (not to mention other areas that are also overlooked such as software accessibility, e-document design and procurement policies). If there is no clear steer to PDF content authors as to what is expected of them then there is less of a chance that they will fulfil the requirement.

So what sort of information should go in an Accessible PDF Policy?

  • Outline what version of Adobe Acrobat should be used. The latest version is recommended, but note that all versions before version 7 lack real accessibility support.
  • Outline a level of compliance or goal. This could be based on WCAG 2.0, which is designed to be technology agnostic and is therefore relevant to non-W3C technologies.
  • Outline what must be done if a PDF can’t be made accessible, i.e. provide a Word or HTML alternative and consider removing the PDF altogether.
  • Outline what type of content is permissible in PDF - it’s not appropriate to put some types of content in PDF (see point 4).
  • Outline a testing plan: who checks the PDF before it is published.
  • Outline a requirements document for third party providers of PDF.

4. Content

People seem to use PDF as a fallback format for content that really shouldn’t be in PDF. There are types of content that arguably can be in PDFs and others that a PDF shouldn’t even come near.

Maps and forms are a particular bugbear of mine. Maps are totally inaccessible in PDF and impossible to make accessible unless you write a text description. That being the case, why not instead have an HTML page that includes, for example, the address of the place and basic directions on how to get there by car, train, tube, bus and so on? You can then always link to the map as a backup/visual aid.

The second example, forms, should really be avoided unless the form is intended to be printed and signed. Forms are hard to mark up accessibly in PDF and extremely difficult to navigate, understand and fill in if you are a screen reader user. Ideally forms should always be presented in HTML. If a form needs signing then have it generated as a PDF after the user has filled it in on the web page and hit "save", i.e. an HTML form that converts to PDF once completed and can then be printed and signed.

Ways of counteracting unnecessary PDF content:

  • Ask yourself when creating new PDFs "Should this content really be in HTML or Word instead?"
  • Avoid putting forms, complex images and key content in PDF.
  • Only put content in a PDF if it is already there in the website. For example, a downloadable PDF brochure makes sense in a brochureware website.
  • Don’t scan text and put it into a PDF. Use OCR software to convert it to readable text instead.

5. Third party content

This is an issue that affects accessible web development as well as PDFs. If commissioning content from a third party you need to be clear about what you want. This includes not just content but also how the content is structured and the requirement that it meets the accessibility standards of your organisation. Just because it is not content generated by you or your organisation it doesn’t mean you are not responsible for it; if it is published on your site then as far as the user is concerned it is your content.

Ways to ensure accessible PDFs from third parties include:

  • Have accessibility written into the contract. Don’t just cite WCAG and a level of compliance but list those checkpoints you want the PDFs to meet, and stipulate that the accessibility must be verified by third party consultants if need be (also a great clause to have in a web design contract).
  • Share your Accessible PDF policy with the third party. It’s there to guide after all so make them as aware as possible of what you expect.

6. Source documents

PDFs are often generated from other source documents. The source document can have an enormous effect on the accessibility of the finished product. As the old adage says: rubbish in, rubbish out. Make sure that you use a format that can support accessibility and can have structure added.

  • Create tagged PDF from a structured Word document (structured using Word Styles and heading levels).
  • Ensure tables in the source Word document are created using the proper table grid formatting option and that no merged cells are used.
  • Ensure images have concise Alt text assigned to them.
  • Ensure all text is typed in directly from the keyboard and no text is contained within Text Boxes or images.

Things not to do:

  • Create PDF from Word documents using software that cannot produce tagged PDF (at time of writing this is most software other than Adobe’s).
  • Create PDFs using QuarkXPress or other page layout software without then taking the PDF into Acrobat and tagging it manually.
  • Create PDF from scanned images without taking the file into Acrobat, running the OCR process and then tagging it manually.
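
One way to catch untagged files before they are published is a rough automated check. The Python sketch below is only a heuristic, not a real accessibility test: it searches the raw file bytes for /StructTreeRoot and /MarkInfo, the two entries a tagged PDF normally carries in its document catalog. The function names looks_tagged and untagged_pdfs are my own inventions for illustration, not part of any Adobe tool. A PDF that stores its catalog in a compressed object stream can be tagged and still fail this check, and passing it only tells you that tags exist, not that they are any good.

```python
from pathlib import Path

# Entries a tagged PDF normally carries in its document catalog.
# Assumption: they appear uncompressed in the file, which is not
# guaranteed for PDFs that use compressed object streams.
TAG_MARKERS = (b"/StructTreeRoot", b"/MarkInfo")


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Return True if every tag marker appears in the raw file bytes."""
    return all(marker in pdf_bytes for marker in TAG_MARKERS)


def untagged_pdfs(folder: str) -> list[str]:
    """List the .pdf files under `folder` that show no sign of tagging."""
    return sorted(
        str(path)
        for path in Path(folder).rglob("*.pdf")
        if not looks_tagged(path.read_bytes())
    )
```

Anything this flags still needs opening in Acrobat to find out why, and anything it passes still needs a proper manual check of the tags.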

7. Legacy PDF

The above is all well and good, but what about all the PDFs out there today that don’t even give a nod to accessibility? There are literally billions of PDFs on the web; some websites make such heavy use of PDF that it represents a core format for the site (think about government websites, for example).

It would be unreasonable, and impossible, to go back and retrofit every PDF on the web. So how do we deal with legacy PDF?

  • Look at fixing key PDFs only by reviewing your web stats and seeing what the most popular PDFs are.
  • Look at fixing PDFs that have been published in the last year or two.
  • Look at fixing PDFs that you know are important for people and key to your site, for example annual reports.
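
The three rules above can be combined into a single shortlist. What follows is a minimal Python sketch under my own assumptions: the LegacyPdf record, the triage function, the top_n cut-off and the two-year max_age_days window are all hypothetical, and the download figures would have to come from your own web stats.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class LegacyPdf:
    url: str
    downloads: int         # taken from your web stats
    published: date
    is_key: bool = False   # e.g. an annual report


def triage(pdfs, today, top_n=10, max_age_days=730):
    """Return the PDFs worth fixing first, most downloaded first.

    A PDF is shortlisted if it is among the top_n most downloaded,
    was published within max_age_days, or is flagged as key content.
    """
    by_downloads = sorted(pdfs, key=lambda p: p.downloads, reverse=True)
    popular = {p.url for p in by_downloads[:top_n]}
    shortlist = [
        p for p in pdfs
        if p.url in popular
        or p.is_key
        or (today - p.published).days <= max_age_days
    ]
    return sorted(shortlist, key=lambda p: p.downloads, reverse=True)
```

Everything that falls outside the shortlist is a candidate for an HTML or Word alternative on request rather than a retrofit.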

Conclusion

So this is really a call to action. It’s time to stop shaking the proverbial fist at PDF the technology and start taking matters into our own hands. Adobe have worked hard to get accessibility support built into Acrobat (and very soon there may be more resources to help us publish accessible PDFs - watch this space), so it’s down to us to start implementing the processes to support the creation of accessible PDFs.

Resources

Other things that suck

  • Standards suck!: Standards Suck hosts podcasts made by Anne van Kesteren, Marcos Caceres, and Lachlan Hunt about Web standards.
  • Captioning sucks!: Talks about just how lousy captioning really is and alerts people to the fact that there is a solution on the horizon – the Open & Closed Project.