Portable Document Format (PDF) is a page description language.
PDF History
It was released in the early 1990s as a proprietary technology.
It succeeded because it solved the print fidelity problem by preserving the exact visual layout (fonts, vector graphics, raster images, and page geometry) independent of device, operating system, or application.
It was released as an open ISO standard in 2008.
PDF Characteristics
Files written in early versions of PDF were largely human-readable.
Modern PDF files are only partly human-readable: most of their content is binary, because their streams are compressed.
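To illustrate what a human-readable PDF looks like, here is a minimal sketch that assembles a one-page, uncompressed PDF by hand. The function name `build_minimal_pdf`, the page size, and the font choice are assumptions made for this example. It only covers the bare file structure (header, objects, cross-reference table, trailer).

```python
def build_minimal_pdf(text: str = "Hello, PDF") -> bytes:
    """Assemble a one-page, uncompressed PDF by hand (illustrative only)."""
    # Escape the characters that are special inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # Content stream: draw the text in 24 pt Helvetica near the top-left corner.
    content = f"BT /F1 24 Tf 72 720 Td ({escaped}) Tj ET".encode("latin-1")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, needed by xref
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: one 20-byte entry per object, plus the free entry 0.
    xref_position = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset

    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_position
    return bytes(out)
```

Writing the result to disk with `open("hello.pdf", "wb").write(build_minimal_pdf())` should give a file that can be opened both in a PDF reader and in a text editor, since no stream is compressed.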
Advantages of PDF (or page description languages):
- Keeps layout/design/printability among different platforms
- Hinders editability (if required)
Disadvantages of PDF formats (or page description languages):
- Reduces processability (compared to a markup language)
- Reduces accessibility
- Reduces reusability
- Doesn’t ensure integrity natively
Linearized PDFs are optimized for web viewing.
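A linearized file announces itself through a linearization parameter dictionary, carrying a `/Linearized` key, placed near the start of the file. The sketch below is a rough heuristic based on that: the function name `is_linearized` is an assumption for this example, and a file that was later modified by an incremental update may still carry the key without being effectively linearized.

```python
def is_linearized(data: bytes) -> bool:
    """Heuristic check: look for the /Linearized key near the start of the file."""
    # The linearization parameter dictionary is expected within the first 1024 bytes.
    return b"/Linearized" in data[:1024]
```

A validator or a tool such as qpdf is the safer way to confirm that the linearization data is actually consistent.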
A Tagged PDF contains a logical structure tree that stores the document's semantic information (headings, paragraphs, lists, tables) in tags. The root of the structure tree in a tagged PDF is StructTreeRoot. It is defined in ISO 32000. It is used by both PDF/A and PDF/UA.
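As a rough illustration, the following sketch scans the raw bytes of a file for a `/StructTreeRoot` reference and for the `/Marked true` entry of the `/MarkInfo` dictionary. The function name `looks_tagged` is an assumption for this example, and the check is only a heuristic: it misses files whose document catalog is stored inside a compressed object stream, and it says nothing about the quality of the tags.

```python
import re

# An indirect reference such as "/StructTreeRoot 12 0 R" in the document catalog.
_STRUCT_ROOT = re.compile(rb"/StructTreeRoot\s+\d+\s+\d+\s+R")
# The /MarkInfo dictionary of a Tagged PDF sets /Marked to true.
_MARKED = re.compile(rb"/Marked\s+true")


def looks_tagged(data: bytes) -> bool:
    """Return True when the raw bytes show both Tagged PDF markers."""
    return bool(_STRUCT_ROOT.search(data)) and bool(_MARKED.search(data))
```

For a reliable answer, a checker such as veraPDF or PAC should be used instead.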
PDF Compression
PDF compression techniques:
- FlateDecode (zlib/deflate) – Common for text and streams.
- LZW (Lempel-Ziv-Welch) – Older, less common now.
- DCTDecode (JPEG) – Used for image data.
- CCITTFaxDecode – Used for monochrome images (e.g., scanned documents).
- JBIG2Decode – Used for bilevel image compression.
Most text, fonts, and metadata are compressed using FlateDecode, which is reversible.
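The reversibility of FlateDecode can be seen with Python's standard `zlib` module, which implements the same zlib/deflate format. The sample content stream and the helper names below are made up for this example.

```python
import zlib


def flate_encode(raw: bytes) -> bytes:
    """Compress stream data the way a /FlateDecode filter expects it."""
    return zlib.compress(raw)


def flate_decode(encoded: bytes) -> bytes:
    """Recover the original stream data; the round trip is lossless."""
    return zlib.decompress(encoded)


# A repetitive, text-like content stream (hypothetical sample data).
sample = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET\n" * 50
encoded = flate_encode(sample)
restored = flate_decode(encoded)
print(len(sample), "bytes ->", len(encoded), "bytes; lossless:", restored == sample)
```

This contrasts with DCTDecode (JPEG), where decoding does not return the original pixel data exactly.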
Types of PDF
PDF types featured on this post:
- PDF/A
- PDF/UA
- PDF/E
A PDF can conform to several PDF types at once if it meets all their requirements. For example, a PDF file can be both PDF/A and PDF/UA.
PDF/A
PDF/A is focused on the long-term preservation of files.
PDF/A is defined through the family of standards ISO 19005.
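Its accessible conformance level (a) requires Tagged PDF.
It consists of different generations, each based on a PDF version:
- A-1 – ISO 19005-1 – based on PDF 1.4
- A-2 – ISO 19005-2 – based on PDF 1.7
- A-3 – ISO 19005-3 – based on PDF 1.7
- A-4 – ISO 19005-4 – based on PDF 2.0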
Each version or generation may have different conformance levels:
- a = Accessible / fully tagged
- b = Basic visual fidelity only
- u = Unicode support (text reproducible)
PDF/A-4 simplified the conformance levels by dropping the a, b, and u levels.
PDF/A-4 conformance levels:
- PDF/A-4 (base): simplified baseline
- PDF/A-4f: allows file attachments
- PDF/A-4e: for engineering (CAD) use
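A PDF/A file declares its generation and conformance level in its XMP metadata, through the `pdfaid:part` and `pdfaid:conformance` properties. The sketch below extracts that claim from an XMP packet with regular expressions. The function name `pdfa_claim` is an assumption for this example, and the result is only what the file claims about itself, not proof that it conforms.

```python
import re
from typing import Optional, Tuple

# Each property may be written as an XML element or as an attribute.
_PART = re.compile(r"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d+)")
_CONFORMANCE = re.compile(r"pdfaid:conformance\s*(?:=\s*[\"']|>)\s*([A-Za-z])")


def pdfa_claim(xmp: str) -> Optional[Tuple[str, str]]:
    """Return (part, conformance) as declared in the XMP packet, or None."""
    part = _PART.search(xmp)
    if part is None:
        return None  # no PDF/A identification present
    conformance = _CONFORMANCE.search(xmp)
    # The conformance property can be absent, for example in base PDF/A-4.
    level = conformance.group(1).lower() if conformance else ""
    return part.group(1), level


print(pdfa_claim("<pdfaid:part>2</pdfaid:part><pdfaid:conformance>B</pdfaid:conformance>"))
```

Checking the claim against the actual file content is the job of a validator such as veraPDF.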
PDF/UA
PDF Universal Access (PDF/UA) is the global standard for PDF accessibility.
It is defined in ISO 14289.
It uses Tagged PDF.
Version:
- PDF/UA-2
The PDF/UA World (probably the same as the former PDF/UA Foundation) is an organization to promote the use of PDF/UA. It provides a guide to use PDF/UA.
PDF/E
PDF/E is used in engineering workflows.
PDF Tagging
PDF allows tagging, as described in the PDF 2.0 specification standardized in ISO 32000-2.
PDF tagging is a prerequisite for accessibility.
Some PDF/A conformance levels (PDF/A-1a, PDF/A-2a, and PDF/A-3a), as well as PDF/UA, mandate tagging through a structure tree. PDF/A-4 no longer defines an accessible (a) conformance level, so tagging there is only recommended. PDF/UA remains the only family that mandates it at every level.
PDF Accessibility
One of the main drawbacks of PDF is that it reduces accessibility and reusability.
A possible solution is to make the PDF accessible by using the PDF/UA format, which requires PDF tagging.
Another possible solution is to author in a markup language (such as HTML) that can be exported to PDF, or to offer a processable format (a markup language, e.g. HTML) and a printable format (a page description language, e.g. PDF) independently. In either case, check that the converter preserves the original tagging structure in the generated PDF.
Once the PDF is created, its accessibility must be checked.
PDF Security
You can read this post about PDF security.
PDF Manipulation
How to generate a PDF file from a script
How to print multiple ranges of PDF files in macOS X
PDF Tools
PDF Reading Tools
Sumatra PDF
Sumatra PDF is a desktop PDF reader for Windows.
ReadEra
ReadEra is a PDF reader for mobile platforms.
Moon+ Reader
Moon+ Reader is a PDF reader for mobile platforms.
PDF Manipulation Tools
PDF manipulation tools:
- Stirling PDF
- PDFsam
- PDF Toolkit
Stirling PDF
Stirling PDF
PDFsam
PDFsam (from PDF split and merge) is a family of tools.
PDFsam Basic is FOSS.
PDFsam Enhanced is proprietary.
PDF Toolkit
PDF Toolkit (pdftk) is a CLI application to manipulate PDF data.
It can dump raw PDF data.
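As a sketch of how pdftk could be driven from a script, the helpers below build the argument lists for two pdftk operations: `dump_data`, which writes a metadata report, and `uncompress`, which rewrites the file with its page streams uncompressed. The helper names and file paths are assumptions for this example, and pdftk must be installed separately.

```python
import shutil
import subprocess
from typing import List


def pdftk_dump_data_cmd(pdf_path: str, report_path: str) -> List[str]:
    """Argument list for: pdftk <pdf> dump_data output <report>."""
    return ["pdftk", pdf_path, "dump_data", "output", report_path]


def pdftk_uncompress_cmd(pdf_path: str, out_path: str) -> List[str]:
    """Argument list for: pdftk <pdf> output <out> uncompress."""
    return ["pdftk", pdf_path, "output", out_path, "uncompress"]


def run_if_available(cmd: List[str]) -> bool:
    """Run the command only when its executable is on PATH; report whether it ran."""
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return True
```

For example, `run_if_available(pdftk_dump_data_cmd("input.pdf", "report.txt"))` would write the report when pdftk is present and return False otherwise.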
PDF Compression Tools
PDF compression tools:
- qpdf
- MuPDF
qpdf is a library and CLI tool to decompress, modify, and recompress PDFs.
MuPDF includes the mutool CLI command, which can uncompress, modify, and recompress PDFs.
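The sketch below builds the command lines that ask either tool to write an uncompressed copy of a file, which is handy for inspecting a PDF in a text editor. The function name `uncompress_cmd` and the file paths are assumptions for this example. It relies on qpdf's `--qdf` mode and on the `-d` option of `mutool clean`.

```python
from typing import List


def uncompress_cmd(tool: str, src: str, dst: str) -> List[str]:
    """Return the argument list that writes an uncompressed copy of src to dst."""
    if tool == "qpdf":
        # QDF mode normalizes the file and uncompresses its streams;
        # disabling object streams keeps every object visible as plain text.
        return ["qpdf", "--qdf", "--object-streams=disable", src, dst]
    if tool == "mutool":
        # "mutool clean -d" rewrites the file with all streams decompressed.
        return ["mutool", "clean", "-d", src, dst]
    raise ValueError(f"unsupported tool: {tool}")


print(" ".join(uncompress_cmd("qpdf", "input.pdf", "readable.pdf")))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the chosen tool is installed.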
PDF OCR Tools
OCRmyPDF
OCRmyPDF is an application that adds a searchable OCR text layer to scanned PDFs. It leverages the Tesseract OCR engine.
OCRmyPDF official documentation
Accessibility Checker Tools
PDF accessibility checkers:
- veraPDF
- PDF Accessibility Checker (PAC)
- Adobe Acrobat Pro’s accessibility checker
- AXE PDF
veraPDF is a FOSS validation tool for PDF/A that also supports PDF/UA checks.
It belongs to the Open Preservation Foundation.
PDF Accessibility Checker (PAC) is a PDF accessibility checker. It is closed-source freeware, available as a desktop application for Windows.
It is developed by the Swiss foundation Access for All (Zugang für alle), through the company axes4 GmbH.
The project is funded mainly by Germanic public organizations.
PAC versions:
- 2021
- 2024
- 2026
Adobe Acrobat Pro’s accessibility checker is a paid proprietary solution.
AXE PDF is a paid proprietary PDF accessibility checker.
PDF Security Tools
pdfid is a PDF scanner tool and analyzer. It is used in security. It is written in Python.
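The general idea behind this kind of triage is to count PDF names that are often associated with active content. The sketch below is a simplified illustration of that idea and is not pdfid's actual code. The function name `count_names` and the list of names are assumptions for this example. It does not handle names obfuscated with hex escapes or hidden inside compressed streams.

```python
import re
from collections import Counter

# PDF names frequently reviewed during security triage (illustrative selection).
SUSPICIOUS_NAMES = (
    b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch", b"/EmbeddedFile",
)


def count_names(data: bytes) -> Counter:
    """Count whole-name occurrences of each suspicious PDF name in raw bytes."""
    counts = Counter()
    for name in SUSPICIOUS_NAMES:
        # A PDF name ends at whitespace or a delimiter, so a short name such as
        # /AA is not counted when it is only the prefix of a longer name.
        pattern = re.escape(name) + rb"(?=[\s/<>\[\]()%{}]|$)"
        counts[name.decode()] = len(re.findall(pattern, data))
    return counts
```

A non-zero count is only a hint that the file deserves a closer look, not evidence that it is malicious.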
PDF Generation Tools
This section is about tools to generate a PDF from a markup language.
- PDF-LIB
- pdfme
- clawPDF
PDF-LIB is a library written in JavaScript.
pdfme is an application written in TypeScript.
clawPDF can create PDF documents, including the PDF/A-1b, PDF/A-2b, and PDF/A-3b formats.
PDF Markup Recovery Tools
This section is about tools to reconstruct the source descriptive markup language document from a PDF.
You can find a post with a list of descriptive markup languages.
PDF Conversion Tools
This section is about tools to convert a PDF to a different output.
- Calibre
- pdf2epubEX
- overcuriosity’s pdf2epub