Portable Document Format (PDF) is a page description language.
PDF Characteristics
In early versions, PDF files were largely human-readable.
Modern PDF files are only partly human-readable and mostly binary, because their streams are usually compressed.
Advantages of PDF (or page description languages):
- Keeps layout/design/printability across different platforms
- Hinders editing (when that is desired)
Disadvantages of PDF formats (or page description languages):
- Reduces processability (compared to a markup language)
- Reduces accessibility
- Reduces reusability
- Doesn’t ensure integrity natively
Linearized PDFs are optimized for web viewing.
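A linearized file announces itself with a linearization parameter dictionary near the start of the file. The following is a minimal sketch of a heuristic check based on that, assuming the whole file is already in memory as bytes. The function name looks_linearized is hypothetical, and a real validator would also verify the hint tables and the file length.

```python
import re


def looks_linearized(data: bytes) -> bool:
    """Heuristic check: is a /Linearized entry present near the file start?

    Only the first 1024 bytes are inspected, which is where the
    linearization parameter dictionary is expected to live.
    """
    head = data[:1024]
    has_header = b"%PDF-" in head
    # The key is followed by whitespace and a version number, e.g. "/Linearized 1".
    has_key = re.search(rb"/Linearized\s+\d", head) is not None
    return has_header and has_key


if __name__ == "__main__":
    sample = b"%PDF-1.7\n1 0 obj\n<< /Linearized 1 /L 1234 >>\nendobj\n"
    print(looks_linearized(sample))
```

This only detects the declaration. A file that was linearized and later modified with an incremental update may still carry the key while no longer being laid out for fast web view.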
PDF Compression
PDF compression techniques:
- FlateDecode (zlib/deflate) – Common for text and streams.
- LZW (Lempel-Ziv-Welch) – Older, less common now.
- DCTDecode (JPEG) – Used for image data.
- CCITTFaxDecode – Used for monochrome images (e.g., scanned documents).
- JBIG2Decode – Used for bilevel image compression.
Most text, fonts, and metadata are compressed using FlateDecode, which is lossless: decompressing recovers the original bytes exactly.
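FlateDecode data uses the zlib format, so the round trip can be illustrated with Python's standard zlib module. This is a small sketch, not a PDF parser: the content variable is a made-up page content stream, repeated so that it compresses well.

```python
import zlib

# Illustrative page content stream: plain-text PDF operators that draw a string.
content = b"BT /F1 12 Tf 72 720 Td (Hello, PDF) Tj ET\n" * 20

compressed = zlib.compress(content)     # the kind of bytes a FlateDecode stream stores
restored = zlib.decompress(compressed)  # what a PDF reader recovers when decoding

print(len(content), len(compressed))    # the compressed form is much smaller
print(restored == content)              # the round trip is exact (lossless)
```

The compressed bytes are binary, which is why a modern PDF opened in a text editor shows readable object syntax interleaved with unreadable stream data.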
Types of PDF
PDF types covered in this post:
- PDF/A
- PDF/UA
- PDF/E
PDF/A
PDF/A is focused on the long-term preservation of files.
PDF/A is defined by the ISO 19005 family of standards. ISO 19005-1 defines the first version, PDF/A-1.
PDF/UA
PDF/UA (Universal Accessibility) is the global standard for PDF accessibility, published as ISO 14289.
PDF/E
PDF/E is used in engineering workflows.
PDF Accessibility
One of the main drawbacks of PDF is that it reduces accessibility and reusability.
A possible solution is to improve PDF accessibility by using tagged PDFs or the PDF/UA format.
Another possible solution is to author the content in a markup language (such as HTML) that can be exported to PDF. Alternatively, offer two formats independently: a processable one (described in a markup language, e.g. HTML) and a printable one (described in a page description language, e.g. PDF).
PDF Security
You can read this post about PDF security.
PDF Manipulation
How to generate a PDF file from a script
How to print multiple ranges of PDF files in macOS X
PDF Tools
PDF Reading Tools
Sumatra PDF
Sumatra PDF is a desktop PDF reader for Windows.
ReadEra
ReadEra is a PDF reader for mobile platforms.
Moon+ Reader
Moon+ Reader is a PDF reader for mobile platforms.
PDF Manipulation Tools
PDF manipulation tools:
- Stirling PDF
- PDFsam
- PDF Toolkit
Stirling PDF
Stirling PDF
PDFsam
PDFsam (from PDF split and merge) is a family of tools.
PDFsam Basic is FOSS.
PDFsam Enhanced is proprietary.
PDF Toolkit
PDF Toolkit (pdftk) is a CLI application for manipulating PDF files.
It can dump raw PDF data.
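As a rough sketch of how that dump might be used from a script, the snippet below calls pdftk's dump_data operation and reads the page count from its text report. It assumes pdftk is installed and on the PATH. The helper names dump_data and page_count are hypothetical, and the parser relies only on the "NumberOfPages: N" line of the report.

```python
import subprocess
from typing import Optional


def dump_data(pdf_path: str) -> str:
    """Run `pdftk <file> dump_data` and return the text report from stdout.

    Assumes the pdftk executable is available on PATH.
    """
    result = subprocess.run(
        ["pdftk", pdf_path, "dump_data"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdftk reports a failure
    )
    return result.stdout


def page_count(report: str) -> Optional[int]:
    """Extract the page count from a dump_data report, or None if absent."""
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "NumberOfPages":
            return int(value.strip())
    return None
```

Usage would look like page_count(dump_data("input.pdf")), where input.pdf is a placeholder file name.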
PDF Compression Tools
PDF compression tools:
- qpdf
- MuPDF
qpdf is a command-line tool and library that can decompress, modify, and recompress PDFs.
MuPDF includes the mutool CLI command, which can also decompress, modify, and recompress PDFs.
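Both tools can rewrite a PDF with its streams decompressed, which makes the file easier to inspect in a text editor. The sketch below builds the corresponding command lines and runs whichever tool is found first. It assumes qpdf's --qdf mode and mutool's clean -d option, and the function names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def decompress_cmd(tool: str, src: str, dst: str) -> List[str]:
    """Return the command line that writes an uncompressed copy of src to dst."""
    if tool == "qpdf":
        # QDF mode: uncompressed streams, normalized and editable layout.
        return ["qpdf", "--qdf", "--object-streams=disable", src, dst]
    if tool == "mutool":
        # "clean -d" rewrites the file with all streams decompressed.
        return ["mutool", "clean", "-d", src, dst]
    raise ValueError(f"unsupported tool: {tool}")


def decompress_pdf(src: str, dst: str) -> str:
    """Decompress src into dst with the first available tool; return its name."""
    for tool in ("qpdf", "mutool"):
        if shutil.which(tool):
            # Note: qpdf can exit with a non-zero status for warnings as well
            # as errors, so check=True may be stricter than needed.
            subprocess.run(decompress_cmd(tool, src, dst), check=True)
            return tool
    raise RuntimeError("neither qpdf nor mutool was found on PATH")
```

Recompressing afterwards is the reverse step, for example with qpdf's --stream-data=compress option or mutool clean -z.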
PDF OCR Tools
OCRmyPDF
OCRmyPDF is an application that adds a searchable OCR text layer to scanned PDFs. It leverages the Tesseract OCR engine.
OCRmyPDF official documentation
PDF Security Tools
pdfid is a PDF scanning and analysis tool used in security work. It is written in Python.
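The general idea behind this kind of triage can be sketched as counting PDF names that often matter in security reviews, such as /JavaScript or /OpenAction. The snippet below is an illustrative approximation and not pdfid's actual implementation. The keyword list and the function name count_names are assumptions, and the sketch neither decodes obfuscated names nor looks inside compressed streams.

```python
import re
from typing import Dict

# Names that commonly indicate active or embedded content (illustrative list).
SUSPICIOUS_NAMES = [
    b"/JS",
    b"/JavaScript",
    b"/OpenAction",
    b"/AA",
    b"/Launch",
    b"/EmbeddedFile",
]


def count_names(data: bytes) -> Dict[str, int]:
    """Count whole-name occurrences of each keyword in the raw PDF bytes."""
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # The lookahead rejects longer names that merely start with the
        # keyword, e.g. "/AAPL" is not counted as "/AA".
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode("ascii")] = len(re.findall(pattern, data))
    return counts


if __name__ == "__main__":
    sample = b"<< /OpenAction 5 0 R >>\n<< /S /JavaScript /JS (app.alert(1)) >>"
    for name, count in count_names(sample).items():
        print(name, count)
```

A non-zero count is only a hint that a file deserves a closer look, since many legitimate PDFs use these features.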