PDF Files

pdf-parser - Website

Parse a PDF file and extract the objects.

# Extract the stream of object 77: -o selects the object, -f applies its filters (decompression), -d dumps the result to out.txt
python pdf-parser.py -o 77 -f -d out.txt input.pdf

qpdf - GitHub
A command-line tool to manipulate PDF files. Can extract embedded files.
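A minimal sketch of pulling an embedded file out with qpdf. It assumes qpdf 10.2 or newer, which added the attachment options. The helper name qpdf_attachment and its arguments are hypothetical, for illustration only.

```shell
# Hypothetical helper: list the attachment keys, then dump one attachment.
# Assumes qpdf >= 10.2 (older releases lack the attachment options).
qpdf_attachment() {
    pdf="$1"    # input PDF
    key="$2"    # attachment key, as printed by --list-attachments
    out="$3"    # file that receives the raw attachment bytes
    qpdf --list-attachments "$pdf"
    qpdf --show-attachment="$key" "$pdf" > "$out"
}
```

Usage: `qpdf_attachment input.pdf flag.txt flag.bin`, where flag.txt stands for whichever key the listing shows.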
pdfcrack - Website
A command-line tool to recover the password of an encrypted PDF file. Supports wordlist (dictionary) and brute-force attacks.
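The two attack modes can be sketched as follows. The wrapper names are hypothetical, and the digits-only charset is only an example of narrowing the brute-force search space.

```shell
# Hypothetical wrappers around pdfcrack's two attack modes.

# Wordlist attack: -w points at a file with one candidate password per line.
crack_with_wordlist() {
    pdfcrack -f "$1" -w "$2"
}

# Brute force over digits only: -c sets the charset,
# -n and -m set the minimum and maximum password length.
crack_with_digits() {
    pdfcrack -f "$1" -c 0123456789 -n "$2" -m "$3"
}
```

Usage: `crack_with_wordlist locked.pdf wordlist.txt`, or `crack_with_digits locked.pdf 4 6` for 4 to 6 digit PINs.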
pdfimages - Website
A command-line tool, and the first thing to reach for when given a PDF file. It extracts the images stored in the PDF. It takes an output prefix (the image root) rather than a directory name: images are written as prefix-000, prefix-001, and so on, with an extension matching the output format.
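A short sketch of a typical run, assuming the poppler build of pdfimages. The helper name extract_images is hypothetical. The output folder is created up front on the assumption that pdfimages will not create it.

```shell
# Hypothetical helper: inventory the images, then extract them all.
extract_images() {
    pdf="$1"       # input PDF
    prefix="$2"    # output prefix, e.g. images/img
    # Create the parent folder first rather than relying on pdfimages to do it.
    mkdir -p "$(dirname "$prefix")"
    pdfimages -list "$pdf"            # list only: page, type, size, encoding
    pdfimages -all "$pdf" "$prefix"   # write images in their native format where possible
}
```

Usage: `extract_images input.pdf images/img` should leave files named like images/img-000.png.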
pdfdetach - Website
A command-line tool to extract files embedded (attached) in a PDF file.
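A sketch of listing and saving every attachment, assuming the poppler build of pdfdetach. The helper name detach_all is hypothetical.

```shell
# Hypothetical helper: show the embedded files, then save all of them.
detach_all() {
    pdf="$1"       # input PDF
    outdir="$2"    # folder that receives the attachments
    mkdir -p "$outdir"
    pdfdetach -list "$pdf"                    # numbered list of embedded files
    pdfdetach -saveall -o "$outdir" "$pdf"    # with -saveall, -o names the target directory
}
```

Usage: `detach_all input.pdf attachments`.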
pdftotext - Website
Extract the text of a PDF. By default it only reads what is inside the page, so text drawn outside the visible area (negative coordinates, or beyond the page MediaBox) stays hidden. Pass an oversized crop box with negative offsets to capture it:
```
# -x/-y is the top-left corner, -W/-H the width/height of the crop area (in points)
pdftotext -x -200 -y -200 -W 2000 -H 2000 input.pdf out.txt
```