PDF Files
pdfinfo - A command-line tool that prints a basic synopsis of a PDF file (metadata, page count, page size, encryption status).

    # Extract all JavaScript from a PDF file
    pdfinfo -js input.pdf

pdf-parser - Parses a PDF file and extracts its objects.

    # Extract the stream from object 77 (filters applied) and dump it to out.txt
    python pdf-parser.py -o 77 -f -d out.txt input.pdf

qpdf - A command-line tool to manipulate PDF files. It can extract embedded files.
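
A minimal sketch of the embedded-file workflow with qpdf, assuming a qpdf release recent enough to provide the attachment options (10.2 or later, as far as I know). The function name `qpdf_attachment` and the default output name `attachment.bin` are placeholders, not part of qpdf.

```shell
# Hypothetical helper: list the attachments of a PDF, then dump one by key.
# Usage: qpdf_attachment input.pdf [key] [output-file]
qpdf_attachment() {
    pdf="$1"
    key="$2"
    out="${3:-attachment.bin}"
    # Print the key of every embedded file
    qpdf --list-attachments "$pdf"
    # With a key, write that attachment's raw bytes to the output file
    if [ -n "$key" ]; then
        qpdf --show-attachment="$key" "$pdf" > "$out"
    fi
}
```

Run it once without a key to see what is embedded, then again with one of the listed keys.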
pdfcrack - A command-line tool to recover the password of a PDF file. Supports wordlist (dictionary) and brute-force attacks.
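
The two attack modes can be sketched as follows. The function names, file names, and charset are placeholders; only the `-f`, `-w`, `-c`, `-n`, and `-m` options belong to pdfcrack itself.

```shell
# Hypothetical helpers wrapping pdfcrack's two attack modes.

# Dictionary attack: try every line of a wordlist as the password.
# Usage: crack_with_wordlist input.pdf wordlist.txt
crack_with_wordlist() {
    pdfcrack -f "$1" -w "$2"
}

# Brute force: try all strings over a charset within a length range.
# Usage: crack_bruteforce input.pdf charset min_len max_len
crack_bruteforce() {
    pdfcrack -f "$1" -c "$2" -n "$3" -m "$4"
}
```

For example, `crack_bruteforce input.pdf 0123456789 4 6` would try every numeric PIN of 4 to 6 digits. A wordlist is usually worth trying first, since brute force grows quickly with length and charset size.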
pdfimages - A command-line tool, and the first thing to reach for when given a PDF file. It extracts the images stored in a PDF file. It needs an output name as its second argument, which it uses as the filename prefix (image root) for the images it finds; if that prefix points into a directory, create the directory beforehand.
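
A small sketch of a typical extraction run, assuming the Poppler build of pdfimages (the `-list` and `-all` options may be missing from other implementations) and assuming the output argument is treated as a filename prefix, so the directory is created up front. The function name and the `img` prefix are placeholders.

```shell
# Hypothetical helper: extract every image of a PDF into a directory.
# Usage: extract_images input.pdf images
extract_images() {
    pdf="$1"
    outdir="$2"
    # Show what is embedded first (page, type, dimensions, encoding)
    pdfimages -list "$pdf"
    # Create the directory ourselves, then use "<outdir>/img" as the prefix
    mkdir -p "$outdir"
    # -all writes each image in its native format where possible
    pdfimages -all "$pdf" "$outdir/img"
}
```

The extracted files are then named after the prefix with a running number appended.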
pdfdetach - A command-line tool to extract embedded files (attachments) out of a PDF file.
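
Listing and saving attachments might look like the sketch below. The function name and directory name are placeholders; `-list`, `-saveall`, and `-o` are pdfdetach options.

```shell
# Hypothetical helper: list the embedded files, then save all of them.
# Usage: extract_attachments input.pdf attachments
extract_attachments() {
    pdf="$1"
    outdir="$2"
    # Print the index and name of every embedded file
    pdfdetach -list "$pdf"
    # With -saveall, -o names the directory that receives the files
    mkdir -p "$outdir"
    pdfdetach -saveall -o "$outdir" "$pdf"
}
```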
pdftotext - Extracts the text of a PDF. By default it only reads what is inside the page, so text drawn outside the visible area (at negative coordinates, or beyond the page MediaBox) stays hidden. Pass an oversized crop box with negative offsets to capture it:

    # -x/-y is the top-left corner, -W/-H the width/height of the crop area (in points)
    pdftotext -x -200 -y -200 -W 2000 -H 2000 input.pdf out.txt
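
Instead of a fixed 2000-point box, the crop area can be derived from the real page size (the "Page size" line of `pdfinfo`) plus a margin on every side. The sketch below is one way to do that arithmetic; `crop_args` and `extract_hidden_text` are hypothetical helper names, and the sizes must be whole numbers, so round fractional page sizes up first.

```shell
# Hypothetical helper: print pdftotext crop options for a page of the given
# size (in points) enlarged by a margin on every side.
# Usage: crop_args width height margin
crop_args() {
    width="$1"
    height="$2"
    margin="$3"
    # The top-left corner moves up and left by the margin, and the box
    # grows by twice the margin in each direction.
    printf '%s\n' "-x -$margin -y -$margin -W $((width + 2 * margin)) -H $((height + 2 * margin))"
}

# Usage: extract_hidden_text input.pdf out.txt width height margin
extract_hidden_text() {
    # Word splitting of $(crop_args ...) is intentional: it yields 8 arguments
    pdftotext $(crop_args "$3" "$4" "$5") "$1" "$2"
}
```

For a US Letter page (612 x 792 points) and a 200-point margin, `crop_args 612 792 200` prints `-x -200 -y -200 -W 1012 -H 1192`. Comparing the result against a plain `pdftotext input.pdf` run shows which text lives outside the page.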