PDF Files

  • pdfinfo - Website

    A command-line tool that prints a summary of a PDF file: metadata such as title, author, and producer, plus page count, page size, and encryption status.

    # Print all JavaScript embedded in a PDF file
    pdfinfo -js input.pdf
  • pdf-parser - Website

    Parse a PDF file and extract the objects.

    # Select object 77 (-o), decode its stream through its filters (-f), and dump the result to out.txt (-d)
    python pdf-parser.py -o 77 -f -d out.txt input.pdf
  • qpdf - GitHub

    A command-line tool to manipulate PDF files. Can extract embedded files.
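
    The entry above gives no command, so here is a minimal sketch of extracting an embedded file. It assumes qpdf 10.2 or newer, which introduced the --list-attachments and --show-attachment options. The helper name dump_attachment, the file names, and the attachment key are placeholders chosen for illustration.

    ```shell
    # Hypothetical helper: list embedded files, then write one of them to disk.
    # Assumes qpdf >= 10.2 (attachment options); all names are placeholders.
    dump_attachment() {
        pdf="$1"   # PDF to inspect
        key="$2"   # attachment key, as printed by --list-attachments
        out="$3"   # file that receives the attachment's bytes
        # Show which attachments exist (key and object id).
        qpdf --list-attachments "$pdf" || return 1
        # --show-attachment writes the raw attachment to standard output.
        qpdf --show-attachment="$key" "$pdf" > "$out"
    }

    # Usage (placeholder names):
    #   dump_attachment input.pdf notes.txt notes.txt
    ```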

  • pdfcrack - Website

    A command-line tool to recover a password from a PDF file. Supports wordlist (dictionary) and brute-force attacks.
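
    As a rough sketch of the two attack modes, the helper below wraps pdfcrack's -w (wordlist), -c (charset), -n (minimum length), and -m (maximum length) options. The function name crack_pdf, its mode keywords, and the digits-only charset are assumptions made for this example, not part of pdfcrack.

    ```shell
    # Hypothetical wrapper around pdfcrack; mode names are made up for this sketch.
    crack_pdf() {
        pdf="$1"    # encrypted PDF
        mode="$2"   # "wordlist" or "digits"
        case "$mode" in
            wordlist)
                # $3 = wordlist file, one candidate password per line
                pdfcrack -f "$pdf" -w "$3"
                ;;
            digits)
                # Brute force over digits only; $3 = min length, $4 = max length
                pdfcrack -f "$pdf" -c 0123456789 -n "$3" -m "$4"
                ;;
            *)
                echo "usage: crack_pdf FILE wordlist LIST | digits MIN MAX" >&2
                return 2
                ;;
        esac
    }

    # Usage (placeholder names):
    #   crack_pdf input.pdf wordlist words.txt
    #   crack_pdf input.pdf digits 4 6
    ```

    Narrowing the charset and length range to whatever hint is available keeps a brute-force run tractable.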

  • pdfimages - Website

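    A command-line tool, and often the first thing to reach for when given a PDF file. It extracts the images stored in a PDF file. Its last argument is an output prefix (the image root): files are written as <prefix>-000.<ext>, <prefix>-001.<ext>, and so on. If the prefix includes a directory, that directory must already exist.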
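
    A possible invocation, sketched under the assumption that the Poppler build of pdfimages is installed (it provides -list and -all). The helper name extract_images and the img prefix are placeholders. The directory is created up front because the output names are built from a prefix.

    ```shell
    # Hypothetical helper: list the images, then extract them in native format.
    # Assumes Poppler's pdfimages; names are placeholders.
    extract_images() {
        pdf="$1"      # PDF to read
        outdir="$2"   # directory that will hold the extracted images
        mkdir -p "$outdir" || return 1
        # One row per embedded image (page, type, size, encoding).
        pdfimages -list "$pdf"
        # -all keeps JPEG, JPEG2000, JBIG2, and CCITT data as-is and writes the rest as PNG.
        pdfimages -all "$pdf" "$outdir/img"
    }

    # Usage (placeholder names):
    #   extract_images input.pdf images
    ```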

  • pdfdetach - Website

    A command-line tool to extract files out of a PDF file.
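
    A short sketch of typical use, assuming Poppler's pdfdetach and its -list, -saveall, and -o options. The helper name save_attachments and the directory name are placeholders.

    ```shell
    # Hypothetical helper: list embedded files, then save all of them.
    # Assumes Poppler's pdfdetach; names are placeholders.
    save_attachments() {
        pdf="$1"      # PDF to inspect
        outdir="$2"   # directory for the saved attachments
        mkdir -p "$outdir" || return 1
        # Print the embedded files.
        pdfdetach -list "$pdf"
        # With -saveall, -o names the target directory.
        pdfdetach -saveall -o "$outdir" "$pdf"
    }

    # Usage (placeholder names):
    #   save_attachments input.pdf attachments
    ```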

  • pdftotext - Website

    Extract the text of a PDF. By default it only reads what is inside the page, so text drawn outside the visible area (negative coordinates, or beyond the page MediaBox) stays hidden. Pass an oversized crop box with negative offsets to capture it:

    # -x/-y set the top-left corner, -W/-H the width/height of the crop area.
    # Units are pixels at the -r resolution (72 DPI by default, so 1 pixel = 1 point).
    pdftotext -x -200 -y -200 -W 2000 -H 2000 input.pdf out.txt