File Scanning

File scanning is the process of analyzing a file, potentially a large one, to find information about it. It is useful for finding hidden data, or simply for identifying a file's type and structure.

Tools

  • file

    Deduce the file type from the headers.

  • binwalk

    Look for embedded files in other files.

    binwalk <file>            # List embedded files
    binwalk -e <file>         # Extract embedded files of known types
    binwalk --dd=".*" <file>  # Extract every signature match, whatever its type

    Alternatives: foremost, hachoir-subfile

  • strings

    Extract printable strings from a file.

  • grep

    Search for a string, or regex, in a file.

    grep <string> <file>          # Search in a file
    grep -r <string> <directory>  # Search recursively in a directory

  • hexdump

    Display the hexadecimal representation of a file.

    hexdump -C <file>  # Dump bytes with offsets and an ASCII column
    hexdump <file>     # Dump 16-bit words with offsets (byte order follows the host)
    xxd -p <file>      # Dump only the bytes, as plain hex

  • yara

    Scan a file with YARA rules to find (possibly malicious) patterns. Community rules can be found in the Yara-Rules repository.

    Here is an example rule that matches the PNG signature anywhere in a file:

    png.yar

    rule is_png {
        strings:
            $png = { 89 50 4E 47 0D 0A 1A 0A }
        condition:
            $png
    }

    yara png.yar <file>     # Scan a file, outputs rule name if match
    yara -s png.yar <file>  # Print the offset and the matched strings
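
To make the idea behind binwalk and the YARA rule above concrete, here is a minimal Python sketch that reports every offset at which a known signature occurs. It is not how either tool is implemented; `SIGNATURES` and `find_signatures` are hypothetical names chosen for this illustration.

```python
# Hypothetical helper, not part of binwalk or yara: report every offset at
# which a known signature occurs in a blob of bytes.
SIGNATURES = {
    "JPEG": bytes.fromhex("FFD8FF"),
    "PNG": bytes.fromhex("89504E470D0A1A0A"),
    "ZIP": b"PK",
}


def find_signatures(data: bytes) -> list:
    """Return (offset, name) pairs for every signature hit, sorted by offset."""
    hits = []
    for name, magic in SIGNATURES.items():
        start = data.find(magic)
        while start != -1:
            hits.append((start, name))
            # Resume one byte further so overlapping hits are not missed.
            start = data.find(magic, start + 1)
    return sorted(hits)


# A PNG signature hidden after 4 junk bytes, then a ZIP local file header.
blob = b"junk" + SIGNATURES["PNG"] + b"more" + b"PK\x03\x04"
print(find_signatures(blob))  # [(4, 'PNG'), (16, 'ZIP')]
```

Short signatures such as `PK` produce many false positives on real data, which is one reason dedicated tools check more than the first two bytes.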

File signatures

  • file signatures - Wikipedia

    File signatures are bytes at the beginning of a file that identify the file type. These bytes are also called magic numbers.

    Most signatures are listed in the Wikipedia reference above, but the most common ones are:

    Hex signature            ASCII (. = non-printable)  File type  Description
    FF D8 FF                 ...                        JPEG       JPEG image
    89 50 4E 47 0D 0A 1A 0A  .PNG....                   PNG        PNG image
    50 4B                    PK                         ZIP        ZIP archive

    For example, the first 16 bytes of a PNG file are usually b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR' (the 8-byte signature, followed by the length and type of the IHDR chunk).

    This data can be written to a file with:

    echo -n -e "\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR" > png.sig
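
As a rough illustration of how a signature check works, the following Python sketch compares the start of a file against the three signatures from the table. It is a simplified stand-in for what `file` does, not its actual logic; `MAGIC`, `identify` and `identify_file` are hypothetical names.

```python
# Hypothetical lookup table built from the signatures listed above.
MAGIC = [
    (bytes.fromhex("89504E470D0A1A0A"), "PNG"),
    (bytes.fromhex("FFD8FF"), "JPEG"),
    (b"PK", "ZIP"),
]


def identify(header: bytes) -> str:
    """Return the file type whose signature starts the header, or 'unknown'."""
    for magic, name in MAGIC:
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the first 16 bytes of the file; enough for these signatures."""
    with open(path, "rb") as fh:
        return identify(fh.read(16))


print(identify(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"))  # PNG
print(identify(b"hello world"))                          # unknown
```

Calling `identify_file("png.sig")` on the file created by the command above should return `PNG`, assuming the shell interpreted the escape sequences.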