This tool parses a PDF document to identify the fundamental elements used in the analyzed file. It does not render the PDF document.
- Load/parse objects and headers
- Extract metadata (author, description, …)
- Extract text from ordered pages
- Support for compressed PDF files
- Support for Mac OS Roman charset encoding
- Handling of hexadecimal and octal encoding in text sections
- PSR-0 compliant (autoloader)
- PSR-1 compliant (code styling)
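The "hexadecimal and octal encoding" item above refers to two ways a PDF can write text: hex strings in angle brackets (`<48656C6C6F>`) and literal strings that use backslash octal escapes (`\110\145llo`). The following is a minimal Python sketch of both decodings, not the tool's own code. The function names are hypothetical, and other backslash escapes such as `\n` or `\(` are deliberately not handled.

```python
import re

def decode_hex_string(token: str) -> bytes:
    """Decode a PDF hex string such as '<48 65 6C6C6F>' (hypothetical helper)."""
    # Drop the surrounding < > and any whitespace between the digits.
    digits = re.sub(r"\s+", "", token.strip()[1:-1])
    # An odd number of digits is treated as if a trailing 0 were present.
    if len(digits) % 2:
        digits += "0"
    return bytes.fromhex(digits)

def decode_octal_escapes(literal: str) -> bytes:
    """Replace \\ddd octal escapes (1 to 3 digits) in a literal string body."""
    def repl(match):
        # Keep only the low 8 bits so the result always fits in one byte.
        return chr(int(match.group(1), 8) & 0xFF)
    return re.sub(r"\\([0-7]{1,3})", repl, literal).encode("latin-1")
```

For example, `decode_hex_string("<48 65 6C6C6F>")` and `decode_octal_escapes(r"\110\145llo")` both yield `b"Hello"`.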
Analyzing a Malicious PDF File
We have created a PDF file with an EXE file embedded in it.
Step 1: To launch the PDF parser, type pdf-parser. The -h option lists all the options available in pdf-parser.
# pdf-parser -h
Step 2: Get the stats of the PDF document.
Step 3: Pass stream data through the filters FlateDecode, ASCIIHexDecode, ASCII85Decode, LZWDecode and RunLengthDecode.
Step 4: Get the hashes of the PDF file.
Step 5: Run a case-sensitive search in streams.
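Step 4 asks for hashes of the PDF file. The text does not say which hash algorithm the tool reports, so the sketch below is only an analogue that uses Python's standard `hashlib` with MD5 and SHA-256 as assumed examples. The function name `file_hashes` is hypothetical.

```python
import hashlib

def file_hashes(data: bytes) -> dict:
    """Return hex digests of the given bytes (algorithms chosen for illustration)."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
```

Reading the sample with `open(path, "rb").read()` and passing the bytes to `file_hashes` gives values that can be compared against threat-intelligence feeds or earlier samples.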
The stats option shows statistics for the objects found in the PDF document. Use it to identify PDF documents with unusual or unexpected objects, or to classify PDF documents.
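To make the idea of per-object statistics concrete, here is a rough Python sketch that counts indirect objects by their `/Type` entry using regular expressions on the raw bytes. It is an approximation for illustration, not how pdf-parser itself is implemented. The names `OBJ_RE`, `TYPE_RE` and `object_stats` are hypothetical, and the regex approach ignores objects stored inside compressed object streams.

```python
import re
from collections import Counter

# "N G obj ... endobj" delimits an indirect object in the raw file.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)
# The first /Type entry inside the object body, e.g. "/Type /Catalog".
TYPE_RE = re.compile(rb"/Type\s*/([A-Za-z0-9#]+)")

def object_stats(pdf_bytes: bytes) -> Counter:
    """Count indirect objects grouped by their /Type name."""
    stats = Counter()
    for match in OBJ_RE.finditer(pdf_bytes):
        type_match = TYPE_RE.search(match.group(3))
        key = "/" + type_match.group(1).decode("ascii") if type_match else "(no /Type)"
        stats[key] += 1
    return stats
```

An unexpected count of types such as `/EmbeddedFile` or `/Action` in an otherwise simple document is the kind of anomaly such statistics are meant to surface.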
The search option searches for a string in indirect objects (not inside the streams of indirect objects). The search is not case-sensitive and is susceptible to obfuscation techniques.
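One well-known obfuscation technique is the `#xx` hex escape that PDF allows inside names, so `/J#61vaScr#69pt` is equivalent to `/JavaScript`. The Python sketch below shows how a plain case-insensitive string search misses the obfuscated form and how decoding the escapes first recovers it. The helper names and the sample dictionary are hypothetical, and whether a given version of the tool normalises names before searching is not stated here.

```python
import re

def naive_search(data: bytes, needle: bytes) -> bool:
    """Case-insensitive substring search with no normalisation."""
    return needle.lower() in data.lower()

def decode_name_escapes(data: bytes) -> bytes:
    """Replace each #xx escape (two hex digits) with the byte it encodes."""
    return re.sub(
        rb"#([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        data,
    )

# Hypothetical object dictionary with an obfuscated /JavaScript name.
obfuscated = b"<< /Type /Action /S /J#61vaScr#69pt /JS (app.alert(1)) >>"
```

Here `naive_search(obfuscated, b"/JavaScript")` fails, while the same search over `decode_name_escapes(obfuscated)` succeeds.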
The filter option applies the filter(s) to the stream, whereas the raw option makes pdf-parser output raw data.
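What "applying the filters" means can be sketched in Python with the standard library: each filter named in a stream's `/Filter` entry is a decoding step, and they are applied in the listed order. This is a simplified illustration, not pdf-parser's code. The helper names are hypothetical, LZWDecode is omitted because the standard library has no LZW decoder, and decode parameters such as predictors are ignored.

```python
import base64
import zlib

def ascii_hex_decode(data: bytes) -> bytes:
    hex_digits = b"".join(data.split())        # remove whitespace
    hex_digits = hex_digits.split(b">")[0]     # '>' marks end of data
    if len(hex_digits) % 2:                    # odd digit count: pad with 0
        hex_digits += b"0"
    return bytes.fromhex(hex_digits.decode("ascii"))

def ascii85_decode(data: bytes) -> bytes:
    data = data.strip()
    if data.startswith(b"<~"):                 # optional leading marker
        data = data[2:]
    if data.endswith(b"~>"):                   # end-of-data marker
        data = data[:-2]
    return base64.a85decode(data)

def run_length_decode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        length = data[i]
        if length == 128:                      # end-of-data marker
            break
        if length < 128:                       # copy next length+1 bytes as-is
            out += data[i + 1 : i + 2 + length]
            i += 2 + length
        else:                                  # repeat next byte 257-length times
            out += data[i + 1 : i + 2] * (257 - length)
            i += 2
    return bytes(out)

DECODERS = {
    "FlateDecode": zlib.decompress,
    "ASCIIHexDecode": ascii_hex_decode,
    "ASCII85Decode": ascii85_decode,
    "RunLengthDecode": run_length_decode,
}

def apply_filters(stream: bytes, filters: list) -> bytes:
    """Decode stream bytes by applying each named filter in order."""
    for name in filters:
        stream = DECODERS[name](stream)
    return stream
```

A stream declared with `/Filter [/ASCII85Decode /FlateDecode]` would be decoded with `apply_filters(raw_stream, ["ASCII85Decode", "FlateDecode"])`. Skipping this step and dumping the bytes unchanged corresponds to the raw output described above.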