Document files and PDFs carrying exploits are a common feature of publicly disclosed targeted attacks, whereas non-targeted incidents generally involve links to web pages hosting malicious code. Here I’m focussing on what appears to be the very first stage of a typical targeted attack (after the recce and intel gathering).
An OS must be capable of identifying different file types in order to know which application should process a given file – when the user clicks on a .doc file, the OS just knows to open it with LibreOffice or Microsoft Word. What the user then has is a running application and a file loaded into memory – the process either linking to the file’s data structure or loading it into its address space.
The other concept to understand is the role of file extensions. We give a Word document the .doc extension purely to identify it as such, and so that the desktop GUI gives it the appropriate icon. Remove the file extension and the OS would still be able to identify its true file type. This is one feature that enables a baddy to trick a human user, and usually it works because most of us interact with computers through a GUI.
So how does the OS know what a file actually is, without an extension?
Headers and File Internals
File types are actually defined by an initial sequence of bytes, which are sometimes referred to as the ‘file header’ or ‘magic number’. They can be seen by running the following command on a given file:
$ hexdump -n 50 (filename)
Here are a couple of examples for PDFs and JPGs:
In fact, this is how digital forensic software can determine whether images are being hidden using a false extension. In our case, the technique can be used to find whether malicious code is masquerading as a document or image.
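The same check can be sketched in a few lines of Python. This is only an illustration: the function names are mine, and the table covers just the two formats mentioned here (a PDF starts with the ASCII characters %PDF-, a JPG with the bytes FF D8 FF).

```python
import os
from typing import Optional

# Leading bytes ("magic numbers") for the two formats discussed above.
MAGIC = {
    "pdf": b"%PDF-",         # 25 50 44 46 2D
    "jpg": b"\xff\xd8\xff",  # JPEG start-of-image marker
}


def sniff_type(data: bytes) -> Optional[str]:
    """Return the type whose magic number starts the data, else None."""
    for name, magic in MAGIC.items():
        if data.startswith(magic):
            return name
    return None


def extension_matches(path: str) -> bool:
    """True if the file's extension agrees with its leading bytes."""
    with open(path, "rb") as fh:
        head = fh.read(16)
    ext = os.path.splitext(path)[1].lower().lstrip(".")
    if ext == "jpeg":
        ext = "jpg"
    return sniff_type(head) == ext
```

A file named holiday.jpg that begins with anything other than FF D8 FF would fail extension_matches(), which is roughly the mismatch a forensic tool flags.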
With executables, the first 125 bytes seem to be an identifier, as a consequence of having a standard data structure and multiple headers.
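For Windows executables the fixed part is smaller than it looks. The file begins with the two bytes ‘MZ’, and the four-byte field at offset 0x3C gives the position of the ‘PE’ signature; much of what sits in between is a small DOS stub program, which may be why so many leading bytes look identical from one .exe to the next. A minimal sketch of that check, with a function name of my own choosing:

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """Check for the MZ header and the PE signature it points to."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # The little-endian 32-bit field at offset 0x3C (e_lfanew) holds
    # the file offset of the 4-byte PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"
```

This only confirms the outer layout. It says nothing about whether the rest of the file is a valid or benign program.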
When the file is opened, the OS determines which program/application should handle it by reading the first several bytes, and then initialises a process for that.
A Little Experiment
Let’s go beyond the forensics and see how the theory could be put into practice by doing a little magic trick – turning cmd.exe into a ‘PDF’.
The first thing to determine is what the file header is for a valid PDF, by opening two separate documents and isolating the initial bytes that are common to both files – any file with those bytes must be a PDF, right? I did this earlier in the command line, but GHex gives a different output. The likely reason is that hexdump, by default, prints the data as two-byte words, which swaps each pair of bytes on a little-endian machine, whereas GHex shows the bytes in file order (hexdump -C does the same).
The next step is to open cmd.exe in GHex, prefix its contents with the PDF header bytes and save it as ‘testploit’ (without an extension).
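Isolating the shared leading bytes can also be done programmatically rather than by eye. The sketch below is my own helper, not part of any tool mentioned here, and it works on two byte strings read from the sample documents.

```python
def common_prefix(a: bytes, b: bytes) -> bytes:
    """Return the longest run of leading bytes shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


# Two made-up file openings that differ only in the PDF version digit.
first = b"%PDF-1.4\n1 0 obj"
second = b"%PDF-1.7\n1 0 obj"
print(common_prefix(first, second))  # b'%PDF-1.'
```

Two samples only show what those particular files share, so the result is a starting point for the header rather than a definition of it.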
And there we go: our edited cmd.exe is now disguised as a PDF even on closer inspection. The file effectively should become a launcher for a Windows command prompt. It even passes itself off as a valid document when viewed in the properties window or scanned by VirusTotal.
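The trick works because a header-only check looks no further than the first few bytes. A slightly stricter test, sketched below under my own assumptions, also expects the ‘startxref’ keyword and the ‘%%EOF’ marker that a well-formed PDF carries near its end. The 1024-byte window and the function names are illustrative choices, not a standard API.

```python
def header_says_pdf(data: bytes) -> bool:
    """True if the data begins with the PDF magic number."""
    return data.startswith(b"%PDF-")


def has_pdf_trailer(data: bytes) -> bool:
    """True if the end of the data carries the usual PDF trailer markers."""
    tail = data[-1024:]
    return b"startxref" in tail and b"%%EOF" in tail


def looks_suspicious(data: bytes) -> bool:
    """Flag data that claims to be a PDF but lacks a PDF trailer."""
    return header_says_pdf(data) and not has_pdf_trailer(data)


# An executable-like blob with a PDF header stuck on the front.
disguised = b"%PDF-1.4\n" + b"MZ" + b"\x00" * 200
print(looks_suspicious(disguised))  # True
```

A determined attacker could append a fake trailer as well, so this raises the bar rather than settling the question.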
There’s a much faster way, using Metasploit to create malicious PDFs complete with exploits for Adobe Reader.
What I’ve described so far is pretty amateur – a recipient would know something’s up if an actual document fails to materialise, plus it’s obvious if an .exe program is launched. In targeted attacks both the email and the attachment would be carefully tailored, to ensure that both are convincing and innocuous enough not to raise any suspicion.
It doesn’t even have to be that targeted – a fake brochure emailed to someone who attended a major marketing event (such as InfoSecurity Europe) would work, or perhaps a ‘mislaid’ USB drive containing a PDF with an interesting filename. I’m guessing that most people don’t habitually update their versions of Adobe Reader. The recipient would open the doc, hit the delete button and think nothing of it, by which time the payload would have done its job.
I created a basic PDF document, then used the strings command to view its structure:
In the Websense analysis, ‘this.(function)’ was placed in the /OpenAction field, with ‘(function)’ being a call to an object elsewhere in the file. I reckon both could be inserted into a PDF using a hex editor, using the same method I used for changing the file header bytes. The function could be anything – perhaps an exploit for a buffer overflow vulnerability within any of Adobe Reader’s functions, with a payload to fetch a malware installer.
The exploit creators went a couple of steps further, encoding the function and compressing it with zlib, but they still needed to reference it in the /OpenAction field.
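From the defender’s side, the same field is something to look for. The sketch below counts a handful of PDF name keys that are associated with automatic actions and scripts. The list of keys and the function name are my own choices, and the scan is deliberately naive: it will miss keys written with #-escaped characters or tucked inside compressed object streams.

```python
import re

# PDF name keys tied to automatic actions or embedded scripts.
SUSPECT_KEYS = (b"/OpenAction", b"/AA", b"/JavaScript", b"/JS", b"/Launch")


def suspicious_keys(data: bytes) -> dict:
    """Count occurrences of each suspect key in the raw PDF bytes."""
    counts = {}
    for key in SUSPECT_KEYS:
        # The lookahead stops /JS or /AA matching inside a longer name.
        pattern = re.escape(key) + rb"(?![A-Za-z0-9])"
        n = len(re.findall(pattern, data))
        if n:
            counts[key.decode()] = n
    return counts


sample = (b"<< /Type /Catalog /OpenAction 5 0 R >>\n"
          b"5 0 obj << /S /JavaScript /JS (app.alert(1)) >>")
print(suspicious_keys(sample))
```

A hit is not proof of malice, since plenty of legitimate forms use JavaScript, but a one-page ‘brochure’ with an /OpenAction pointing at a script deserves a closer look.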
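Compression is why a plain strings search can come up empty on the script itself. As a rough illustration using Python’s standard zlib module, the helper below (name and example data are mine) reports whether a keyword is invisible in the raw stream bytes but present once the stream is inflated.

```python
import zlib


def hidden_in_stream(stream_body: bytes, needle: bytes) -> bool:
    """True if needle appears only after inflating a zlib stream."""
    if needle in stream_body:
        return False  # visible without decompression
    try:
        inflated = zlib.decompress(stream_body)
    except zlib.error:
        return False  # not a zlib stream (or a corrupt one)
    return needle in inflated


# Stand-in for a script body; real samples would be read from the file.
script = b"app.alert('x');" * 20
packed = zlib.compress(script)
print(hidden_in_stream(packed, b"app.alert"))
```

Extracting the stream bytes from a real PDF (the data between the ‘stream’ and ‘endstream’ keywords) is left out here, as is any additional encoding layered on top of the compression.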
Of course, a policy of ‘don’t click shit!’ is always the first countermeasure that comes to mind, but if a hundred employees of a given organisation were sent a malicious attachment, it’s close to certain that several of them would open it. Only one successful attempt is needed. I’d also argue that anyone could be made to open a malware-infected document if enough effort went into crafting the attack.
A security plan must take into account that people will open whatever attachments are mailed to them. Security then relies on: 1) Patching and exploit prevention, 2) Malware detection, 3) Preventing traffic between malware and a C&C server, 4) Detection and incident response.
Windows 7 and 8 users are in a relatively good position, as Microsoft works on the assumption that code vulnerabilities will always slip through the net, and decided to mitigate them with things like ASLR and SafeSEH. There are ways around these, but they present an obstacle to getting an exploit to run. Patching Adobe Reader should also be effective, depending on whether the attackers are limited to stock exploits.
The Hong Kong CERT have recommended the use of alternative applications for reading PDFs and Microsoft Office documents, the idea being that users would be unaffected by exploits for Adobe/Microsoft. While it’s a good strategy in the short term, it’s more of a delaying tactic against an APT, and alternative applications would become vectors should they become popular.