For this I used just two utilities that should be present on a typical Linux system: the GNU Debugger (GDB) and objdump. The following commands are quick ways of getting straight into disassembling the executable sections of an ELF file:
(gdb) disas /m main
$ objdump -d [file name]
The first disassembles the main() function of whatever program is loaded into GDB (the /m modifier interleaves source lines when debugging information is available). The objdump -d command is useful for analysing shared objects, as its output enables us to look at what’s inside each function.
Why disassemble an executable into something that’s much harder to understand than decompiled code? I suppose it’s because the machine code of an executable translates directly into assembler instructions, with less chance of error. A decompiler would have to go a step further and accurately group assembler instructions to recreate higher-level code, which is a task better performed with human expertise.
At first the following appears to be a very simple program that prints the string ‘Hello World’:
When disassembling the compiled program’s main() function in GDB, we get the following output:
The first three instructions are what some refer to as a ‘prologue’, which sets up the stack frame. The first pushes the caller’s base pointer (EBP) onto the stack to preserve it, and the second copies the current stack pointer (ESP) into EBP, so EBP marks the base of the new frame. Because the stack grows towards lower addresses on x86, the stack pointer decreases from that address as data is pushed onto the stack. The third instruction forces the stack pointer to a multiple of 16 by ANDing it with 0xFFFFFFF0. The next instruction subtracts 16 bytes (0x10) from the stack pointer, reserving space for the function’s own use.
With this section of code it becomes possible to identify the beginning of a function in a disassembled executable.
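Offsets from the base pointer indicate what kind of data is being accessed, all of it on the stack rather than the heap: positive offsets such as 0x8(%ebp) refer to arguments passed in by the caller, while negative offsets such as -0x4(%ebp) refer to the function’s local variables.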
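The following instruction writes the address 0x80484D0 (which lies in the .rodata section of the executable) into the stack slot that ESP currently points to. The ‘$’ marks the value as an immediate, so nothing is read from that address, and ESP itself is left unchanged: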
movl $0x80484D0, (%esp)
Address 0x80484D0 is where the ‘Hello World’ string is stored, so this places a pointer to the string at the top of the stack immediately before printf() is called.
In the past I’ve discussed function calls as direct references to something in a linked object such as a DLL or .so file. It doesn’t happen exactly like that here, as the operand of the CALL instruction refers to an entry in the ‘Procedure Linkage Table’. This exists in the .plt section of the executable at address 0x80482F0. The entry does not hold the address of printf() itself. It is a short stub that jumps indirectly through a slot in the Global Offset Table, and the dynamic linker fills that slot with the location of the actual printf() function in the linked object.
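When printf() is called, it does not need to be told where its argument is: under the 32-bit cdecl calling convention the callee reads its arguments from the stack, starting just above the return address that the CALL instruction pushes.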
Mapping the Program
When all sections of the executable are disassembled using objdump, the startup and support code that was statically linked in when the program was compiled is listed too, and this is where the workings of our simple ‘Hello World’ program become far more complex.
Even if we don’t fully understand the assembler code, this is enough to identify the start of each function (the prologue), the functions they’re calling and the parameters being passed between them. We’re also able to identify addresses and offsets among the many hexadecimal values in the code. This should enable us to develop a map of the program and get a fairly detailed view of what it does.
By looking at the map, we can see that the address range 0x80484D0 to 0x804A020 is important. Why? According to the output of objdump, the addresses in this range point into other sections of the executable, such as .rodata, the Global Offset Table and .bss.
The other functions being called exist in the libc module (/lib/i386-linux-gnu/libc-2.19.so), and this can also be disassembled to see the functions themselves.
This doesn’t reveal much about a small, self-contained executable that only uses the standard libraries, but it’s very useful when examining an executable that could be malware.
Working with Variables
After compiling and disassembling the following code, we should see function calls for printf() and scanf(), and some jump instructions for the branching statements. We also expect to find the variable mynum being compared to a constant (100):
There are a couple of noteworthy things about the way the program works. First of all, fixed data isn’t pushed onto the stack directly. It is again stored in, and read from, another part of the executable, and only its address is placed on the stack. Secondly, the user-defined variable is compared with a value built into the code (0x64, or 100).
Where is the data stored during the execution of this program? After the prologue there are two addresses of interest: 0x80485A0 and 0x8048350. The first address refers to data in the .rodata section of the executable, specifically the ‘Enter number:’ string. The second address is a reference to printf().
What’s happening in this part of the code is that the program places the address of the string in .rodata on the stack, then calls printf().
Something similar happens slightly further down the code, with the address 0x80485AF being moved onto the stack before scanf() is called. The data in question is the string ‘You have entered a number less than 100.’.
Flow control can be identified by the jump instructions and the CMP (compare) instructions preceding them. What is being compared is whatever’s in EAX with the values 0x63 and 0x64. These are 99 and 100 in decimal.
The first of these jumps to address 0x80484F4 if the value in EAX is greater than 99. The second jumps to address 0x80484F4 if the value in EAX is not equal to 100. In both cases the program takes the string ‘You have entered a number greater than 100.’ from .rodata and prints it using the puts() function.
User input that does not match either of the above conditions will cause the program to continue until the JMP instruction, which jumps to address 0x8048500. Here 0x0 is moved into EAX, which holds a function’s return value on x86, so this is main() returning 0, and the function is exited.
Writing to File
This program uses the fopen(), fwrite() and fclose() functions to write data to a file through a stream.
Disassembling main() we get the following:
The function calls for fclose(), fwrite() and fopen() go through the Procedure Linkage Table. These entries in the PLT again refer to the functions within the libc module.
What about the strings that are passed to fopen() and fwrite()? These also live in the ‘.rodata’ section of the executable, which spans addresses 0x80485A8 to 0x80485F8. As before, the addresses of the strings to be written, including the file path, are taken from .rodata before the functions are called.