Monday, August 5, 2013

EBCTF - Binary400 writeup

The challenge was this:
Please upload 5 Windows console executable files with the same MD5 but with different printed outputs (file type: MS Windows, PE32 executable, console)

The output for the files should be:
File1: All Eindbazen are wearing wooden shoes
File2: All Eindbazen live in a windmill
File3: All Eindbazen grow their own tulips
File4: All Eindbazen smoke weed all day
File5: All Eindbazen are cheap bastards
I know of two ways of achieving this but the first one is probably considered cheating:

$ ./file1
All Eindbazen are wearing wooden shoes
$ ./file2
All Eindbazen live in a windmill
$ ./file3
All Eindbazen grow their own tulips
$ ./file4
All Eindbazen smoke weed all day
$ ./file5
All Eindbazen are cheap bastards
$ md5sum file*
291741d9344f70d5b33c52c6e7b041c5  file1
291741d9344f70d5b33c52c6e7b041c5  file2
291741d9344f70d5b33c52c6e7b041c5  file3
291741d9344f70d5b33c52c6e7b041c5  file4
291741d9344f70d5b33c52c6e7b041c5  file5
$ shasum file*
8f69fb6689c44f09422f5c033c97e8978b87c1a2  file1
8f69fb6689c44f09422f5c033c97e8978b87c1a2  file2
8f69fb6689c44f09422f5c033c97e8978b87c1a2  file3
8f69fb6689c44f09422f5c033c97e8978b87c1a2  file4
8f69fb6689c44f09422f5c033c97e8978b87c1a2  file5
Nice huh. But take a look at the code...you'll probably see why this is cheating.

#include <string.h>
#include <stdio.h>
#include <libgen.h>

int main(int argc, char const *argv[]) {
    printf("%s\n",
        strcmp(basename((char*)argv[0]), "file1") == 0 ? "All Eindbazen are wearing wooden shoes" :
        strcmp(basename((char*)argv[0]), "file2") == 0 ? "All Eindbazen live in a windmill" :
        strcmp(basename((char*)argv[0]), "file3") == 0 ? "All Eindbazen grow their own tulips" :
        strcmp(basename((char*)argv[0]), "file4") == 0 ? "All Eindbazen smoke weed all day" :
        strcmp(basename((char*)argv[0]), "file5") == 0 ? "All Eindbazen are cheap bastards" :
                                        "Nope"
    );
    return 0;
}
But fortunately I also have a more correct solution. I am not a cryptologist so the don't expect any details on how to generate md5 collisions, but I will tell you how I got to a solution.

First, MD5 is a state machine. It has an initial state, and for each block of data this state changes. Each time MD5 has a specific state and is given the same block of data, its state will change to the same state.

If two different files have the same MD5 sum you can append the same data to each of then and they will still share the same MD5 sum.

A collision occurs when two different blocks will yield the same state and apparently finding collisions is not that hard. But what can we use that for?

We can make a program something like this:


A1 = "AAAAAAAAAAA"
A2 = "AAAAAAAAAAA"
B1 = "BBBBBBBBBBB"
B2 = "BBBBBBBBBBB"
C1 = "CCCCCCCCCCC"
C2 = "CCCCCCCCCCC"

if (A1 == A2 && B1 == B2 && C1 == C2) {
    print "All Eindbazen are wearing wooden shoes"
} else if (A1 != A2 && B1 == B2 && C1 == C2) {
    print "All Eindbazen live in a windmill"
} else if (A1 == A2 && B1 != B2 && C1 == C2) {
    print "All Eindbazen grow their own tulips"
} else if (A1 != A2 && B1 != B2 && C1 == C2) {
    print "All Eindbazen smoke weed all day"
} else if (A1 == A2 && B1 == B2 && C1 != C2) {
    print "All Eindbazen are cheap bastards"
}
Now, copy this program into five different files and generate three pairs of collisions. In the first program substitute both A1 and A2 with the same block in collision one, B1 and B2 with the same block in collision two, and C1 and C2 with the same block in collision three.
In the second program substitute A1 with the one block of collision one and A2 with the second block of collision one. For B1, B2, C1 and C2 do as in the first program. Now the first and second program will both have the same MD5 sum but exhibit different behaviors. You get the point.
Now, the trick is that the collisions only work for a specific state of the MD5 state machine, so you cannot simply generate three collisions or even use one collision for all three blocks. You have to generate the 'A' collisions first and substitute them in. Then you can generate the 'B' collisions and only then you can generate 'C' collisions.
To do this you can use the Peter Selingers program which can be downloaded from here.
The collisions can be generated faster by using Vlastimil Klimas attack. His code can be gotten from here. It will require some changes however, but you figure it out...that's my challenge to you.
The end result can be seen here:

$ file *
file1.exe: PE32 executable (console) Intel 80386, for MS Windows
file2.exe: PE32 executable (console) Intel 80386, for MS Windows
file3.exe: PE32 executable (console) Intel 80386, for MS Windows
file4.exe: PE32 executable (console) Intel 80386, for MS Windows
file5.exe: PE32 executable (console) Intel 80386, for MS Windows
$ md5sum *
d47eafe3f5ae42e0498ed7769900004c  file1.exe
d47eafe3f5ae42e0498ed7769900004c  file2.exe
d47eafe3f5ae42e0498ed7769900004c  file3.exe
d47eafe3f5ae42e0498ed7769900004c  file4.exe
d47eafe3f5ae42e0498ed7769900004c  file5.exe
$ shasum *
6cd2d31da99b59bc8b94b04d85d0868dc76e7bd0  file1.exe
245a24f647eb5d8fa6e5ff360e42b6f8f3692cfa  file2.exe
1ae718a4a1b8f9906068dd315473aa41927bfe99  file3.exe
0a1c867cf62c35d4283675599d445e29e584c94f  file4.exe
143f0838c8ed0e3800551ad66e01cb1855afa2f3  file5.exe
$ for file in *; do echo "$file: "$(wine $file); done
file1.exe: All Eindbazen are wearing wooden shoes
file2.exe: All Eindbazen live in a windmill
file3.exe: All Eindbazen grow their own tulips
file4.exe: All Eindbazen smoke weed all day
file5.exe: All Eindbazen are cheap bastards
$
You can download my files here.