That cannot be the whole story. If a function returns floating-point data or an entire structure, how can that fit inside EAX?
I have also been taught that returning a struct is bad. Is this really so? Let's find out.
First we'll write some simple C code and see how the compiler implements it:
int return_int(int a, int b) {
    return a + b;
}

float return_float(float a, float b) {
    return a + b;
}

double return_double(double a, double b) {
    return a + b;
}

long long return_longlong(long long a, long long b) {
    return a + b;
}

struct my_struct {
    int a;
    int b;
    int c;
    int d;
    int e;
};

struct my_struct return_my_struct(int a, int b) {
    struct my_struct s;
    s.a = a;
    s.b = b;
    s.c = a + b;
    s.d = a * b;
    s.e = 7;
    return s;
}

int main(int argc, char const *argv[]) {
    return 0;
}
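Before reading the disassembly, it can help to check how large each return type actually is. The sketch below is not part of the original program: `struct my_struct` is repeated so the snippet stands alone, and `REGISTER_BYTES` and `fits_in_one_register` are hypothetical names that assume an 8-byte general-purpose register, as on the 64-bit machine used here.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */

struct my_struct {
    int a;
    int b;
    int c;
    int d;
    int e;
};

/* Assumption: one general-purpose register holds 8 bytes (x86-64). */
#define REGISTER_BYTES 8

/* Hypothetical helper: could a value of this size fit in one register? */
int fits_in_one_register(size_t size) {
    return size <= REGISTER_BYTES;
}

/* Five members of the same type: on common ABIs there is no padding. */
static_assert(sizeof(struct my_struct) == 5 * sizeof(int),
              "my_struct is expected to be exactly five ints wide");
```

Calling `fits_in_one_register(sizeof(struct my_struct))` returns 0, because five ints are wider than a single 8-byte register. That is the case the struct example further down has to deal with.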
Alright. Let's take a look at the return_int function:
(gdb) disassemble return_int
Dump of assembler code for function return_int:
0x00000000004004b4 <+0>: push rbp
0x00000000004004b5 <+1>: mov rbp,rsp
0x00000000004004b8 <+4>: mov DWORD PTR [rbp-0x4],edi
0x00000000004004bb <+7>: mov DWORD PTR [rbp-0x8],esi
0x00000000004004be <+10>: mov eax,DWORD PTR [rbp-0x8]
0x00000000004004c1 <+13>: mov edx,DWORD PTR [rbp-0x4]
0x00000000004004c4 <+16>: add eax,edx
0x00000000004004c6 <+18>: pop rbp
0x00000000004004c7 <+19>: ret
End of assembler dump.
(gdb)
The int datatype fits nicely into eax, so that is where the result is returned. What about float?
(gdb) disassemble return_float
Dump of assembler code for function return_float:
0x00000000004004c8 <+0>: push rbp
0x00000000004004c9 <+1>: mov rbp,rsp
0x00000000004004cc <+4>: movss DWORD PTR [rbp-0x4],xmm0
0x00000000004004d1 <+9>: movss DWORD PTR [rbp-0x8],xmm1
0x00000000004004d6 <+14>: movss xmm0,DWORD PTR [rbp-0x4]
0x00000000004004db <+19>: addss xmm0,DWORD PTR [rbp-0x8]
0x00000000004004e0 <+24>: pop rbp
0x00000000004004e1 <+25>: ret
End of assembler dump.
(gdb)
This time xmm0 is used, and the same goes for double:
(gdb) disassemble return_double
Dump of assembler code for function return_double:
0x00000000004004e2 <+0>: push rbp
0x00000000004004e3 <+1>: mov rbp,rsp
0x00000000004004e6 <+4>: movsd QWORD PTR [rbp-0x8],xmm0
0x00000000004004eb <+9>: movsd QWORD PTR [rbp-0x10],xmm1
0x00000000004004f0 <+14>: movsd xmm0,QWORD PTR [rbp-0x8]
0x00000000004004f5 <+19>: addsd xmm0,QWORD PTR [rbp-0x10]
0x00000000004004fa <+24>: pop rbp
0x00000000004004fb <+25>: ret
End of assembler dump.
(gdb)
However, as you may have noticed, the instructions are different. For float, movss and addss are used, while movsd and addsd are used for double.
The trailing 's' means "single-precision floating point" while the trailing 'd' means "double-precision floating point" (the 's' before it stands for "scalar"). That's the entire difference.
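The difference between the two precisions is visible from C as well. In this small sketch, `add_float`, `add_double` and `check_precision` are illustrative names (the first two have the same bodies as `return_float` and `return_double`), and the value 16777216 (2^24) is chosen because it is the point where a float can no longer represent every integer. It assumes IEEE 754 arithmetic with the default rounding mode.

```c
#include <assert.h>

/* Same computation as return_float; on x86-64 this is typically an addss. */
float add_float(float a, float b) {
    float sum = a + b;   /* stored as float, so rounded to single precision */
    return sum;
}

/* Same computation as return_double; on x86-64 this is typically an addsd. */
double add_double(double a, double b) {
    double sum = a + b;
    return sum;
}

/* Call this from main() to check the rounding behaviour. */
void check_precision(void) {
    /* 2^24 + 1 needs 25 significant bits; a float has only 24,
       so the sum rounds back to 2^24. */
    assert(add_float(16777216.0f, 1.0f) == 16777216.0f);

    /* A double has 53 significant bits, so the sum is exact. */
    assert(add_double(16777216.0, 1.0) == 16777217.0);
}
```

The source of both functions looks identical apart from the type, which matches the disassembly: only the instruction suffix and the operand size change.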
Now, on this platform a long long is double the size of an int, so a bigger register is needed:
(gdb) disassemble return_longlong
Dump of assembler code for function return_longlong:
0x00000000004004fc <+0>: push rbp
0x00000000004004fd <+1>: mov rbp,rsp
0x0000000000400500 <+4>: mov QWORD PTR [rbp-0x8],rdi
0x0000000000400504 <+8>: mov QWORD PTR [rbp-0x10],rsi
0x0000000000400508 <+12>: mov rax,QWORD PTR [rbp-0x10]
0x000000000040050c <+16>: mov rdx,QWORD PTR [rbp-0x8]
0x0000000000400510 <+20>: add rax,rdx
0x0000000000400513 <+23>: pop rbp
0x0000000000400514 <+24>: ret
End of assembler dump.
(gdb)
Here rax is used, since this is a 64-bit machine. What would the same C code compile to on a 32-bit processor? I tried it and got this:
(gdb) disassemble return_longlong
Dump of assembler code for function return_longlong:
0x080483b4 <+0>: push ebp
0x080483b5 <+1>: mov ebp,esp
0x080483b7 <+3>: push ebx
0x080483b8 <+4>: sub esp,0x14
0x080483bb <+7>: mov eax,DWORD PTR [ebp+0x8]
0x080483be <+10>: mov DWORD PTR [ebp-0x10],eax
0x080483c1 <+13>: mov eax,DWORD PTR [ebp+0xc]
0x080483c4 <+16>: mov DWORD PTR [ebp-0xc],eax
0x080483c7 <+19>: mov eax,DWORD PTR [ebp+0x10]
0x080483ca <+22>: mov DWORD PTR [ebp-0x18],eax
0x080483cd <+25>: mov eax,DWORD PTR [ebp+0x14]
0x080483d0 <+28>: mov DWORD PTR [ebp-0x14],eax
0x080483d3 <+31>: mov eax,DWORD PTR [ebp-0x18]
0x080483d6 <+34>: mov edx,DWORD PTR [ebp-0x14]
0x080483d9 <+37>: mov ecx,DWORD PTR [ebp-0x10]
0x080483dc <+40>: mov ebx,DWORD PTR [ebp-0xc]
0x080483df <+43>: add eax,ecx
0x080483e1 <+45>: adc edx,ebx
0x080483e3 <+47>: add esp,0x14
0x080483e6 <+50>: pop ebx
0x080483e7 <+51>: pop ebp
0x080483e8 <+52>: ret
End of assembler dump.
(gdb)
Quite a bit more code. Since long long is 64 bits wide, a 32-bit machine cannot hold it in one register, so two registers are used to represent one number. Here eax holds the bottom half and edx holds the top half. The bottom halves are added first with add, and adc then adds the top halves plus the carry from the first addition.
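The add/adc pair in that listing can be imitated in plain C. The following is a minimal sketch, not what the compiler emits: `struct u64_halves`, `split`, `join`, `add_halves` and `add_via_halves` are made-up names, and unsigned types are used so that wrap-around is well defined in C (the bit pattern of the sum is the same as for a two's complement long long).

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value held the way the 32-bit code holds it:
   low half in one register (eax), high half in another (edx). */
struct u64_halves {
    uint32_t lo;
    uint32_t hi;
};

struct u64_halves split(uint64_t v) {
    struct u64_halves h;
    h.lo = (uint32_t)(v & 0xFFFFFFFFu);
    h.hi = (uint32_t)(v >> 32);
    return h;
}

uint64_t join(struct u64_halves h) {
    return ((uint64_t)h.hi << 32) | h.lo;
}

struct u64_halves add_halves(struct u64_halves a, struct u64_halves b) {
    struct u64_halves r;
    uint32_t carry;

    r.lo = a.lo + b.lo;           /* like add: wraps around on overflow   */
    carry = (r.lo < a.lo);        /* 1 if the low addition overflowed     */
    r.hi = a.hi + b.hi + carry;   /* like adc: add plus the carry flag    */
    return r;
}

/* 64-bit addition built only from 32-bit additions. */
uint64_t add_via_halves(uint64_t a, uint64_t b) {
    return join(add_halves(split(a), split(b)));
}
```

For example, `add_via_halves(0xFFFFFFFF, 1)` overflows the low half, and the carry ends up as a 1 in the high half, giving 0x100000000.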
Now, the struct return is interesting:
(gdb) disassemble return_my_struct
Dump of assembler code for function return_my_struct:
0x0000000000400515 <+0>: push rbp
0x0000000000400516 <+1>: mov rbp,rsp
0x0000000000400519 <+4>: mov QWORD PTR [rbp-0x28],rdi
0x000000000040051d <+8>: mov DWORD PTR [rbp-0x2c],esi
0x0000000000400520 <+11>: mov DWORD PTR [rbp-0x30],edx
0x0000000000400523 <+14>: mov eax,DWORD PTR [rbp-0x2c]
0x0000000000400526 <+17>: mov DWORD PTR [rbp-0x20],eax
0x0000000000400529 <+20>: mov eax,DWORD PTR [rbp-0x30]
0x000000000040052c <+23>: mov DWORD PTR [rbp-0x1c],eax
0x000000000040052f <+26>: mov eax,DWORD PTR [rbp-0x30]
0x0000000000400532 <+29>: mov edx,DWORD PTR [rbp-0x2c]
0x0000000000400535 <+32>: add eax,edx
0x0000000000400537 <+34>: mov DWORD PTR [rbp-0x18],eax
0x000000000040053a <+37>: mov eax,DWORD PTR [rbp-0x2c]
0x000000000040053d <+40>: imul eax,DWORD PTR [rbp-0x30]
0x0000000000400541 <+44>: mov DWORD PTR [rbp-0x14],eax
0x0000000000400544 <+47>: mov DWORD PTR [rbp-0x10],0x7
0x000000000040054b <+54>: mov rax,QWORD PTR [rbp-0x28]
0x000000000040054f <+58>: mov rdx,QWORD PTR [rbp-0x20]
0x0000000000400553 <+62>: mov QWORD PTR [rax],rdx
0x0000000000400556 <+65>: mov rdx,QWORD PTR [rbp-0x18]
0x000000000040055a <+69>: mov QWORD PTR [rax+0x8],rdx
0x000000000040055e <+73>: mov edx,DWORD PTR [rbp-0x10]
0x0000000000400561 <+76>: mov DWORD PTR [rax+0x10],edx
0x0000000000400564 <+79>: mov rax,QWORD PTR [rbp-0x28]
0x0000000000400568 <+83>: pop rbp
0x0000000000400569 <+84>: ret
End of assembler dump.
(gdb)
Here we see that the structure is built in the function's stack frame and then (from address 0x40054b) the local structure is copied out to an address that the caller specifies in rdi; that same address is handed back in rax. This is quite wasteful, but what happens if we ask the compiler to optimize (-O3)?
Then we get this:
(gdb) disassemble return_my_struct
Dump of assembler code for function return_my_struct:
0x0000000000400500 <+0>: lea ecx,[rsi+rdx*1]
0x0000000000400503 <+3>: mov DWORD PTR [rdi],esi
0x0000000000400505 <+5>: mov rax,rdi
0x0000000000400508 <+8>: imul esi,edx
0x000000000040050b <+11>: mov DWORD PTR [rdi+0x4],edx
0x000000000040050e <+14>: mov DWORD PTR [rdi+0x10],0x7
0x0000000000400515 <+21>: mov DWORD PTR [rdi+0x8],ecx
0x0000000000400518 <+24>: mov DWORD PTR [rdi+0xc],esi
0x000000000040051b <+27>: ret
End of assembler dump.
(gdb)
This is much shorter: the fields are written directly to the destination address, with no local copy. This is much better, and roughly the same as if the function had taken a pointer to the destination structure explicitly in the C source. Maybe we should not be so worried about letting the compiler handle this for us. If in doubt, disassemble the result and take a look.
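The comparison with an explicit destination pointer can be spelled out in C. In this sketch, `return_my_struct` is the function from the original program, while `fill_my_struct` and `same_result` are hypothetical names for a hand-written out-parameter version and a small check that both produce the same fields. It only illustrates the idea and makes no claim about what any particular compiler generates.

```c
#include <assert.h>

struct my_struct {
    int a;
    int b;
    int c;
    int d;
    int e;
};

/* Return by value, as in the original program. */
struct my_struct return_my_struct(int a, int b) {
    struct my_struct s;
    s.a = a;
    s.b = b;
    s.c = a + b;
    s.d = a * b;
    s.e = 7;
    return s;
}

/* Hypothetical hand-written equivalent: the caller supplies the
   destination and the fields are stored straight into it. */
struct my_struct *fill_my_struct(struct my_struct *out, int a, int b) {
    out->a = a;
    out->b = b;
    out->c = a + b;
    out->d = a * b;
    out->e = 7;
    return out;   /* hand the destination address back to the caller */
}

/* Returns 1 if both versions produce identical fields. */
int same_result(int a, int b) {
    struct my_struct by_value = return_my_struct(a, b);
    struct my_struct by_pointer;

    fill_my_struct(&by_pointer, a, b);
    return by_value.a == by_pointer.a && by_value.b == by_pointer.b &&
           by_value.c == by_pointer.c && by_value.d == by_pointer.d &&
           by_value.e == by_pointer.e;
}
```

The by-value version is the more natural one to call (`struct my_struct s = return_my_struct(3, 4);`), and the optimized listing suggests it need not cost more than the pointer version.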