To get the command line arguments in x86_64 on Mac OS X, I can do the following:
_main:
sub rsp, 8 ; 16-byte stack alignment
mov rax, 0
mov rdi, format
mov rsi, [rsp + 32]
call _printf
Where format is "%s". rsi gets set to argv[0].
So, from this, I drew out what (I think) the stack looks like initially:
top of stack
<- rsp after alignment
return address <- rsp at beginning (aligned rsp + 8)
[something] <- rsp + 16
argc <- rsp + 24
argv[0] <- rsp + 32
argv[1] <- rsp + 40
... ...
bottom of stack
And so on. Sorry if that's hard to read. I'm wondering what [something] is. After a few tests, I find that it is usually just 0. However, occasionally, it is some (seemingly) random number.
Also, could you tell me if the rest of my stack drawing is correct?
You have it close. argv is an array pointer, not where the array is. In C it is written char **argv, so you have to do two levels of dereferencing to get to the strings.
top of stack
<- rsp after alignment
return address <- rsp at beginning (aligned rsp + 8)
[something] <- rsp + 16
argc <- rsp + 24
argv <- rsp + 32
envp <- rsp + 40 (in most Unix-compatible systems, the environment string array, char **envp)
... ...
bottom of stack
...
somewhere else:
argv[0] <- argv+0: address of first parameter (program path or name)
argv[1] <- argv+8: address of second parameter (first command line argument)
argv[2] <- argv+16: address of third parameter (second command line argument)
...
argv[argc] <- argv+argc*8: NULL
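To make the two levels of dereferencing concrete, here is a minimal C sketch. The helper names count_args and first_char are made up for illustration and are not part of any library; the only assumption is the layout drawn above, an array of string pointers terminated by NULL.

```c
#include <stddef.h>

/* Count the entries of a NULL-terminated, argv-style array.
   argv points at the array; every slot in it is itself a pointer
   to a string (one pointer per slot, 8 bytes each on x86-64). */
size_t count_args(char **argv)
{
    size_t n = 0;
    while (argv[n] != NULL)   /* first dereference: read slot n */
        n++;
    return n;                 /* equals argc for a real argv */
}

/* Return the first character of argument i.
   argv[i] reads the slot (level one), [0] reads the string (level two). */
char first_char(char **argv, size_t i)
{
    return argv[i][0];
}
```

Called from main as count_args(argv), the first helper should return the same value as argc, because argv[argc] is the NULL terminator.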
argc and argv aren't on the stack; x86-64 System V passes the first up-to-6 integer/pointer args in registers. The OP loading [rsp+32] just happened to work if the stuff that called main didn't adjust the stack much since process entry, where the argv[] array itself is on the stack, so _start will do something like mov edi, [rsp] (argc) and lea rsi, [rsp+8] (argv) to get the args for main (and another LEA with RDI*8 to get envp). - Peter Cordes 2019-01-31 03:45
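The address arithmetic behind that last LEA can be sketched in C. This assumes the process-entry layout described in the comment (the argv pointers, a NULL, then the envp pointers); the C standard does not guarantee that layout, and envp_after_argv is a hypothetical helper name.

```c
#include <stddef.h>

/* Assumed layout at process entry on x86-64 System V:
     argv[0] ... argv[argc-1], NULL, envp[0], envp[1], ..., NULL
   Under that assumption envp starts right after argv's NULL terminator.
   In assembly this is one LEA, e.g. lea rdx, [rsi + rdi*8 + 8]. */
char **envp_after_argv(int argc, char **argv)
{
    return argv + argc + 1;   /* skip argc pointers plus the NULL slot */
}
```

Pointer arithmetic on char ** scales by the pointer size, so argv + argc + 1 is the same byte offset as argc*8 + 8 on a 64-bit target.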
According to the AMD64 ABI (3.2.3, Parameter Passing), the parameters for main(int argc, char **argv) are passed to (in left-to-right order) rdi & rsi because they are of INTEGER class. envp, if it were used, would be passed into rdx, and so forth.
gcc places them into the current frame as follows (presumably for convenience? freeing up registers?):
mov DWORD PTR [rbp-0x4], edi
mov QWORD PTR [rbp-0x10], rsi
When the frame pointer is omitted, the addressing is relative to rsp. Normally, argv would be one eightbyte below rbp (argc comes first, although that's not mandated) and therefore:
# after prologue
mov rax, QWORD PTR [rbp-0x10] # or you could grab it from rsi, etc.
add rax, 0x8
mov rsi, QWORD PTR [rax]
mov edi, 0x40064c # format
mov eax, 0x0 # printf is variadic: al = number of vector registers used (none here)
call 400418 <printf@plt>
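For comparison, the add/load pair in that listing corresponds roughly to the following C. This is only a sketch of the same address arithmetic, not what gcc was compiling; second_arg is a hypothetical name.

```c
#include <stddef.h>

/* Fetch argv[1] the way the listing does: add the size of one pointer
   (8 bytes on x86-64) to the address held in argv, then load a pointer
   from the resulting address. */
char *second_arg(char **argv)
{
    char *slot = (char *)argv + sizeof(char *);  /* add rax, 0x8 */
    return *(char **)slot;                       /* mov rsi, QWORD PTR [rax] */
}
```

The result is the same pointer as argv[1], which is what ends up in rsi as the second argument to printf.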