Understanding the x86 stack frame

2012-05-11 android_bsp

These notes are based on the article at the following link:

The following registers need to be understood first:


I. ESP - Stack Pointer

This 32-bit register is implicitly manipulated by several CPU instructions (PUSH, POP, CALL, and RET among others). It always points to the last element used on the stack (not the first free element), which means that the PUSH and POP operations would be specified in pseudo-C as:

*--ESP = value;   // push


value = *ESP++;   // pop

The "Top of the stack" is an occupied location, not a free one, and is at the lowest memory address.


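ESP is a 32-bit register that is implicitly manipulated by several CPU instructions (push, pop, call, ret, etc.). ESP always points to the newest element on the stack, which can be expressed as follows: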

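*--esp = value corresponds to a push. ESP is first decremented by one element; because the x86 stack grows downward, this extends the stack downward by four bytes (one stack slot). Then value is stored in the slot that ESP now points to.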

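value = *esp++ corresponds to a pop. The contents of the slot that ESP points to are first assigned to value, then ESP is incremented by one element, so the stack shrinks by one slot.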
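
The push and pop semantics above can be tried out on a simulated stack. The following is a minimal sketch in C, assuming a small array stands in for stack memory; the names stack_mem, esp, push32, and pop32 are made up for illustration and do not belong to any real API.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Simulated stack memory: one element is one 32-bit stack slot (hypothetical). */
uint32_t stack_mem[STACK_WORDS];

/* Simulated ESP: starts one past the highest slot, so the stack is empty. */
uint32_t *esp = stack_mem + STACK_WORDS;

/* push: decrement ESP first, then store (the stack grows downward). */
void push32(uint32_t value)
{
    assert(esp > stack_mem);                /* guard against overflow */
    *--esp = value;
}

/* pop: load from ESP first, then increment (the stack shrinks). */
uint32_t pop32(void)
{
    assert(esp < stack_mem + STACK_WORDS);  /* guard against underflow */
    return *esp++;
}
```

Pushing 1 and then 2 leaves esp pointing at the slot that holds 2, and two pops return 2 and then 1, which brings esp back to where it started.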



EBP - Base Pointer

This 32-bit register is used to reference all the function parameters and local variables in the current stack frame. Unlike the %esp register, the base pointer is manipulated only explicitly. This is sometimes called the "Frame Pointer".

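EBP is a 32-bit register used to reference all the function parameters and local variables in the current stack frame. EBP is sometimes also called the frame pointer.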


EIP - Instruction Pointer

This holds the address of the next CPU instruction to be executed, and it is saved onto the stack by the CALL instruction. In addition, any of the "jump" instructions modify %EIP directly.


II. Differences between AT&T and Intel assembly syntax

Assembler notation

Virtually everybody in the Intel assembler world uses the Intel notation, but the GNU C compiler uses what they call the "AT&T syntax" for backwards compatibility. This seems to us to be a really dumb idea, but it's a fact of life.

There are minor notational differences between the two notations, but by far the most annoying is that the AT&T syntax reverses the source and destination operands. To move the immediate value 4 into the EAX register:

mov $4, %eax          // AT&T notation


mov eax, 4            // Intel notation

More recent GNU compilers can generate the Intel syntax (GCC's -masm=intel option), and the GNU assembler accepts it through the .intel_syntax directive. In any case, we'll use the Intel notation exclusively.

There are other minor differences that are not of much concern to the reverse engineer.




1. Push parameters onto the stack, from right to left

Parameters are pushed onto the stack, one at a time, from right to left. Whether the parameters are evaluated from right to left is a different matter, and in any case this is unspecified by the language and code should never rely on this. The calling code must keep track of how many bytes of parameters have been pushed onto the stack so it can clean it up later.
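
As a sketch of this step, the code below pushes the arguments of a hypothetical call f(10, 20, 30) onto a simulated stack from right to left and records how many bytes the caller has to clean up afterwards. The byte array, the esp offset, and the helper names are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

uint8_t  mem[STACK_BYTES];   /* simulated stack memory (hypothetical) */
uint32_t esp = STACK_BYTES;  /* byte offset of the top of stack; empty at start */

/* push: move ESP down by four bytes, then store the value there. */
void push32(uint32_t value)
{
    assert(esp >= 4);
    esp -= 4;
    memcpy(&mem[esp], &value, 4);
}

/* Read the 32-bit value stored at a simulated address. */
uint32_t load32(uint32_t addr)
{
    uint32_t value;
    memcpy(&value, &mem[addr], 4);
    return value;
}

/* Push argc arguments from right to left.
   Returns the number of bytes pushed so the caller can clean up later. */
uint32_t push_args(const uint32_t *args, int argc)
{
    for (int i = argc - 1; i >= 0; i--)
        push32(args[i]);
    return 4u * (uint32_t)argc;
}

/* Caller cleanup, the equivalent of "add esp, nbytes". */
void pop_args(uint32_t nbytes)
{
    esp += nbytes;
}
```

Because the rightmost argument is pushed first, the first argument ends up at the lowest address, i.e. at the top of the stack just before the call.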


2. Call the function

Here, the processor pushes the contents of %EIP (the instruction pointer) onto the stack; at that moment it points to the first byte after the CALL instruction. After this finishes, the caller has lost control, and the callee is in charge. This step does not change the %ebp register.

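The second step saves the return address, similar to saving lr on ARM. Translated literally, the CPU pushes the contents of EIP onto the stack. Which contents? The address of the first instruction to execute after the call returns.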

3. Save and update the %ebp

Now that we're in the new function, we need a new local stack frame pointed to by %ebp, so this is done by saving the current %ebp (which belongs to the previous function's frame) and making it point to the top of the stack.

push ebp

mov  ebp, esp    // ebp <- esp

Once %ebp has been changed, it can now refer directly to the function's arguments as 8(%ebp), 12(%ebp), and so on. Note that 0(%ebp) is the old base pointer and 4(%ebp) is the old instruction pointer.





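Translated: we need a new frame pointer in EBP, so the current EBP (which belongs to the previous function's frame) is saved and EBP is made to point to the top of the stack. Once EBP has been changed, it can reference the function's arguments directly: ebp+8 is the first argument and ebp+12 is the second, because the arguments were pushed from right to left. ebp+0 holds the previous function's frame pointer, and ebp+4 holds the address in the previous function to return to, i.e. the current function's return address.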
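
The following sketch replays steps 1 to 3 on a simulated stack to show where the arguments end up relative to the new frame pointer. It assumes a byte array as stack memory, a made-up starting value for EBP, and a made-up return address; all names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

uint8_t  mem[64];        /* simulated stack memory (hypothetical) */
uint32_t esp = 64;       /* byte offset of the top of stack; empty at start */
uint32_t ebp = 0x7ff0;   /* made-up frame pointer value of the caller */

/* push: move ESP down by four bytes, then store the value there. */
void push32(uint32_t value)
{
    assert(esp >= 4);
    esp -= 4;
    memcpy(&mem[esp], &value, 4);
}

/* Read the 32-bit value stored at a simulated address. */
uint32_t load32(uint32_t addr)
{
    uint32_t value;
    memcpy(&value, &mem[addr], 4);
    return value;
}

/* Simulate a call f(a, b) up to the end of the callee's prologue. */
void enter_callee(uint32_t a, uint32_t b, uint32_t return_address)
{
    push32(b);               /* step 1: arguments, right to left */
    push32(a);
    push32(return_address);  /* step 2: CALL pushes the return address */
    push32(ebp);             /* step 3: push ebp */
    ebp = esp;               /*         mov ebp, esp */
}
```

After enter_callee(10, 20, 0x1000), the old EBP is at ebp+0, the return address 0x1000 at ebp+4, the first argument 10 at ebp+8, and the second argument 20 at ebp+12.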


4. Save CPU registers used for temporaries

If this function will use any CPU registers, it has to save the old values first lest it walk on data used by the calling functions. Each register to be used is pushed onto the stack one at a time, and the compiler must remember what it did so it can unwind it later.
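
A minimal sketch of this save and unwind discipline, assuming the callee uses three registers as temporaries. The choice of ebx, esi, and edi and the helper names are illustrative; a real compiler saves only the registers it actually needs.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

uint32_t stack_mem[STACK_WORDS];          /* simulated stack memory */
uint32_t *esp = stack_mem + STACK_WORDS;  /* simulated ESP, stack empty */

/* Simulated registers that the callee wants to use as temporaries. */
uint32_t ebx, esi, edi;

void push32(uint32_t value)
{
    assert(esp > stack_mem);
    *--esp = value;
}

uint32_t pop32(void)
{
    assert(esp < stack_mem + STACK_WORDS);
    return *esp++;
}

/* On entry: push each register that will be overwritten, one at a time. */
void save_temporaries(void)
{
    push32(ebx);
    push32(esi);
    push32(edi);
}

/* On exit: pop in the reverse order of the pushes to unwind correctly. */
void restore_temporaries(void)
{
    edi = pop32();
    esi = pop32();
    ebx = pop32();
}
```

The restore order mirrors the save order because the stack is last in, first out; popping in the wrong order would hand the caller back swapped register values.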


5. Allocate local variables

The function may choose to use local stack-based variables, and they are allocated here simply by decrementing the stack pointer by the amount of space required. This is always done in four-byte chunks.

Now, the local variables are located on the stack between the %ebp and %esp registers, and though it would be possible to refer to them as offsets from either one, by convention the %ebp register is used. This means that -4(%ebp) refers to the first local variable.
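
The allocation can be sketched on a simulated stack as follows. The sketch assumes the prologue has just run (so EBP equals ESP) and, for simplicity, that no registers were saved in between; the starting offsets and helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

uint8_t  mem[64];    /* simulated stack memory (hypothetical) */
uint32_t ebp = 32;   /* assumed frame pointer right after the prologue */
uint32_t esp = 32;   /* equal to ebp: nothing allocated yet */

/* Write a 32-bit value at a simulated address. */
void store32(uint32_t addr, uint32_t value)
{
    memcpy(&mem[addr], &value, 4);
}

/* Read the 32-bit value stored at a simulated address. */
uint32_t load32(uint32_t addr)
{
    uint32_t value;
    memcpy(&value, &mem[addr], 4);
    return value;
}

/* Reserve count four-byte locals, the equivalent of "sub esp, 4*count". */
void alloc_locals(uint32_t count)
{
    assert(esp >= 4 * count);
    esp -= 4 * count;
}

/* Address of the n-th local variable (n starts at 1), i.e. ebp - 4*n. */
uint32_t local_addr(uint32_t n)
{
    return ebp - 4 * n;
}
```

After alloc_locals(3), the three locals occupy the twelve bytes between esp and ebp: the first local sits at ebp-4 and the third one at the new top of the stack.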




6. Perform the function's purpose

At this point, the stack frame is set up correctly, as laid out below from higher to lower addresses. All the parameters and locals are offsets from the %ebp register:

 16(%ebp)  - third function parameter
 12(%ebp)  - second function parameter
  8(%ebp)  - first function parameter
  4(%ebp)  - old %EIP (the function's "return address")
  0(%ebp)  - old %EBP (previous function's base pointer)
 -4(%ebp)  - first local variable
 -8(%ebp)  - second local variable
-12(%ebp)  - third local variable
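
The layout above follows a simple pattern that can be written as two small helper functions. This is only a sketch, assuming every parameter and every local is exactly four bytes wide and that no saved registers sit between %ebp and the locals; the function names are hypothetical.

```c
#include <assert.h>

/* Offset from %ebp of the n-th function parameter (n starts at 1).
   The first parameter sits above the saved %ebp and the return address. */
int param_offset(int n)
{
    assert(n >= 1);
    return 8 + 4 * (n - 1);
}

/* Offset from %ebp of the n-th local variable (n starts at 1).
   Locals sit below %ebp, so their offsets are negative. */
int local_offset(int n)
{
    assert(n >= 1);
    return -4 * n;
}
```

For example, param_offset(3) gives 16 and local_offset(3) gives -12, matching the third parameter and the third local variable in the layout.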


Read the passage above carefully. I am still somewhat confused and need to look for more material.

