Understanding the x86 stack frame (Android)

The source is at the following link:

The following registers are needed to understand the stack frame:

I. ESP - Stack Pointer

This 32-bit register is implicitly manipulated by several CPU instructions (PUSH, POP, CALL, and RET among others). It always points to the last element used on the stack (not the first free element), which means that the PUSH and POP operations would be specified in pseudo-C as:

*--ESP = value; // push

value = *ESP++; // pop

The "Top of the stack" is an occupied location, not a free one, and is at the lowest memory address.

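My explanation:

ESP is a 32-bit register that several CPU instructions (PUSH, POP, CALL, RET, and so on) manipulate implicitly. It always points to the most recently pushed element on the stack, which is what the two pseudo-C statements above express:

*--esp = value; is the push. ESP is decremented first: because the x86 stack grows downward, moving the pointer down by one element extends the stack by four bytes (one 32-bit stack slot). The value is then stored in the slot that ESP now points to.

value = *esp++; is the pop. The value in the slot that ESP points to is copied into value first, and then ESP is incremented by one slot, so the stack shrinks by four bytes.

Note also that the slot ESP points to always holds valid data, like a "full" stack on ARM, and that ESP always points to the lowest address in use. My understanding is that this follows from the stack growing downward: the most recent element is always at the lowest address.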
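The push and pop rules above can be tried out with a small C model. This is only a sketch under stated assumptions: `SimStack`, `sim_init`, `sim_push`, and `sim_pop` are hypothetical names invented for illustration, and an array of 32-bit words stands in for real stack memory.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Hypothetical model of a full descending stack of 32-bit words. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t *esp;              /* points at the last pushed word, never at a free one */
} SimStack;

/* Empty stack: esp sits one word past the highest slot. */
static void sim_init(SimStack *s) {
    s->esp = s->mem + STACK_WORDS;
}

/* push: decrement first, then store (the stack grows toward lower addresses). */
static void sim_push(SimStack *s, uint32_t value) {
    assert(s->esp > s->mem);                    /* room left */
    *--s->esp = value;
}

/* pop: load first, then increment (the stack shrinks toward higher addresses). */
static uint32_t sim_pop(SimStack *s) {
    assert(s->esp < s->mem + STACK_WORDS);      /* not empty */
    return *s->esp++;
}
```

After `sim_push(&s, 11)` the pointer has moved down by one word and `*s.esp` reads back 11, which is the "top of stack is an occupied location" property described above.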

EBP - Base Pointer

This 32-bit register is used to reference all the function parameters and local variables in the current stack frame. Unlike the %esp register, the base pointer is manipulated only explicitly. This is sometimes called the "Frame Pointer".

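In short, EBP is a 32-bit register used to reference all the function parameters and local variables in the current stack frame; it is sometimes also called the frame pointer.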

EIP - Instruction Pointer

This holds the address of the next CPU instruction to be executed, and it's saved onto the stack as part of the CALL instruction. As well, any of the "jump" instructions modify the %EIP directly.

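The EIP register holds the address of the next instruction the CPU will execute, so it corresponds to PC on ARM. When a CALL instruction executes, EIP is saved onto the stack, which plays the role of the LR register on ARM: the x86 architecture has no dedicated register for the return address. Likewise, any jump instruction also modifies EIP.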
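To make the relationship between EIP and the stack concrete, here is a minimal C sketch of what CALL, RET, and a jump do to the two registers. The `Cpu` struct, the helper names, and the `insn_len` parameter (the byte length of the CALL instruction) are assumptions made for illustration, not part of any real API.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 256u

/* Hypothetical model: registers hold byte addresses, mem[] is the stack area. */
typedef struct {
    uint32_t mem[STACK_BYTES / 4];
    uint32_t esp, eip;
} Cpu;

static void push32(Cpu *c, uint32_t v) {
    assert(c->esp >= 4);
    c->esp -= 4;                    /* one 32-bit slot down */
    c->mem[c->esp / 4] = v;
}

static uint32_t pop32(Cpu *c) {
    assert(c->esp + 4 <= STACK_BYTES);
    uint32_t v = c->mem[c->esp / 4];
    c->esp += 4;                    /* one 32-bit slot up */
    return v;
}

/* CALL: save the address of the instruction after the CALL, then jump. */
static void do_call(Cpu *c, uint32_t target, uint32_t insn_len) {
    push32(c, c->eip + insn_len);
    c->eip = target;
}

/* RET: pop the saved return address back into EIP. */
static void do_ret(Cpu *c) {
    c->eip = pop32(c);
}

/* A jump only overwrites EIP; the stack is untouched. */
static void do_jmp(Cpu *c, uint32_t target) {
    c->eip = target;
}
```

The saved return address lives only on the stack, which is the difference from ARM's LR register noted above.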

II. Differences between AT&T and Intel assembly syntax

Assembler notation

Virtually everybody in the Intel assembler world uses the Intel notation, but the GNU C compiler uses what they call the "AT&T syntax" for backwards compatibility. This seems to us to be a really dumb idea, but it's a fact of life.

There are minor notational differences between the two notations, but by far the most annoying is that the AT&T syntax reverses the source and destination operands. To move the immediate value 4 into the EAX register:

mov $4, %eax // AT&T notation

mov eax, 4 // Intel notation

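More recent GNU compilers can generate the Intel syntax (gcc -masm=intel), and the GNU assembler accepts it through the .intel_syntax directive. In any case, the instruction listings here use the Intel notation, although stack offsets are still written in the AT&T form, such as 8(%ebp).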

There are other minor differences that are not of much concern to the reverse engineer.

III. How the stack frame is built:

1. Push parameters onto the stack, from right to left

Parameters are pushed onto the stack, one at a time, from right to left. Whether the parameters are evaluated from right to left is a different matter, and in any case this is unspecified by the language and code should never rely on this. The calling code must keep track of how many bytes of parameters have been pushed onto the stack so it can clean it up later.

The first step pushes all the parameters onto the stack, in right-to-left order.

2. Call the function

Here, the processor pushes contents of the %EIP (instruction pointer) onto the stack, and it points to the first byte after the CALL instruction. After this finishes, the caller has lost control, and the callee is in charge. This step does not change the %ebp register.

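The second step saves the return address, similar to saving LR on ARM. To paraphrase the original: the CPU pushes the contents of EIP onto the stack. Which contents? The address of the first instruction to execute after the call returns.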
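Steps 1 and 2 can be sketched from the caller's side in C. This is an illustrative model only: `Cpu`, `push32`, and `call_with_args` are hypothetical names, and the return address is passed in explicitly rather than derived from a real instruction stream.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 256u

/* Hypothetical model: registers hold byte addresses, mem[] is the stack area. */
typedef struct {
    uint32_t mem[STACK_BYTES / 4];
    uint32_t esp, ebp, eip;
} Cpu;

static void push32(Cpu *c, uint32_t v) {
    assert(c->esp >= 4);
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

/*
 * Caller side of f(args[0], ..., args[n-1]).
 * Step 1: push the parameters from right to left.
 * Step 2: CALL pushes the return address and loads EIP; EBP is untouched.
 * Returns the number of parameter bytes pushed, which the caller must
 * remember so it can clean up the stack later.
 */
static uint32_t call_with_args(Cpu *c, uint32_t target,
                               const uint32_t *args, int n,
                               uint32_t return_addr) {
    for (int i = n - 1; i >= 0; i--)
        push32(c, args[i]);         /* rightmost parameter first */
    push32(c, return_addr);         /* what CALL pushes */
    c->eip = target;
    return (uint32_t)n * 4;
}
```

Because the rightmost parameter is pushed first, the first parameter ends up at the lowest address, directly above the return address.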

3. Save and update the %ebp

Now that we're in the new function, we need a new local stack frame pointed to by %ebp, so this is done by saving the current %ebp (which belongs to the previous function's frame) and making it point to the top of the stack.

push ebp

mov ebp, esp // ebp <- esp

Once %ebp has been changed, it can now refer directly to the function's arguments as 8(%ebp), 12(%ebp). Note that 0(%ebp) is the old base pointer and 4(%ebp) is the old instruction pointer.

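After the CALL above we are already inside the new function, and it needs its own frame, with EBP pointing at the base of that frame. At this moment, however, EBP still holds the previous function's frame pointer, hence this step:

push ebp

This pushes EBP, that is, the previous function's frame pointer, onto the stack. Then

mov ebp, esp

copies the stack pointer, as it stands right after that push, into EBP, which becomes the new function's frame pointer, the base of its frame.

To paraphrase the original: we need a new frame pointed to by EBP, so the current EBP (which belongs to the previous function's frame) is saved and EBP is made to point to the top of the stack. Once EBP has changed, it can reference the function's arguments directly: ebp+8 is the first argument and ebp+12 is the second. ebp+0 is the previous function's frame pointer, and ebp+4 is the return address into the previous function, that is, the current function's return address.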
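The offsets quoted above can be checked with a small C model of the prologue. This is a sketch, not real compiler output: `Cpu`, `push32`, `load32`, and `prologue` are hypothetical names, and the caller's EBP value is an arbitrary placeholder.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 256u

/* Hypothetical model: registers hold byte addresses, mem[] is the stack area. */
typedef struct {
    uint32_t mem[STACK_BYTES / 4];
    uint32_t esp, ebp;
} Cpu;

static void push32(Cpu *c, uint32_t v) {
    assert(c->esp >= 4);
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

/* Read the 32-bit word stored at a byte address inside the stack area. */
static uint32_t load32(const Cpu *c, uint32_t addr) {
    assert(addr % 4 == 0 && addr + 4 <= STACK_BYTES);
    return c->mem[addr / 4];
}

/* Step 3: save the caller's frame pointer, then start a new frame. */
static void prologue(Cpu *c) {
    push32(c, c->ebp);      /* push ebp     */
    c->ebp = c->esp;        /* mov ebp, esp */
}
```

If the caller has pushed two parameters and a return address before `prologue` runs, then `load32(&c, c.ebp + 0)` yields the caller's EBP, `c.ebp + 4` the return address, `c.ebp + 8` the first parameter, and `c.ebp + 12` the second.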

4. Save CPU registers used for temporaries

If this function will use any CPU registers, it has to save the old values first lest it walk on data used by the calling functions. Each register to be used is pushed onto the stack one at a time, and the compiler must remember what it did so it can unwind it later.

This step saves the registers that the function may use, much like saving the context on ARM.

5. Allocate local variables

The function may choose to use local stack-based variables, and they are allocated here simply by decrementing the stack pointer by the amount of space required. This is always done in four-byte chunks.

Now, the local variables are located on the stack between the %ebp and %esp registers, and though it would be possible to refer to them as offsets from either one, by convention the %ebp register is used. This means that -4(%ebp) refers to the first local variable.

This step allocates space for the local variables.

6. Perform the function's purpose

At this point, the stack frame is set up correctly. All the parameters and locals are offsets from the %ebp register:

16(%ebp)	- third function parameter
12(%ebp)	- second function parameter
8(%ebp)	- first function parameter
4(%ebp)	- old %EIP (the function's "return address")
0(%ebp)	- old %EBP (previous function's base pointer)
-4(%ebp)	- first local variable
-8(%ebp)	- second local variable
-12(%ebp)	- third local variable
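The offset table above can be reproduced with a C model that builds a complete frame and then addresses every slot relative to EBP. It is a sketch under two loud assumptions: all names (`Cpu`, `push32`, `enter_function`, `slot`) are hypothetical, and no registers are saved in step 4, so the first local really does sit at -4(%ebp).

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 256u

/* Hypothetical model: registers hold byte addresses, mem[] is the stack area. */
typedef struct {
    uint32_t mem[STACK_BYTES / 4];
    uint32_t esp, ebp, eip;
} Cpu;

static void push32(Cpu *c, uint32_t v) {
    assert(c->esp >= 4);
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

/* Steps 3 and 5 (assuming step 4 saves nothing): new frame, then locals. */
static void enter_function(Cpu *c, uint32_t local_bytes) {
    push32(c, c->ebp);                  /* push ebp             */
    c->ebp = c->esp;                    /* mov ebp, esp         */
    assert(local_bytes % 4 == 0 && c->esp >= local_bytes);
    c->esp -= local_bytes;              /* sub esp, local_bytes */
}

/* Address of the word at offset(%ebp); the offset may be negative. */
static uint32_t *slot(Cpu *c, int32_t offset) {
    uint32_t addr = c->ebp + (uint32_t)offset;  /* unsigned wrap handles negatives */
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    return &c->mem[addr / 4];
}
```

With three parameters pushed right to left, a return address, and `enter_function(&c, 12)`, the expression `*slot(&c, 8)` is the first parameter and `*slot(&c, -4)` is the first local, matching the table. The locals occupy exactly the region between EBP and ESP.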

Take the time to absorb the passage above. I am still somewhat confused by it and need to look up more material.