
Before this series goes much more in-depth, I thought it might be helpful to give a quick overview of how x86 assembly works. This will not go too deep, but should work as a crash course for people who want to refresh their knowledge, or for beginners to gain an understanding of how x86 assembly works. If you already feel comfortable with this subject, feel free to skip this article.

We’ll start off with a set of definitions and crucial information needed to understand x86 Assembly better.

 

Data types:

The four main data types in x86 are bits, bytes, words and double words. A bit holds a 1 or 0, a byte holds 8 bits, a word is two bytes put together (16 bits), and a double word (dword) is two words put together (32 bits).
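To make the sizes concrete, here is a small Python sketch (not x86 code) that prints the largest unsigned value each data type can hold. The names SIZES and max_unsigned are made up for this illustration.

```python
# Widths in bits of the x86 data types described above.
SIZES = {"bit": 1, "byte": 8, "word": 16, "dword": 32}


def max_unsigned(bits):
    """Largest unsigned value that fits in the given number of bits."""
    return (1 << bits) - 1


for name, bits in SIZES.items():
    # A dword, for example, tops out at 0xffffffff.
    print(f"{name:5} {bits:2} bits  max = {max_unsigned(bits):#x}")
```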

Registers:

Registers are small storage locations inside the CPU that hold values the CPU is currently working with or will need shortly. Registers can be divided into the following types:

General purpose registers.

These are used by the CPU during execution. There are eight general purpose registers, and each was originally designed with a certain purpose in mind, as its name reflects. In practice most of them can be used for any purpose, although ESP and EBP are normally reserved for managing the stack. The eight general purpose registers are as follows:

EAX: Extended Accumulator Register

EBX: Extended Base Register

ECX: Extended Counter Register

EDX: Extended Data Register

ESI: Extended Source Index

EDI: Extended Destination Index

EBP: Extended Base Pointer

ESP: Extended Stack Pointer

Each of these registers is 32 bits in size, capable of holding one dword.

Segment Registers.

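Segment registers (CS, DS, SS, ES, FS and GS) identify the segments of memory a program is working with, such as its code, data and stack. For example, the byte 0xF2 could be either a data value or part of an instruction. The CPU does not tell these apart by the value itself but by how the byte is reached: bytes fetched through the code segment (CS) and the instruction pointer are decoded as instructions, while bytes accessed through a data or stack segment (DS, SS) are treated as data.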

Status Flag Registers.

Flags are single-bit values (1 or 0), each representing a status. They are stored together in a dedicated flags register (EFLAGS), which holds many one-bit flags at once. The flags are set or cleared whenever an operation results in a certain state or output. For example, if an 8-bit operation produces FF, the sign flag is set, because FF is -1 when read as a signed value (and 255 when read as unsigned). The four most important flags are the following:

Z: zero flag, set whenever the result of the last operation is zero.

S: sign flag, a copy of the most significant bit of the result. In two's complement, that bit is 1 when the result is negative if interpreted as a signed value. The CPU itself does not track whether a value is signed or unsigned; that depends on which instructions the program uses to interpret it.

O: overflow flag, set when the result of the last operation does not fit in the destination as a signed value, for example when adding two positive numbers produces a negative result.

C: carry flag, set when the last operation produces a carry out of (or a borrow into) the most significant bit, meaning the result does not fit as an unsigned value.
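As a rough illustration, the following Python sketch models how the Z, S, O and C flags could be computed for an 8-bit addition. The helper names sign8 and add8 are hypothetical, and the code mirrors the documented behaviour of add rather than any real hardware implementation.

```python
def sign8(value):
    """Most significant bit of an 8-bit value (1 means negative if signed)."""
    return (value >> 7) & 1


def add8(a, b):
    """Add two 8-bit values; return (result, flags) the way an 8-bit add would."""
    a &= 0xFF
    b &= 0xFF
    total = a + b            # full result, may need 9 bits
    result = total & 0xFF    # what actually fits in the destination
    flags = {
        "Z": int(result == 0),
        "S": sign8(result),
        # Carry: the unsigned result did not fit in 8 bits.
        "C": int(total > 0xFF),
        # Overflow: both inputs share a sign but the result's sign differs.
        "O": int(sign8(a) == sign8(b) and sign8(result) != sign8(a)),
    }
    return result, flags


print(add8(0xFF, 0x01))  # unsigned wrap-around: result 0, Z and C set
print(add8(0x7F, 0x01))  # signed overflow: 127 + 1 gives 0x80, S and O set
```

Note how the same addition can be "wrong" as an unsigned sum (carry) while being perfectly fine as a signed one, and the other way around.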

Extended Instruction Pointer - EIP

The Extended Instruction Pointer, or EIP, is a register that stores the address of the next instruction to be executed.

Segments and offsets:

The memory of every running program is divided into a series of regions. Four of them matter most here: .text, .data, the stack and the heap. The machine code for the program's instructions is placed in .text, and initialised global data is placed in .data; both are sections of the binary itself. The stack and the heap are set up when the program runs: the stack, among other things, is where function arguments and local variables are stored, and the heap is extendable memory that programs may use whenever they need more space.

The stack.

The stack is an area of memory where programs store local variables, function arguments, return addresses and similar per-call data for later use. The stack is organised as a “last in, first out” data structure: when something is added to the stack, it is added on the top, and when something is removed, it is removed from the top. On x86 the stack starts at a high memory address and grows towards lower addresses as new data is ‘pushed’ onto it.
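The behaviour described above can be sketched in Python as a toy model. The class name Stack, the starting address 0x1000 and the dictionary used as memory are all assumptions made for this illustration; only the idea that a push lowers the stack pointer (ESP) by 4 bytes and a pop raises it again reflects 32-bit x86.

```python
class Stack:
    """Toy model of a 32-bit x86 stack that grows towards lower addresses."""

    def __init__(self, top=0x1000):
        self.esp = top      # stack pointer: address of the current top
        self.memory = {}    # address -> 32-bit value

    def push(self, value):
        self.esp -= 4                            # make room: address goes DOWN
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.memory[self.esp]            # read the top of the stack
        self.esp += 4                            # release it: address goes UP
        return value


stack = Stack()
stack.push(10)
stack.push(20)
print(hex(stack.esp))   # 8 bytes below the starting address
print(stack.pop())      # last in, first out: 20 comes off before 10
```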

Stack frames.

Every process has at least one thread, and every thread has its own stack. Within a thread's stack, each function call gets its own stack frame. The base pointer EBP points to the beginning (base) of the current function’s stack frame, and the register ESP points to the top of the stack.

The heap.

The heap is a region of memory from which a process can request more space at runtime. Each process has a heap that is shared among all of its threads. Allocators commonly keep track of heap blocks with linked lists, where each block only knows the position of the blocks immediately before and after it. When a process no longer needs a block of memory, the standard practice is to free it, returning it to the heap for reuse.

 

Instructions:

x86 instructions vary in size between 1 and 15 bytes. Most instructions have two operands (e.g. add eax, ebx), some have one (not eax) and occasionally three (imul eax, edx, 64).

Arithmetic operations.

Let's start off with some simple arithmetic instructions.

add: add dest, src

The source can be a register such as eax, a memory reference ([esp]) or an immediate number. The destination can be a register or a memory reference, but not an immediate. The destination and source cannot both be memory references, although they can both be registers.

Example:

add eax, ebx     ;both dest and src are registers
add [esp], eax   ;src is eax, dest is the memory at the address in esp (hence the square brackets)
add eax, [esp]   ;same as above, but reversed
add eax, 4       ;src is an immediate value

sub: sub dest, src

Subtraction works the same way as add. Division and multiplication work differently, however.

div: div divisor

For a 32-bit div, the dividend is always the 64-bit value formed by edx (upper half) and eax (lower half), so edx must be cleared first when the dividend fits in eax. The quotient is stored in eax and the remainder in edx.

Example:

mov eax, 65     ;copy the dividend into eax register
xor edx, edx    ;clear edx, the upper half of the dividend
mov ecx, 4      ;copy divisor into register ecx
div ecx         ;divide edx:eax by ecx

These instructions would result in eax containing 16, and edx containing the remainder, being 1.

The instruction idiv works similarly, but performs signed division.
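For readers who prefer to see the arithmetic spelled out, here is a Python sketch of what a 32-bit unsigned div does. On real hardware the dividend is the 64-bit value edx:eax (edx holding the upper half). The function name div32 is hypothetical, and Python exceptions stand in for the divide error the CPU would raise.

```python
def div32(eax, edx, divisor):
    """Model unsigned 32-bit div: divide EDX:EAX by divisor.

    Returns (quotient, remainder), i.e. the new values of eax and edx.
    """
    if divisor == 0:
        raise ZeroDivisionError("the CPU raises a divide error here")
    dividend = (edx << 32) | eax          # edx is the upper 32 bits
    quotient, remainder = divmod(dividend, divisor)
    if quotient > 0xFFFFFFFF:
        raise OverflowError("quotient does not fit in eax")
    return quotient, remainder


# The example from the text: 65 / 4 with edx cleared to zero.
print(div32(65, 0, 4))   # (16, 1)
```

This also shows why leftover data in edx matters: a non-zero edx makes the dividend far larger than the value in eax alone.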

mul: mul value
imul: imul value / imul dest, value / imul dest, value, value

mul (unsigned) multiplies eax by a value and places the 64-bit result in edx:eax. imul (signed) can do the same, multiply a destination register by a value, or multiply two values and place the result into a destination register.
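The sketch below models two of these forms in Python, assuming 32-bit operands: the one-operand unsigned mul, whose 64-bit product is split across edx (upper half) and eax (lower half), and the two-operand signed imul, whose result is truncated to 32 bits. The names mul32, imul32 and to_signed are invented for this example.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit pattern as a signed two's complement number."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def mul32(eax, value):
    """One-operand mul: unsigned eax * value, returned as (eax, edx)."""
    product = (eax & MASK32) * (value & MASK32)
    return product & MASK32, product >> 32   # low half, high half


def imul32(dest, value):
    """Two-operand imul: signed dest * value, truncated to 32 bits."""
    return (to_signed(dest) * to_signed(value)) & MASK32


print(mul32(6, 7))                    # (42, 0)
print(mul32(0xFFFFFFFF, 2))           # the product spills over into edx
print(hex(imul32(0xFFFFFFFF, 2)))     # -1 * 2 = -2, i.e. 0xfffffffe
```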

Bitwise operations - AND, OR, XOR, NOT

and: and dest, src
or: or dest, src
xor: xor dest, src
not: not eax

With bitwise operators, two values are combined bit by bit, and depending on the operation, each resulting bit will be either a 1 or 0.

Take the following example. If the first value is 10011011 and the second value is 11001001, the result of AND on the two would be 10001001, as only the bits in positions 1, 5 and 8 (counting from the left) are 1 in both values. The other operations follow their own rules in the same bit-by-bit fashion. Check the Wikipedia page on bitwise operations in C for a great overview of bitwise operators.

bit a | bit b | a AND b
0 | 0 | 0
0 | 1 | 0
1 | 0 | 0
1 | 1 | 1

The operation OR would check the corresponding bits at the same location in each value, and result in a 1 in the same position if either one of, or both of the values is a 1.

bit a | bit b | a OR b
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 1

XOR (exclusive OR) is similar to OR, but differs in that the output is 1 when either one of the inputs is a 1, but not both.

bit a | bit b | a XOR b
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0

NOT is dissimilar to the other bitwise operations, as it takes only one value and returns the inverse of every bit. For example, applying NOT to the 7-bit value 1101110 would result in 0010001.
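The examples above can be checked quickly in Python, whose &, | and ^ operators behave like and, or and xor. The helper not_bits is a hypothetical name; it takes an explicit width because Python integers have no fixed size, unlike a register.

```python
a = 0b10011011
b = 0b11001001

print(f"{a & b:08b}")   # AND -> 10001001
print(f"{a | b:08b}")   # OR  -> 11011011
print(f"{a ^ b:08b}")   # XOR -> 01010010


def not_bits(value, width):
    """Invert every bit of a value that is `width` bits wide."""
    return ~value & ((1 << width) - 1)


print(f"{not_bits(0b1101110, 7):07b}")   # NOT -> 0010001
```

On a real 32-bit register, not inverts all 32 bits, which corresponds to calling not_bits with a width of 32.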

Branching:

In assembly, branching is achieved through the use of jumps and flags. A jump is an instruction that makes the instruction pointer (EIP) point to another portion of code, similar to the goto keyword in C. As mentioned previously, flags are one-bit values which are set by instructions. Different instructions manipulate flags in different ways.

add can set each of the Z, S, O and C flags among others, and the same can be said of the sub instruction. The and instruction, however, always clears the O and C flags, but sets the Z and S flags according to the result. In assembly code, you will often want to jump only when a condition is met, and a conditional jump is taken or not depending on which flags are set. There are many jump instructions to choose between, one of which is jle, or “jump if less or equal”. In C, this would be written as:

if (a <= b) { /* do something */ }

Similarly, jbe or “jump if below or equal” exists, which in C would be:

if (a <= b) { /* do something */ }

At first, it may seem that both of these instructions do exactly the same thing. The difference lies in signed versus unsigned comparisons: jle is used to check the flags after a comparison between signed values, and jbe after a comparison between unsigned values. These are just a few of the many jump instructions.
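To see the difference in action, here is a Python sketch that models an 8-bit comparison and the flag conditions the two jumps test. It assumes the comparison is done with cmp, which subtracts its operands and keeps only the flags. jle is taken when Z is set or S differs from O, and jbe is taken when C or Z is set. The function names are hypothetical.

```python
def sign8(value):
    """Most significant bit of an 8-bit value."""
    return (value >> 7) & 1


def cmp8(a, b):
    """Model an 8-bit cmp a, b: compute a - b and keep only the flags."""
    a &= 0xFF
    b &= 0xFF
    result = (a - b) & 0xFF
    return {
        "Z": int(result == 0),
        "S": sign8(result),
        "C": int(a < b),   # a borrow was needed (unsigned a below b)
        # Signed overflow: operand signs differ and the result's sign flipped.
        "O": int(sign8(a) != sign8(b) and sign8(result) != sign8(a)),
    }


def jle_taken(flags):
    """Signed 'less or equal': Z set, or S differs from O."""
    return flags["Z"] == 1 or flags["S"] != flags["O"]


def jbe_taken(flags):
    """Unsigned 'below or equal': C set, or Z set."""
    return flags["C"] == 1 or flags["Z"] == 1


# 0xFF is -1 when signed but 255 when unsigned, so the two jumps disagree.
flags = cmp8(0xFF, 0x01)
print(jle_taken(flags))   # True:  -1 <= 1
print(jbe_taken(flags))   # False: 255 <= 1 does not hold
```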

 

That’s enough assembly for now. In the next post, I’ll go over some more advanced instructions, including moving data, loops and functions.


x89k

Python, C, Reverse Engineering, Security