x86 Registers and Memory Read/Write

Registers in x86 Architecture

The registers of the 32-bit x86 architecture can be classified into four categories: general registers, segment registers, index and pointer registers, and the indicator (flags) register.

General Registers

General registers are the ones we use most frequently, since most instructions in assembly language take them as operands. Each 32-bit general register can also be accessed through a 16-bit name for its low half and two 8-bit names for the bytes of that half, which can be broken down as follows:

32-bit registers: EAX, EBX, ECX, EDX

16-bit registers: AX, BX, CX, DX

8-bit registers: AH, AL, BH, BL, CH, CL, DH, DL (where the H and L suffixes denote the high byte and low byte of the corresponding 16-bit register)
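The 16-bit and 8-bit names are not separate storage: AX is the low half of EAX, and AH and AL are the two bytes of AX. The following is a minimal Python sketch of that aliasing; the function names are hypothetical and chosen only for illustration.

```python
def sub_registers(eax: int) -> dict:
    """Split a 32-bit EAX value into its AX, AH, and AL views."""
    eax &= 0xFFFFFFFF          # keep only 32 bits
    ax = eax & 0xFFFF          # AX: low 16 bits of EAX
    ah = (ax >> 8) & 0xFF      # AH: high byte of AX
    al = ax & 0xFF             # AL: low byte of AX
    return {"EAX": eax, "AX": ax, "AH": ah, "AL": al}


def write_al(eax: int, value: int) -> int:
    """Model MOV AL, value: only the lowest byte of EAX changes."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)
```

For example, with EAX holding 0x12345678, AX reads as 0x5678, AH as 0x56, and AL as 0x78.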

Each of the 32-bit registers has a conventional role:

EAX, also known as the accumulator register, is often the default register for arithmetic, logic, and data manipulation operations. Some common uses for EAX include:

Performing arithmetic and logic operations like ADD, SUB, MUL, DIV, AND, OR, etc.

Storing the return value of a function call in many calling conventions, such as cdecl and stdcall.

Acting as an implicit operand in string and I/O instructions, such as LODS, STOS, SCAS, and IN/OUT.

EBX, or the base register, is frequently used as a base pointer for addressing memory. For example, when manipulating arrays or data structures, EBX can store the base address of the array or data structure being accessed, while another register (such as ECX or ESI) is used as an index or offset.

ECX, known as the counter register, is often employed as a loop counter for iterating over data structures like arrays or strings. For instance, the LOOP instruction and the REP prefix (as in REP STOSD) decrement ECX once per iteration and stop when it reaches zero, so ECX holds the number of iterations remaining.

EDX, called the data register, is similar to EAX in that it is also commonly used for arithmetic, logic, and data manipulation operations. Some specific uses of EDX include:

Serving as an extension to EAX in 64-bit arithmetic: the one-operand forms of MUL and IMUL write their 64-bit product to the register pair EDX:EAX, with the high-order 32 bits in EDX.

Holding the high-order 32 bits of the 64-bit dividend for DIV and IDIV, and receiving the remainder afterwards (the quotient goes to EAX).
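The way EDX and EAX pair up can be sketched in Python. This is a simplified model of the unsigned 32-bit forms of MUL and DIV only; the function names and the choice of Python exceptions to stand in for the processor's divide error are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF


def mul32(eax: int, operand: int) -> tuple:
    """Model unsigned MUL r/m32: EDX:EAX = EAX * operand. Returns (EDX, EAX)."""
    product = (eax & MASK32) * (operand & MASK32)
    return (product >> 32) & MASK32, product & MASK32


def div32(edx: int, eax: int, divisor: int) -> tuple:
    """Model unsigned DIV r/m32 on the dividend EDX:EAX. Returns (EDX, EAX)."""
    divisor &= MASK32
    if divisor == 0:
        raise ZeroDivisionError("division by zero raises a divide error")
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    quotient, remainder = divmod(dividend, divisor)
    if quotient > MASK32:
        raise OverflowError("quotient does not fit in EAX")
    return remainder, quotient  # remainder in EDX, quotient in EAX
```

Multiplying 0xFFFFFFFF by 2 leaves 1 in EDX and 0xFFFFFFFE in EAX; dividing 17 by 5 (with EDX cleared) leaves the quotient 3 in EAX and the remainder 2 in EDX.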

Segment Registers

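Segment registers are special-purpose registers used in the x86 architecture for memory segmentation, a technique that divides memory into segments so that a program can address different portions of memory. Each segment register identifies one segment, and the base address of that segment is added to an offset to form a linear address. In real mode the base is the register's value multiplied by 16; in protected mode the register holds a selector, and the base comes from the segment descriptor that the selector refers to. The linear address equals the physical address only when paging is disabled.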
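How the base and the offset combine can be sketched as follows. The first function models real mode, where the base is the 16-bit segment value shifted left by four bits; the second models the general case where a base address is already known, as when it has been read from a segment descriptor. Both function names are hypothetical.

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """Real-mode translation: linear = segment * 16 + offset."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


def linear_from_base(base: int, offset: int) -> int:
    """32-bit case: add the segment's base to the offset, wrapping at 32 bits."""
    return (base + offset) & 0xFFFFFFFF
```

With a segment value of 0xB800 and an offset of 0, the real-mode linear address is 0xB8000. With a base of zero, the linear address is simply the offset.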

There are six 16-bit segment registers in the x86 architecture:

CS (Code Segment):

This register contains the base address of the current code segment, where the program's executable instructions reside. The Instruction Pointer (IP/EIP/RIP) register holds the offset within this segment for the next instruction to be executed.

DS (Data Segment):

The Data Segment register contains the base address of the current data segment, where the program's variables and data structures are typically stored. Most data-related instructions use the DS segment register by default.

SS (Stack Segment):

The Stack Segment register contains the base address of the current stack segment, which holds the program's runtime call stack. Stack-related instructions, such as PUSH, POP, CALL, and RET, implicitly use the SS segment register, as do memory operands addressed through ESP or EBP.

ES, FS, GS:

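These additional segment registers are available for addressing other memory areas. ES is the implicit destination segment for string instructions such as MOVS and STOS, and was classically pointed at specialized areas such as video memory. Operating systems commonly use FS or GS to reach per-thread data.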

Index and Pointer Registers

Index and pointer registers hold the offset part of an address. Each one has a conventional role, described below. Sometimes they are combined with a segment register to form a far address.

There are two types of these registers: index registers and pointer registers.

Index registers:

ESI (Extended Source Index):

ESI is often used as a source index register for string and memory block operations, such as MOVS, CMPS, and LODS. In these operations, ESI holds the address of the source data.
It can also be used as a general-purpose register for other operations, like addressing elements within an array or for general arithmetic and logic operations.

EDI (Extended Destination Index):

EDI is often used as a destination index register for string and memory block operations, such as MOVS, CMPS, and STOS. In these operations, EDI holds the address of the destination data.
Like ESI, EDI can also be used as a general-purpose register for other operations, such as addressing elements within an array or for general arithmetic and logic operations.
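The interplay of ESI, EDI, and ECX in a block copy can be sketched in Python. This is a simplified model of REP MOVSD that assumes the direction flag is clear (addresses increase) and a flat block of memory; the function name and the use of a bytearray for memory are illustrative choices.

```python
def rep_movsd(memory: bytearray, esi: int, edi: int, ecx: int) -> tuple:
    """Copy ECX doublewords from [ESI] to [EDI]; return the final (ESI, EDI, ECX)."""
    while ecx != 0:
        memory[edi:edi + 4] = memory[esi:esi + 4]  # copy one 4-byte doubleword
        esi += 4                                   # source index advances
        edi += 4                                   # destination index advances
        ecx -= 1                                   # one fewer iteration remaining
    return esi, edi, ecx
```

After copying four doublewords, both index registers have advanced by 16 bytes and ECX is zero.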

Pointer registers:

ESP (Extended Stack Pointer):

ESP is used as a pointer to the top of the stack in the current stack segment (as defined by the SS segment register).
It is implicitly used in stack-related operations, such as PUSH, POP, CALL, RET, and adjusting the stack frame with ENTER and LEAVE instructions.
ESP is vital for managing the call stack, local variables, and function call return addresses.

EBP (Extended Base Pointer):

EBP is conventionally used as a base pointer for the stack frame of the current function.
It provides a stable reference point for accessing function parameters and local variables, as the ESP register may change during function execution due to stack operations.
In this role, EBP is often used with an offset to address local variables and function parameters.
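A small model shows why EBP gives a stable reference point. The sketch below assumes a cdecl-style call and the common prologue PUSH EBP / MOV EBP, ESP / SUB ESP, 8; the class, the function name, and the 64-byte memory block are hypothetical.

```python
import struct


class Machine:
    """A tiny model of ESP, EBP, and a block of stack memory."""

    def __init__(self, size: int = 64):
        self.memory = bytearray(size)
        self.esp = size      # empty stack: ESP points just past the block
        self.ebp = 0

    def push(self, value: int) -> None:
        self.esp -= 4        # the stack grows toward lower addresses
        struct.pack_into("<I", self.memory, self.esp, value & 0xFFFFFFFF)

    def read(self, address: int) -> int:
        return struct.unpack_from("<I", self.memory, address)[0]


def call_with_frame(m: Machine, arg1: int, arg2: int, return_address: int) -> None:
    m.push(arg2)             # arguments are pushed right to left
    m.push(arg1)
    m.push(return_address)   # CALL pushes the return address
    m.push(m.ebp)            # prologue: PUSH EBP
    m.ebp = m.esp            #           MOV EBP, ESP
    m.esp -= 8               #           SUB ESP, 8 (room for two locals)
```

Under these assumptions the first argument sits at [EBP+8], the return address at [EBP+4], and the locals at [EBP-4] and [EBP-8], no matter how ESP moves afterwards.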

Indicator

The indicator is a special-purpose register:

EFLAGS: This register contains flags that reflect the status of the processor and the results of the most recent operations, such as carry, overflow, and zero flags.
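How an operation sets these flags can be sketched for a 32-bit ADD. The function below is a simplified model covering only the carry (CF), zero (ZF), sign (SF), and overflow (OF) flags; its name and return format are hypothetical.

```python
def add32_flags(a: int, b: int) -> dict:
    """Model a 32-bit ADD and the status flags it would set."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    full = a + b
    result = full & 0xFFFFFFFF

    def sign(x: int) -> int:
        return (x >> 31) & 1   # bit 31 is the sign bit

    return {
        "result": result,
        "CF": int(full > 0xFFFFFFFF),   # unsigned carry out of bit 31
        "ZF": int(result == 0),         # result is zero
        "SF": sign(result),             # copy of the result's sign bit
        # signed overflow: operands share a sign that the result does not
        "OF": int(sign(a) == sign(b) and sign(result) != sign(a)),
    }
```

Adding 1 to 0xFFFFFFFF sets CF and ZF (unsigned wraparound to zero), while adding 1 to 0x7FFFFFFF sets SF and OF (signed overflow).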

x86 Memory

Registers vs Memory

Registers and memory are similar in that both are fixed-width containers for data. However, memory holds far more units than there are registers, so it is not practical to give each memory unit a name, as is done with registers. Instead, we use address numbers to identify memory locations.

When a CPU is described as 32-bit or 64-bit, the number usually refers to the width of its general-purpose registers, which on x86 is also the width of the addresses a program uses. In computer memory, each byte has an address, as illustrated in the following table:

Memory Address	Binary Value Stored
0x00000000	00000000
0x00000001	00000001
…	…
0xFFFFFFFF	01010010

For a 32-bit computer, the maximum memory address is a 32-bit value, written 0xFFFFFFFF in hexadecimal, so the addressing range is [0, 0xFFFFFFFF]. This range contains 0x100000000 bytes, or 4 GB. Consequently, a 32-bit program can address at most 4 GB of memory, although some processors and operating systems can use more physical memory through Physical Address Extension (PAE).
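The arithmetic can be checked directly. In the sketch below, the 36-bit case is included as an example of a wider physical address, the width used by the original PAE design; the function name is hypothetical.

```python
GIB = 1 << 30  # bytes in one gibibyte (the unit loosely called "GB" here)


def address_space_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 1 << address_bits


def highest_address(address_bits: int) -> int:
    """Largest address representable in the given width."""
    return address_space_bytes(address_bits) - 1
```

A 32-bit address width gives 0x100000000 bytes (4 of the units above), with 0xFFFFFFFF as the highest address; a 36-bit width gives 64.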

Memory addresses are used when reading data from memory or writing data to memory. To do so, we must first locate the position for reading or writing, much like writing an address on a letter.

Writing/Reading Data from Specified Memory

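Here's an example of assembly code (Intel syntax) that writes data to, and then reads data from, a specified memory address, using EAX as an illustrative register:

MOV DWORD PTR DS:[0x0012FF34], EAX
MOV EAX, DWORD PTR DS:[0x0012FF34]

The parts of the memory operand are: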

DWORD: Represents the width of the data read from or written to memory: a doubleword, which is 4 bytes (BYTE and WORD select 1 and 2 bytes).

PTR: Short for "pointer," indicating that the number that follows is a pointer.

DS: Segment register.

[0x0012FF34]: Memory address.

In this specific code, DS is used as a prefix to the memory address [0x0012FF34]. This prefix indicates that the memory address is relative to the base address of the data segment. By using DS, the actual memory address being accessed would be the sum of the base address of the data segment and the specified offset (0x0012FF34).
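The effect of such an access can be modeled in Python. This sketch assumes a sparse memory represented as a dictionary from byte address to byte value, with unwritten bytes reading as zero (a modeling convenience, not hardware behavior); the function names are hypothetical. It also shows that x86 stores the four bytes of a DWORD in little-endian order.

```python
def write_dword(memory: dict, base: int, offset: int, value: int) -> None:
    """Store a 4-byte value at base + offset, least significant byte first."""
    address = (base + offset) & 0xFFFFFFFF
    for i, byte in enumerate((value & 0xFFFFFFFF).to_bytes(4, "little")):
        memory[(address + i) & 0xFFFFFFFF] = byte


def read_dword(memory: dict, base: int, offset: int) -> int:
    """Load a 4-byte little-endian value from base + offset."""
    address = (base + offset) & 0xFFFFFFFF
    # Assumption: bytes that were never written read back as 0.
    data = bytes(memory.get((address + i) & 0xFFFFFFFF, 0) for i in range(4))
    return int.from_bytes(data, "little")
```

Writing 0x12345678 at offset 0x0012FF34 with a base of zero places the byte 0x78 at 0x0012FF34 and the byte 0x12 at 0x0012FF37.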

Segmentation originally let x86 reach more memory than a single 16-bit offset could address: on the 16-bit 8086, a 16-bit segment value and a 16-bit offset combine into a 20-bit address, covering 1 MB instead of 64 KB. This segmentation of memory was therefore more relevant in older x86 systems, which had narrower registers and smaller address spaces.

In modern x86 systems, especially those running in a flat memory model (like most 32-bit and 64-bit systems), the DS segment register is often not explicitly needed, as the base address of the data segment is typically set to zero. However, it may still be used for clarity or when working with segmented memory models.

It is important to note that memory addresses cannot be chosen arbitrarily: memory is protected, and an attempt to read or write an address that is unmapped or not accessible to the program raises a fault instead of completing.

x86 Stack

The x86 stack is a data structure designed to meet three requirements: temporarily storing data, keeping track of how much data is stored, and quickly locating specific data items. To design such a structure, we take a block of memory and mark its starting and ending positions with two 32-bit registers, called 'base' and 'top' here, which store memory addresses. On x86, ESP plays the role of 'top'.

We can design the data structure as follows:

top    (lower address)

…

base   (higher address)

The 'base' register stores the address representing the starting position of the memory block, while the 'top' register stores the address representing the ending position of the memory block. Because the stack grows toward lower addresses, pushing data onto the stack decreases the value of the 'top' register by the size of the data being added. Conversely, popping data from the stack increases the value of the 'top' register by the size of the data being removed.

If you need to access an intermediate data item within the stack, you can do so by adding the appropriate offset to either the 'top' or 'base' register. This concept forms the foundation of a stack in the x86 architecture, providing a flexible and efficient way to manage temporary data storage and retrieval.
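The design described above can be sketched as a small Python class. It assumes 4-byte items and a fixed 32-byte block; the class and method names are hypothetical, and the exceptions stand in for conditions the hardware does not check by itself.

```python
import struct


class Stack:
    """A downward-growing stack over a fixed block, tracked by 'base' and 'top'."""

    def __init__(self, size: int = 32):
        self.memory = bytearray(size)
        self.base = size     # starting position (highest address)
        self.top = size      # empty stack: top equals base

    def push(self, value: int) -> None:
        if self.top < 4:
            raise OverflowError("stack block is full")
        self.top -= 4        # pushing decreases 'top' by the item size
        struct.pack_into("<I", self.memory, self.top, value & 0xFFFFFFFF)

    def pop(self) -> int:
        if self.top >= self.base:
            raise IndexError("stack is empty")
        value = struct.unpack_from("<I", self.memory, self.top)[0]
        self.top += 4        # popping increases 'top' by the item size
        return value

    def peek(self, offset_from_top: int) -> int:
        """Read an intermediate item at top + offset without popping."""
        return struct.unpack_from("<I", self.memory, self.top + offset_from_top)[0]

    def count(self) -> int:
        """Number of stored items, derived from the two registers alone."""
        return (self.base - self.top) // 4
```

After pushing 1, 2, and 3, the item at top + 0 is 3 and the item at top + 8 is 1, which is the offset-based access described above.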