
Welcome to the first lesson in the ‘Reverse Engineering Basics’ series. You should be working on Ubuntu 16.04 or later, or on any *NIX platform you are comfortable with. Make sure you have gcc, g++, and the appropriate compilers for building both 64-bit and 32-bit programs. I will mention other programs as they are needed later in the series.

Let's first go over some definitions and an introduction to the CPU.

Machine Code:

Code consisting of a series of instructions that are executed directly by the CPU.

Instruction:

A basic command for the CPU, such as moving data between registers, reading from or writing to memory, or performing basic arithmetic. As a general rule, each family of CPUs has its own Instruction Set Architecture (ISA), which defines the instructions it understands.

Assembly Language:

A low-level symbolic programming language whose statements correspond closely to the machine code instructions of a given architecture. An assembler translates assembly code into machine code, which makes programming much easier than writing raw machine code; a disassembler performs the reverse translation.

CPU register:

Each CPU has a fixed number of general-purpose registers: 32-bit x86 has 8, while x86_64 and 32-bit ARM have 16 (64-bit ARM has 31). The simplest way to understand a register is to think of it as a temporary, untyped variable.

Because higher-level languages are easy for people to understand and native machine code is easy for CPUs to execute, most modern programming is done in a higher-level language, which a compiler then translates into machine code.

Number Systems:

People are accustomed to the decimal (base 10) number system, likely because humans have 10 fingers. However, 10 has no special significance in mathematics, so nothing forces computers to use it. Computers work in binary, where each digit is a 1 or a 0, corresponding to whether or not current is flowing in a wire. A number system with 10 digits has a radix (or base) of 10; binary, with 2 digits, has a radix of 2.

There are two important notes to remember:

A number is an abstract quantity, while a digit is a symbol from a writing system, typically a single character.

The value of a number does not change at all when converted to a different radix, only the notation used is changed.
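To see that converting between radixes only changes the notation, here is a minimal Python sketch using the built-in bin(), oct() and hex() functions together with int() and an explicit base. Every representation of 22 parses back to the same value.

```python
# The same value, 22, written in four different notations.
value = 22

representations = {
    2: bin(value),   # '0b10110'
    8: oct(value),   # '0o26'
    10: str(value),  # '22'
    16: hex(value),  # '0x16'
}

for base, text in representations.items():
    # int() accepts the 0b/0o/0x prefix when given the matching base.
    parsed = int(text, base)
    print(f"base {base:>2}: {text:>8} -> {parsed}")
    assert parsed == value
```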

The vast majority of number systems use positional notation, in which each digit carries a weight determined by its position within the number.

For example, 1234 really stands for:

$$10^3 \times 1 + 10^2 \times 2 + 10^1 \times 3 + 10^0 \times 4 = 1234$$

Binary works in exactly the same way, using powers of 2 instead of powers of 10.

0b10110 really stands for:

$$2^4 \times 1 + 2^3 \times 0 + 2^2 \times 1 + 2^1 \times 1 + 2^0 \times 0 = 22$$
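Both expansions follow one rule: multiply each digit by the radix raised to its position (counting from 0 at the right) and sum the products. The sketch below implements that rule directly; positional_value is a hypothetical helper name chosen for illustration, and it assumes radixes from 2 to 16.

```python
def positional_value(digits: str, radix: int) -> int:
    """Evaluate a digit string using positional notation (radix 2 to 16)."""
    total = 0
    # Walk the digits right to left: position 0 is the rightmost digit.
    for position, digit in enumerate(reversed(digits)):
        d = int(digit, 16)  # '0'-'9' -> 0-9, 'A'-'F' -> 10-15
        if d >= radix:
            raise ValueError(f"digit {digit!r} is not valid in base {radix}")
        total += d * radix ** position  # digit times its positional weight
    return total


print(positional_value("1234", 10))   # 1234 = 10^3*1 + 10^2*2 + 10^1*3 + 10^0*4
print(positional_value("10110", 2))   # 22   = 2^4*1 + 2^3*0 + 2^2*1 + 2^1*1 + 2^0*0
```

Python's own int(text, base) does the same job; writing the loop out just makes the weights visible.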

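The same system with base 16, called hexadecimal, can express large numbers with fewer digits. Hexadecimal uses the digits 0 through 9 followed by the letters A through F, and each hex digit corresponds to exactly 4 binary digits: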

Hex Decimal Binary
0 0 0000
1 1 0001
2 2 0010
3 3 0011
4 4 0100
5 5 0101
6 6 0110
7 7 0111
8 8 1000
9 9 1001
A 10 1010
B 11 1011
C 12 1100
D 13 1101
E 14 1110
F 15 1111
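The table above can be regenerated with Python's format specifiers ('X' for uppercase hex, '04b' for 4-digit zero-padded binary). Because each hex digit is exactly 4 bits, converting hex to binary is just a digit-by-digit lookup in this table; hex_to_binary below is a hypothetical helper showing that substitution.

```python
# Print the hex/decimal/binary table: one row per 4-bit value.
print(f"{'Hex':<4}{'Decimal':<8}Binary")
for n in range(16):
    print(f"{n:<4X}{n:<8d}{n:04b}")


def hex_to_binary(hex_digits: str) -> str:
    """Replace each hex digit with its 4-bit binary pattern."""
    return " ".join(f"{int(d, 16):04b}" for d in hex_digits)


print(hex_to_binary("42EF"))  # 0100 0010 1110 1111
```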

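Because different radixes share the same digits, a number written in one radix can look identical to a different number in another: 10 could mean ten (decimal), two (binary), or sixteen (hexadecimal). Conventions therefore exist to tell the notations apart.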
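As a quick illustration of the ambiguity, Python's int() with an explicit base shows the digit string "10" denoting a different value in each radix:

```python
# The digit string "10" means a different number in each radix.
for base in (2, 8, 10, 16):
    print(f"'10' in base {base:>2} = {int('10', base)}")
```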

Decimal numbers are typically written without any prefix or suffix, e.g. 1234. Some assemblers also accept a d suffix on decimal numbers, e.g. 1234d. Binary numbers are usually written with the 0b prefix, e.g. 0b10110, and occasionally with a b suffix instead, e.g. 10110b.

Hexadecimal numbers are typically prefixed with 0x, e.g. 0x42EF. Some assemblers instead use an h suffix, e.g. 42EFh. In that convention, a number that begins with a letter (A through F) is usually written with a leading 0, e.g. 0FFh, so the assembler does not mistake it for a name.
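Python's own literals use the prefix forms (0b10110, 0x42EF), but the suffix forms are not valid Python. The sketch below is a hypothetical parse_number helper that accepts all of the notations described above; it checks the h suffix first so that a hex value like 0BEh is not misread as a 0b-prefixed binary number. It is an illustration, not any particular assembler's parsing rules.

```python
def parse_number(text: str) -> int:
    """Parse an integer in prefix (0x, 0b) or suffix (h, b, d) notation.

    Hypothetical helper for illustration only.
    """
    t = text.strip().lower()
    if t.endswith("h"):
        return int(t[:-1], 16)   # 42EFh, 0FFh (checked first: 0BEh is hex)
    if t.startswith("0x"):
        return int(t[2:], 16)    # 0x42EF
    if t.startswith("0b"):
        return int(t[2:], 2)     # 0b10110
    if t.endswith("b"):
        return int(t[:-1], 2)    # 10110b
    if t.endswith("d"):
        return int(t[:-1], 10)   # 1234d
    return int(t, 10)            # 1234


for s in ["1234", "1234d", "0b10110", "10110b", "0x42EF", "42EFh", "0FFh"]:
    print(f"{s:>8} -> {parse_number(s)}")

# Python itself understands the prefix forms as literals.
print(0b10110, 0x42EF)  # 22 17135
```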

Another radix often used in computer programming is octal (base 8), which has 8 digits, 0 through 7. Each octal digit maps to exactly 3 bits. One familiar modern use of octal is the *NIX utility chmod, whose numeric mode is written as 3 octal digits: one each for the file's owner, group, and others. Within each digit, the 3 bits represent read, write, and execute permission, as the table below shows.

Octal Binary Permissions
7 111 rwx
6 110 rw-
5 101 r-x
4 100 r--
3 011 -wx
2 010 -w-
1 001 --x
0 000 ---
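The table can be derived by testing each of the 3 bits in an octal digit. This sketch uses two hypothetical helpers: octal_digit_to_rwx maps one digit to its permission string, and mode_to_rwx expands a full 3-digit chmod mode such as 755 into owner, group, and other permissions.

```python
def octal_digit_to_rwx(digit: int) -> str:
    """Map one octal digit (0-7) to its read/write/execute string."""
    if not 0 <= digit <= 7:
        raise ValueError("an octal digit must be between 0 and 7")
    return (("r" if digit & 0b100 else "-")    # high bit: read
            + ("w" if digit & 0b010 else "-")  # middle bit: write
            + ("x" if digit & 0b001 else "-")) # low bit: execute


def mode_to_rwx(mode: str) -> str:
    """Expand a 3-digit chmod mode such as '755' into owner/group/other rwx."""
    return "".join(octal_digit_to_rwx(int(d, 8)) for d in mode)


for d in range(7, -1, -1):
    print(d, f"{d:03b}", octal_digit_to_rwx(d))

print(mode_to_rwx("755"))  # rwxr-xr-x
print(int("755", 8))       # 493, the same mode as a single number
```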

Similarly, floating-point numbers can be distinguished from integers by appending .0, e.g. 123.0 rather than 123.
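In Python (and in C, where 123.0 is a double literal), the trailing .0 changes the type, not the value. The sketch below shows the two literals comparing equal while being stored differently: the int needs only 7 bits, while the float is an IEEE 754 double whose sign, mantissa, and exponent float.hex() exposes.

```python
# A trailing .0 makes the literal floating point.
print(type(123).__name__)     # int
print(type(123.0).__name__)   # float

# The values compare equal, but the representations differ.
print(123 == 123.0)           # True
print((123).bit_length())     # 7
print(float.hex(123.0))       # 0x1.ec00000000000p+6  (1.111011b x 2^6)
```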

Now that we’ve covered some basic definitions and number systems, we can start getting into more technical material in the next post.


x89k

Python, C, Reverse Engineering, Security