Hello! This is the start of my reverse engineering journey. I’m starting from knowing almost nothing and hopefully will learn a lot along the way. Maybe this journey will help people learn things following my steps. I will try to write everything here. All of my dumb failures and epic wins. So here we go!

Helloworld Assembly!

First things first: I have only a bare-minimum understanding of assembly, and reverse engineering leans on it heavily. So for my first step I’m going to copy-paste a hello world program, then rewrite it and try to learn everything in it. But before that we need to install ‘nasm’. nasm is an assembler, which is to assembly roughly what a compiler is to C: it turns the source text into machine code. I might be wrong on the details, but that’s the whole point of this log, to be wrong and learn from it.
I’m using Arch Linux, so: yay -S nasm
Now to copy-paste the code into Helloworld.asm:

section	.text
   global_start   ;must be declared for linker (ld)
	
_start:	          ;tells linker entry point
   mov	edx,len   ;message length
   mov	ecx,msg   ;message to write
   mov	ebx,1     ;file descriptor (stdout)
   mov	eax,4     ;system call number (sys_write)
   int	0x80      ;call kernel
	
   mov	eax,1     ;system call number (sys_exit)
   int	0x80      ;call kernel

section	.data
msg db 'Hello, world!', 0xa  ;string to be printed
len equ $ - msg     ;length of the string
 

I can already understand a few things from the comments, but it’s not nearly enough for me. If this is basic assembly, I’m going to research every single instruction in it. But let’s not worry about that now and just compile this piece of code.

Compiling

To start compiling we have to start with:
nasm -f elf Helloworld.asm

-f specifies the output format
elf (Executable and Linkable Format) is the object and executable file format used on Linux

The command made another file called Helloworld.o. The ‘o’ extension means object file, so we still don’t have an executable. We have simply “assembled” our source file.
Now I tried this command to turn my .o file into an executable:
ld -o HelloWorld Helloworld.o
It returned an error:

>ld: i386 architecture of input file `Helloworld.o' is incompatible with i386:x86-64 output
>ld: warning: cannot find entry symbol _start; defaulting to 0000000000401000
 

So we have an error and a warning on our hands. The error says our input file is i386 architecture but the output is i386:x86-64.
I looked it up and figured out the first error. Apparently when we ran nasm -f elf Helloworld.asm we made a 32-bit object file, and for 64-bit we should have entered nasm -f elf64 Helloworld.asm. This is a problem because by default our ld wants to make a 64-bit executable, but we gave it a 32-bit object. So let’s rm our object file and try again with 64-bit, shall we?

nasm -f elf64 Helloworld.asm

and now:

ld -o HelloWorld Helloworld.o

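And voila, we were correct: the error is gone! The warning is still there, though. Its cause is in the listing above: global_start is missing a space, so nasm reads it as a label instead of the directive global _start, and _start never becomes visible to the linker. ld then falls back to the start of the .text section (the 0000000000401000 in the warning), which happens to be exactly where our code begins, so the program still runs. I’m going to leave it for now. Bad practice, yes, but I’m already tired from debugging. XD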
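
To see which kind of object file nasm produced without waiting for ld to complain, one option is to look at the file header. Here is a minimal Python sketch, assuming Python 3 is available. It reads the ELF identification bytes: byte 4 is the class, 1 for 32-bit and 2 for 64-bit. The names elf_bits and elf_bits_of_file are made up for illustration.

```python
import sys

ELF_MAGIC = b"\x7fELF"  # every ELF file starts with these four bytes


def elf_bits(header: bytes) -> int:
    """Return 32 or 64 based on the EI_CLASS byte of an ELF header."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class = header[4]
    if ei_class == 1:
        return 32
    if ei_class == 2:
        return 64
    raise ValueError(f"unknown ELF class {ei_class}")


def elf_bits_of_file(path: str) -> int:
    """Read the first 16 bytes of a file and classify it."""
    with open(path, "rb") as f:
        return elf_bits(f.read(16))


if __name__ == "__main__":
    # Hypothetical usage: python elfbits.py Helloworld.o
    for path in sys.argv[1:]:
        print(path, elf_bits_of_file(path), "bit")
```

The file command (file Helloworld.o) reports the same information, so the script is only a way to see where that answer comes from.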

btw ld is the linker. The linker combines object files (and any libraries) into one executable: it resolves references between them, assigns final addresses, and records the entry point.

Now let’s keep going. If HelloWorld is an executable, we should be able to run it.
./HelloWorld

“Hello, world!”
Finally.

Learning what our assembly program does and rewriting it

section .text:

Let’s divide it into two parts: section and text.
“A section is the smallest unit of an object file that can be relocated.”

The term “object file” keeps coming up, so I assume it’s something important. Object file means:
“An object file contains the same instructions in machine-readable, binary form”

Great, so now that is out of the way. There seem to be many types of sections:

Sections containing the following material usually appear in relocatable ELF files:
The data section,
The bss section, and
The text section.

The text section is used for keeping the actual code. It conventionally starts with the declaration global _start, which exports the _start label so the linker can find it and record it as the address where program execution begins.

OK, now we know what section .text is. But what does global _start do? There is another _start in the code, so what is the difference?

_start: is a label that marks the entry point, and global makes that label visible to the linker so it knows where our entry point is. We are using the default entry point name. There is a way to pick a custom one (ld’s -e option), but I don’t care tbh.

eax,ebx,ecx,edx

Ok, this is the tricky part. These are the names of registers.

There are ten 32-bit and six 16-bit processor registers in IA-32 architecture. The registers are grouped into three categories −

General registers,
Control registers, and
Segment registers.

The general registers are further divided into the following groups −

Data registers,
Pointer registers, and
Index registers.

The registers in the code above are all data registers. They are used for arithmetic, logic, and other operations (such as printing things to the screen). How we use them depends on what we are doing. In this case we are printing a string, which means asking the kernel to do it for us. Such a request is called a system call (syscall). Every system call is identified by a number: for example, sys_write is 4, sys_read is 3, and sys_exit is 1. We can find all the system call numbers in a C header file called unistd_32.h.
To look them up we can use these commands:

   locate unistd_32.h
   # > /usr/include/asm/unistd_32.h
   grep -w __NR_write /usr/include/asm/unistd_32.h
   # > #define __NR_write 4
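
The same lookup can be done for the whole table at once. Below is a small Python sketch that parses #define __NR_name number lines into a dictionary. SAMPLE_HEADER holds only the three numbers mentioned above, and the function name syscall_numbers is made up for illustration. On a real system you would pass in the contents of the header file instead.

```python
import re

# Sample lines in the same format as the header; only the three
# system calls mentioned in the text are included here.
SAMPLE_HEADER = """\
#define __NR_exit 1
#define __NR_read 3
#define __NR_write 4
"""

# Matches lines such as "#define __NR_write 4".
DEFINE_RE = re.compile(r"^#define\s+__NR_(\w+)\s+(\d+)\s*$", re.MULTILINE)


def syscall_numbers(header_text: str) -> dict:
    """Map system call names to their numbers."""
    return {name: int(num) for name, num in DEFINE_RE.findall(header_text)}


if __name__ == "__main__":
    table = syscall_numbers(SAMPLE_HEADER)
    for name, number in sorted(table.items(), key=lambda kv: kv[1]):
        print(number, name)
```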
 

We use eax for the system call number, ebx for our file descriptor “stdout” (we’ll get back to that later), ecx for the address of our string, and edx for the length of our string. The reason for this specific layout is the Linux convention for int 0x80: the syscall number goes in eax and the arguments go, in order, in ebx, ecx, edx (then esi, edi, ebp if there are more). For now I try not to overthink it and just memorize things. If I forget, I just look them up. For example, we can use man to see the argument structure of write.

   man 2 write
   # > ssize_t write(int fd, const void *buf, size_t count);
 

And you go in the order they are written: fd goes in ebx, buf goes in ecx, and count goes in edx. I think this is how it goes, but I might be wrong. Future me might be judging me heavily, but for now that’s how I use the manual.
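
To see the three arguments from the man page in action outside of assembly, here is a hedged Python sketch that calls the C library’s write() through ctypes. It assumes a Linux system where ctypes.CDLL(None) can find write. The wrapper name sys_write is made up. The register names in the comments refer to the int 0x80 version above, not to what the C library does internally.

```python
import ctypes
import os

# Assumption: Linux, where the C library is already loaded into the Python
# process, so CDLL(None) can find its write() wrapper.
libc = ctypes.CDLL(None, use_errno=True)
libc.write.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_size_t]
libc.write.restype = ctypes.c_ssize_t


def sys_write(fd: int, buf: bytes) -> int:
    """Call write(fd, buf, count) and return the number of bytes written."""
    count = len(buf)
    # Same order as the man page: fd (ebx), buf (ecx), count (edx).
    written = libc.write(fd, buf, count)
    if written < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return written


if __name__ == "__main__":
    sys_write(1, b"Hello, world!\n")  # 1 = stdout
```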

file descriptor

A file descriptor is a number that uniquely identifies an open file in a computer’s operating system. It describes a data resource, and how that resource may be accessed.

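The kernel needs this number to perform the write: it uses it to look up which open file we mean, including what type of file it is. Every process starts with three of them already open: 0 (stdin), 1 (stdout), and 2 (stderr).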
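
A quick way to convince yourself that file descriptors really are just small numbers is to ask the kernel for a couple of new ones. This Python sketch opens a pipe, which gives back two descriptors, writes through one and reads through the other. The function name demo_descriptors is made up for illustration.

```python
import os


def demo_descriptors() -> tuple:
    """Open a pipe and show that its two ends are plain integers."""
    read_fd, write_fd = os.pipe()      # the kernel hands back two new numbers
    try:
        os.write(write_fd, b"hi")      # write through one descriptor
        data = os.read(read_fd, 2)     # read through the other
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return read_fd, write_fd, data


if __name__ == "__main__":
    r, w, data = demo_descriptors()
    print("pipe descriptors:", r, w, "data:", data)
```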

int 0x80

int doesn’t mean integer surprise surprise. It actually means interrupt.

An interrupt is a condition that halts the microprocessor temporarily to work on a different task and then return to its previous task. Whenever an interrupt occurs the processor completes the execution of the current instruction and starts the execution of an Interrupt Service Routine (ISR) or Interrupt Handler.

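So with int 0x80 we interrupt our own program on purpose: the CPU jumps into the kernel, the kernel looks at eax to see which system call we want (here, writing our string), runs it, and then returns to our code.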

Now Time to Rewrite

I’m not going to just rewrite this. I’m going to improve on it based on what we learned. I’m going to print “enter your name”, then read an input, print it back, and gracefully exit the program.

So first I added the necessary variables.

  section .data
  msg1 db "Please enter your name!", 0xa
  len1 equ $ - msg1
  msg2 db "Your name is:", 0x20   ;0x20 is a space, so the name follows on the same line
  len2 equ $ - msg2

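And then printed them on screen the way we learned.

Now it’s time to read user input with sys_read.
First we need to add a .bss section. It reserves bytes for our input without giving them an initial value. resb means reserve bytes. A name plus the newline from pressing Enter easily needs more than a handful of bytes, so I reserve 32.

  section .bss
  name resb 32

Then we just finish the code using the man page. sys_read reads from file descriptor 0 (stdin) and returns in eax how many bytes it actually read, so I save that number and use it as the length when printing the name back.

 section .text
    global _start  ;must be declared for linker (ld)

 _start:
    mov  edx,len1  ;message length
    mov  ecx,msg1  ;message to write
    mov  ebx,1     ;file descriptor (stdout)
    mov  eax,4     ;system call number (sys_write)
    int  0x80      ;call kernel

    mov  eax,3     ;system call number (sys_read)
    mov  ebx,0     ;file descriptor (stdin)
    mov  ecx,name  ;buffer to read into
    mov  edx,32    ;read at most 32 bytes (size of the buffer)
    int  0x80      ;call kernel, eax = number of bytes read
    mov  esi,eax   ;save that count, eax gets overwritten below

    mov  edx,len2
    mov  ecx,msg2
    mov  ebx,1     ;stdout
    mov  eax,4     ;sys_write
    int  0x80

    mov  edx,esi   ;print only the bytes that were actually read
    mov  ecx,name
    mov  ebx,1     ;stdout
    mov  eax,4     ;sys_write
    int  0x80

    mov  ebx,0     ;exit status 0
    mov  eax,1     ;system call number (sys_exit)
    int  0x80      ;call kernel

 section .data
 msg1 db "Please enter your name!", 0xa
 len1 equ $ - msg1
 msg2 db "Your name is:", 0x20
 len2 equ $ - msg2

 section .bss
 name resb 32

And we got it!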

Please enter your name!
pladimir
Your name is: pladimir
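
One thing worth knowing about sys_read: the count in edx is a hard upper limit. If the input is longer than that, the read returns only the first count bytes and the rest stays behind for whoever reads next. The Python sketch below imitates this with a pipe standing in for the terminal (an assumption, a real terminal also does line editing). The function name read_once is made up for illustration.

```python
import os


def read_once(data: bytes, count: int) -> tuple:
    """Feed data into a pipe, then do a single read of at most count bytes.

    Returns (bytes returned by the read, bytes still waiting in the pipe).
    """
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, data)
        os.close(write_fd)                   # no more input will arrive
        write_fd = -1
        first = os.read(read_fd, count)      # like sys_read with edx = count
        rest = os.read(read_fd, len(data))   # whatever was left behind
        return first, rest
    finally:
        os.close(read_fd)
        if write_fd != -1:
            os.close(write_fd)


if __name__ == "__main__":
    print(read_once(b"pladimir\n", 5))
    print(read_once(b"pladimir\n", 32))
```

With a count of 5, only the first five bytes of the name come back. With 32, the whole line including the newline fits in one read.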