Book Notes - Programming from the Ground Up
Finally learning how a computer works.
Table of Contents
Introduction
These are my notes for the Programming from the Ground Up book. Each section is a different chapter, which is named accordingly.
With these notes I hope to capture my main learnings from the book in a concise document. I will also be adding more information and sources to the part's that were particularly confusing to me.
This page is still a work in progress, as I have not yet finished the book. Hope to do it soon!
Chapter 2 - Computer Architecture
Computer's are based on the Von Neumann architecture, which divides a computer into two main parts:
- Memory
- CPU
The memory of a computer is equivalent to a sequence of numbered, fixed sized boxes. Unlike a real box, each memory address can only store a 1 byte number.
The CPU has the task of taking information with it and doing computations. CPU programs get stored in memory just like regular data, but they interpreted differently by the CPU. The program counter is what tells the CPU where to get the next instructions to run, and it's usually incremented after the CPU fetches each an instruction.
Besides memory, the CPU also has registers which can be accessed much faster than regular memory, and are used for computation. Register's typically store a word, which is equivalent to 4 bytes. Memory addresses are also a word (on 32 bit computer's I think).
Addressing Modes
- Immediate Mode: Data is embedded in the instruction;
- Register Addressing Mode: Instructions contains a register and not a memory location;
- Direct Addressing Mode: Instructions contains a memory address;
- Indexed Addressing Mode: Memory address + offset;
- Indirect Addressing Mode: A register contains a location in memory;
- Base Pointer Addressing Mode: Similar to above, but also contains an offset.
Chapter 3 - You First Program
Writing a simple program
The code below is divided into different sections. the .data
section is where we define our data, which is empty in the program because we have none.
The .text
section is where we define out program.
We use .globl _start
to export the symbol _start
to the object code, which will then be visible to the linker. More information here.
At the end, the code stores a value on %ebx
, which will be the return code of the program. The 1 that is moved to %eax
is the number of the exit system call.
A system call is an operative system feature, that regular processes cannot do. For this reason, in order to do a system call, we need to interrupt our process.
The int $0x80
line how we achieve this, meaning that we will now give control back to the OS. $0x80
is the number of the interrupt, which indicates who is handling the it.
In this case, it's the Linux Kernel. This answer has more information.
.section .data .section .text .globl _start _start: movl $1, %eax movl $0, %ebx int $0x80
0
Let's see what happens of we omit the last line:
.section .data .section .text .globl _start _start: movl $1, %eax movl $0, %ebx
Segmentation fault
The reason we get a segmentation fault is because the CPU does not know when to stop the program, since we did not include any proper exit. More information here.
Getting the maximum value of a list
The following program calculates the maximum value of a list.
.section .data data_items: .long 33,22,77,44,1,99,0 .section .text .globl _start _start: movl $0, %edi movl data_items(,%edi,4), %eax movl %eax, %ebx start_loop: cmpl $0, %eax je loop_exit incl %edi movl data_items(,%edi,4), %eax cmpl %ebx, %eax jle start_loop movl %eax, %ebx jmp start_loop loop_exit: movl $1, %eax int $0x80
99
data-items
stores the address for the beginning of the data list. To access a value in the list, we use the following syntax: data-items(,INDEX,WORDSIZE).
The data is stored in a sequence of long values, which are 4 bytes each.
This is convenient because it's also the size of the registers.
movl data_items(,%edi,4), %eax
moves the index stored at %edi
to %eax
.
To create a for loop, we first define a label start_loop
. This is just the address of the start of the loop.
The cmpl $0, %eax
compares the value 0
to %eax
, and store the result on the special %eiflags
register.
The instructions je
, jle
and jmp
both react to the value stored on %eiflags
.
Addressing memory
The general formula for addressing memory is ADDRESS_OR_OFFSET(%BASE_OR_OFFSET,%INDEX,MULTIPLIER)
. All of the fields are optional.
All of the aforementioned addressing modes,except immediate mode, can be written this way.
Using parts of the register
The lsb of %eax
can be accessed by %ax
, which is itself divided into %ah
and %al
.
Chapter 4 - All About Functions
The stack
The stack is somewhat like a pile of papers on a desk. In practice, the "top" of the stack is the bottom of the memory addresses.
The %esp
register always contains a pointer to the top of the stack.
Everytime we push something onto the stack using pushl
, the %esp
register get's incremented by 4.
We can pop out of the stack by using popl
.
In the following there are some examples involving this register:
movl (%esp), %ebx # Moves the value at the top of the stack to %ebx movl %esp, %ebx # Moves the pointer to the top of the stack to %ebx movl 4(%esp), %ebx # Moves the value just below the top of the stack to %ebx
The stack is the key element of implementing functions in the C programming language calling convention, which we use in this book.
Following this convention, the program must first push onto the stack the parameters of the function.
Then, it uses the call
instruction, indicating the function which functions it wishes to use.
The call
instruction two things:
- Pushes the return address onto the stack
- Modifies
%eip
to point at the start of the function
The function must also do some work:
pushl %epb
. The%epb
register is a special register used to access function parameters.movl %esp, %epb
. This saves the location of the of the current stack top into%epb
.- Saves space for each of the local variables. If we need 2 local variables, we use
subl $8, %esp
.
When a function is done, it does the following:
- Stores the value to return in
%eax
. - Reset's the stack to what it was before.
- uses
ret
to pop whatever value is at the top of the stack, which pop's the stack and set's that value to%eip
.
Below are a couple of small exercises that I made in order to get used to using the stack.
Here I just push and pop a value on the stack
.section .data .section .text .globl _start _start: pushl $2 popl %ebx movl $1, %eax int $0x80
2
And here I use the %esp
register to get the value at the top of the stack
.section .data .section .text .globl _start _start: pushl $4 movl (%esp), %ebx movl $1, %eax int $0x80
4
Finally, here I write the actuall power function exercise, as instructed in the book.
.section .data .section .text .globl _start _start: pushl $3 pushl $3 call power movl %eax, %ebx movl $1, %eax int $0x80 .type power, @function # Tell's the linker this is a function power: pushl %ebp # Save old base pointer movl %esp, %ebp subl $4, %esp # Make space for 1 variable movl 8(%ebp), %ebx # First argument into %ebx movl 12(%ebp), %ecx # Second argument into %ecx # -4(%ebp) is the local variable we made space for movl %ebx, -4(%ebp) # Store the current result. power_loop_start: cmpl $1, %ecx je end_power movl -4(%ebp), %eax imull %ebx, %eax # stores result on the second operand (%eax) movl %eax, -4(%ebp) decl %ecx jmp power_loop_start end_power: movl -4(%ebp), %eax # Save result on %eax movl %ebp, %esp popl %ebp ret
27
jmp
vs call
While reading this chapter, one question I had was whhat was the diference between using the jmp
instruction vs the call
instructions.
Both of this instructions will jump to another label on the code, however call
will also store the return address on the stack and will return to it after the ret
instruction.
More information here.
Recursive functions
The following program will calculate the factorial of a number using recursion.
.section .data .section .text .globl _start _start: pushl $4 # The function argument call factorial addl $4, %esp # Moves the stack pointer back before we pushed $4 movl %eax, %ebx movl $1, %eax int $0x80 .type factorial, @function factorial: pushl %ebp # Useful so we can restore %epb at the end movl %esp, %ebp # Now we an use %ebp to access the variables movl 8(%ebp), %eax # Move the parameter to %eax cmpl $1, %eax je end_factorial decl %eax pushl %eax # Push argument call factorial movl 8(%ebp), %ebx imull %ebx, %eax end_factorial: movl %ebp, %esp # Restore %ebp and %esp to where they were before the function call popl %ebp ret
24
Additional Exercises
The previous function could be rewriten non recursively with a for loop.
.section .data .section .text .globl _start _start: movl $4, %eax movl $1, %ebx factorial_loop: cmpl $1, %eax je loop_exit imull %eax, %ebx decl %eax jmp factorial_loop loop_exit: movl $1, %eax int $0x80
24
The next function calculated the square of an argument. In this exercise I am deliberatedly not following the calling convention outlined in the book.
.section .data .section .text .globl _start _start: pushl $4 call square movl $1, %eax int $0x80 .type square, @function square: pushl %ebp movl %esp, %ebp movl 8(%ebp), %ebx movl %ebx, %eax imull %eax, %ebx # Store the result on %ebx end_square: movl %ebp, %esp popl %ebp ret
16