Book Notes - Programming from the Ground Up
Finally learning how a computer works.

Table of Contents

Introduction

These are my notes for the Programming from the Ground Up book. Each section is a different chapter, which is named accordingly.

With these notes I hope to capture my main learnings from the book in a concise document. I will also be adding more information and sources to the part's that were particularly confusing to me.

This page is still a work in progress, as I have not yet finished the book. Hope to do it soon!

Chapter 2 - Computer Architecture

Computer's are based on the Von Neumann architecture, which divides a computer into two main parts:

  1. Memory
  2. CPU

The memory of a computer is equivalent to a sequence of numbered, fixed sized boxes. Unlike a real box, each memory address can only store a 1 byte number.

The CPU has the task of taking information with it and doing computations. CPU programs get stored in memory just like regular data, but they interpreted differently by the CPU. The program counter is what tells the CPU where to get the next instructions to run, and it's usually incremented after the CPU fetches each an instruction.

Besides memory, the CPU also has registers which can be accessed much faster than regular memory, and are used for computation. Register's typically store a word, which is equivalent to 4 bytes. Memory addresses are also a word (on 32 bit computer's I think).

Addressing Modes

  • Immediate Mode: Data is embedded in the instruction;
  • Register Addressing Mode: Instructions contains a register and not a memory location;
  • Direct Addressing Mode: Instructions contains a memory address;
  • Indexed Addressing Mode: Memory address + offset;
  • Indirect Addressing Mode: A register contains a location in memory;
  • Base Pointer Addressing Mode: Similar to above, but also contains an offset.

Chapter 3 - You First Program

Writing a simple program

The code below is divided into different sections. the .data section is where we define our data, which is empty in the program because we have none. The .text section is where we define out program.

We use .globl _start to export the symbol _start to the object code, which will then be visible to the linker. More information here.

At the end, the code stores a value on %ebx, which will be the return code of the program. The 1 that is moved to %eax is the number of the exit system call. A system call is an operative system feature, that regular processes cannot do. For this reason, in order to do a system call, we need to interrupt our process. The int $0x80 line how we achieve this, meaning that we will now give control back to the OS. $0x80 is the number of the interrupt, which indicates who is handling the it. In this case, it's the Linux Kernel. This answer has more information.

.section .data 

.section .text

.globl _start 

_start:
    movl $1, %eax
    movl $0, %ebx
    int $0x80
0

Let's see what happens of we omit the last line:

.section .data 

.section .text

.globl _start 

_start:
    movl $1, %eax
    movl $0, %ebx
Segmentation fault

The reason we get a segmentation fault is because the CPU does not know when to stop the program, since we did not include any proper exit. More information here.

Getting the maximum value of a list

The following program calculates the maximum value of a list.

.section .data

data_items:
    .long 33,22,77,44,1,99,0 

.section .text

.globl _start

_start:
    movl $0, %edi 
    movl data_items(,%edi,4), %eax 
    movl %eax, %ebx

start_loop:
    cmpl $0, %eax
    je loop_exit
    incl %edi
    movl data_items(,%edi,4), %eax
    cmpl %ebx, %eax
    jle start_loop

movl %eax, %ebx
    jmp start_loop

loop_exit:
    movl $1, %eax
    int $0x80

99

data-items stores the address for the beginning of the data list. To access a value in the list, we use the following syntax: data-items(,INDEX,WORDSIZE).

The data is stored in a sequence of long values, which are 4 bytes each. This is convenient because it's also the size of the registers. movl data_items(,%edi,4), %eax moves the index stored at %edi to %eax.

To create a for loop, we first define a label start_loop. This is just the address of the start of the loop. The cmpl $0, %eax compares the value 0 to %eax, and store the result on the special %eiflags register. The instructions je, jle and jmp both react to the value stored on %eiflags.

Addressing memory

The general formula for addressing memory is ADDRESS_OR_OFFSET(%BASE_OR_OFFSET,%INDEX,MULTIPLIER). All of the fields are optional. All of the aforementioned addressing modes,except immediate mode, can be written this way.

Using parts of the register

The lsb of %eax can be accessed by %ax, which is itself divided into %ah and %al.

Chapter 4 - All About Functions

The stack

The stack is somewhat like a pile of papers on a desk. In practice, the "top" of the stack is the bottom of the memory addresses.

The %esp register always contains a pointer to the top of the stack. Everytime we push something onto the stack using pushl, the %esp register get's incremented by 4. We can pop out of the stack by using popl.

In the following there are some examples involving this register:

movl (%esp), %ebx # Moves the value at the top of the stack to %ebx
movl %esp, %ebx # Moves the pointer to the top of the stack to %ebx
movl 4(%esp), %ebx # Moves the value just below the top of the stack to %ebx

The stack is the key element of implementing functions in the C programming language calling convention, which we use in this book. Following this convention, the program must first push onto the stack the parameters of the function. Then, it uses the call instruction, indicating the function which functions it wishes to use.

The call instruction two things:

  1. Pushes the return address onto the stack
  2. Modifies %eip to point at the start of the function

The function must also do some work:

  1. pushl %epb. The %epb register is a special register used to access function parameters.
  2. movl %esp, %epb. This saves the location of the of the current stack top into %epb.
  3. Saves space for each of the local variables. If we need 2 local variables, we use subl $8, %esp.

When a function is done, it does the following:

  1. Stores the value to return in %eax.
  2. Reset's the stack to what it was before.
  3. uses ret to pop whatever value is at the top of the stack, which pop's the stack and set's that value to %eip.

Below are a couple of small exercises that I made in order to get used to using the stack.

Here I just push and pop a value on the stack

.section .data
.section .text

.globl _start

_start: 
  pushl $2 
  popl %ebx
  movl $1, %eax
  int $0x80
2

And here I use the %esp register to get the value at the top of the stack

.section .data
.section .text

.globl _start

_start: 
  pushl $4 
  movl (%esp), %ebx
  movl $1, %eax
  int $0x80
4

Finally, here I write the actuall power function exercise, as instructed in the book.

.section .data
.section .text
.globl _start

_start: 
  pushl $3 
  pushl $3
  call power
  movl %eax, %ebx
  movl $1, %eax
  int $0x80

.type power, @function # Tell's the linker this is a function
power:
  pushl %ebp # Save old base pointer
  movl %esp, %ebp
  subl $4, %esp # Make space for 1 variable
  movl 8(%ebp), %ebx # First argument into %ebx
  movl 12(%ebp), %ecx # Second argument into %ecx
  # -4(%ebp) is the local variable we made space for
  movl %ebx, -4(%ebp) # Store the current result. 
  
power_loop_start:
  cmpl $1, %ecx
  je end_power
  movl -4(%ebp), %eax
  imull %ebx, %eax # stores result on the second operand (%eax)
  movl %eax, -4(%ebp) 
  decl %ecx
  jmp power_loop_start

end_power:
  movl -4(%ebp), %eax # Save result on %eax
  movl %ebp, %esp
  popl %ebp
  ret
27

jmp vs call

While reading this chapter, one question I had was whhat was the diference between using the jmp instruction vs the call instructions. Both of this instructions will jump to another label on the code, however call will also store the return address on the stack and will return to it after the ret instruction. More information here.

Recursive functions

The following program will calculate the factorial of a number using recursion.

.section .data
.section .text

.globl _start

_start:
  pushl $4 # The function argument
  call factorial
  addl $4, %esp  # Moves the stack pointer back before we pushed $4
  movl %eax, %ebx
  movl $1, %eax
  int $0x80

.type factorial, @function 
factorial:
  pushl %ebp # Useful so we can restore %epb at the end
  movl %esp, %ebp # Now we an use %ebp to access the variables
  movl 8(%ebp), %eax # Move the parameter to %eax
  cmpl $1, %eax
  je end_factorial
  decl %eax
  pushl %eax  # Push argument
  call factorial
  movl 8(%ebp), %ebx
  imull %ebx, %eax


end_factorial:
  movl %ebp, %esp # Restore %ebp and %esp to where they were before the function call
  popl %ebp
  ret 

24

Additional Exercises

The previous function could be rewriten non recursively with a for loop.

.section .data
.section .text

.globl _start

_start:
  movl $4, %eax 
  movl $1, %ebx

factorial_loop:
  cmpl $1, %eax
  je loop_exit
  imull %eax, %ebx
  decl %eax
  jmp factorial_loop

loop_exit:
  movl $1, %eax
  int $0x80
24

The next function calculated the square of an argument. In this exercise I am deliberatedly not following the calling convention outlined in the book.

.section .data
.section .text

.globl _start

_start:
  pushl $4 
  call square
  movl $1, %eax
  int $0x80

.type square, @function 
square:
  pushl %ebp 
  movl %esp, %ebp 
  movl 8(%ebp), %ebx 
  movl %ebx, %eax 
  imull %eax, %ebx  # Store the result on %ebx

end_square:
  movl %ebp, %esp 
  popl %ebp
  ret 

16

Chapter 5 - Dealing with files