Scheme Compiler in Incremental Steps: Heap Allocation

Sasank Chilamkurthy | | 7 minutes to read.

A lisp is not a lisp if it doesn’t support lists! In this post, let’s implement complicated data structure like lists, strings, etc. These data structures don’t fit in single word like they did for integers, characters and others from our earlier tutorial. So we will preallocate yet another contiguous area of memory called heap. Our complex data structures will be stored in this heap.

We will use the same tagged pointer representation we used for integers and characters for our objects. We will use the following 3 bit tags:

This means only rest of the bits can be used for addressing/pointing to the data. We’ll ensure that the lower 3 bits of our pointers are always 000 - i.e. we will allocate objects on heap only at 8 byte boundaries. This way we can recover the actual address from tagged address by simply subtracting the tag value. Our word size is anyway 8 bytes so we don’t need to do any extra book keeping!

Heap

Infrastructure Upgrade: Heap

Just like how we did for stack, we will create a heap in run_time.c using allocate_protected_space. We will pass the pointer to the beginning of the heap to scheme_entry. In scheme_entry, we will dedicate a register %rsi to hold the allocation pointer. Every time an object is constructed, the value of %rsi is incremented according to the size of the object. Note that our heap grows from low to high addresses as opposed to stack.

// edit in runtime.c
int main(int argc, char** argv){
    int stack_size = (16 * 4096);   /* holds 16k cells*/
    char *stack_top = allocate_protected_space(stack_size);
    char *stack_base = stack_top + stack_size;

    int heap_size = (160 * 4096);  /* holds 160k cells */
    char *heap_top = allocate_protected_space(heap_size);
    
    int val = scheme_entry(stack_base, heap_top);
    print_val(val);

    deallocate_protected_space(stack_top, stack_size);
    deallocate_protected_space(heap_top, heap_size);
    return 0;
}

According to Linux x86-64 calling convention, %rsi holds the second argument passed to a function. In our case, second argument is address to heap_top which is being stored in %rsi. This is exactly as we desired. Therefore, we don’t have to modify any of the linker assembly code.

Add Pairs

In lisp, pairs form the basis for lists. Pair is composed of car and cdr. Primitive cons creates a pair like this: (cons car cdr). A list is formed by storing value in car and pointer to next pair in cdr until we hit empty list '() in cdr. Recall that empty list '() is an immediate constant we defined in previous step. In other words:

(cons 1 (cons 2 (cons 3 (cons 4 '()))))
=> (1 2 3 4)

List

We will also define primitives pair? as a predicate for pairs. Primitives car and cdr take a pair as argument and return parts of the pair. Let’s get into implementation:

; lists
(define-primitive (pair? si env expr)
    (emit-expr si env expr)
    ; apply heap mask (1 bits heap-shift times)
    ; to lower byte of %eax
    (emit "and $~s, %al" (- (ash 1 heap-shift) 1))
    (emit "cmp $~s, %al" pair-tag)
    (emit-zeroflag-to-bool))

(define-primitive (cons si env arg1 arg2)
    ; compute car
    (emit-expr si env arg1)
    ; store it in stack
    (emit-stack-save si)
    ; compute cdr
    ; increment stack index so that car is not overwritten
    (emit-expr (next-stack-index si) env arg2)

    ; save cdr in next heap word
    (emit "mov %rax, ~a(%rsi)" word-size)

    ; save car to start of heap by saving previous 
    ; result to scratch first because we can't move
    ; address to address direclty.
    (emit "mov ~a(%rsp), %rax" si)
    (emit "mov %rax, 0(%rsi)")

    ; move address to result
    (emit "mov %rsi, %rax")
    ; add pair tag
    (emit "or $~a, %rax" pair-tag)
    ; increment heap pointer
    (emit "add $~a, %rsi" (* 2 word-size)))

(define-primitive (car si env arg1)
    (emit-expr si env arg1)
    (emit "mov ~a(%rax), %rax" (- pair-tag)))

(define-primitive (cdr si env arg1)
    (emit-expr si env arg1)
    (emit "mov ~a(%rax), %rax" (- word-size pair-tag)))

Let’s add tests:

(run-test '(pair? (cons 1 2)) "#t\n")
(run-test '(car (cons 1 2)) "1\n")
(run-test '(cdr (cons 1 2)) "2\n")
(run-test '(car (cons 1 (cons 2 3))) "1\n")
(run-test '(car (cdr (cons 1 (cons 2 3)))) "2\n")
(run-test '(pair? (cons 1 2)) "#t\n")
(run-test '(pair? 12) "#f\n")
(run-test '(pair? #t) "#f\n")
(run-test '(pair? #f) "#f\n")
(run-test '(pair? ()) "#f\n")
(run-test '(integer? (cons 12 43)) "#f\n")
(run-test '(boolean? (cons 12 43)) "#f\n")
(run-test '(null? (cons 12 43)) "#f\n")
(run-test '(not (cons 12 43)) "#f\n")
(run-test '(if (cons 12 43) 32 43) "32\n")
(run-test '(car (cons 1 23)) "1\n")
(run-test '(cdr (cons 43 123)) "123\n")
(run-test '(car (car (cons (cons 12 3) (cons #t #f)))) "12\n")
(run-test '(cdr (car (cons (cons 12 3) (cons #t #f)))) "3\n")
(run-test '(car (cdr (cons (cons 12 3) (cons #t #f)))) "#t\n")
(run-test '(cdr (cdr (cons (cons 12 3) (cons #t #f)))) "#f\n")
(run-test '(let ([x (let ([y (+ 1 2)]) (* y y))])
            (cdr (cons x (+ x x)))) "18\n")

And verify everything works:

$ guile tests.scm
...
(null? (cons 12 43)): passed
(not (cons 12 43)): passed
(if (cons 12 43) 32 43): passed
(car (cons 1 23)): passed
(cdr (cons 43 123)): passed
(car (car (cons (cons 12 3) (cons #t #f)))): passed
(cdr (car (cons (cons 12 3) (cons #t #f)))): passed
(car (cdr (cons (cons 12 3) (cons #t #f)))): passed
(cdr (cdr (cons (cons 12 3) (cons #t #f)))): passed
(let ((x (let ((y (+ 1 2))) (* y y)))) (cdr (cons x (+ x x)))): passed

That’s all for this post folks. We added heap to our compiler and allocated pairs in it. We saw how these pairs can be used to construct lists. However, we didn’t add pretty printing lists because that’s complicated. Nor did we add to our simple declaration of lists like (x y z) to our frontend. A more fully featured compiler would implement those.

Working code at the end of this step can be found at my Github repo with tag step_7_heap_allocation.