Engineering notebook
ENGINEERING NOTEBOOK: LEARNING RUSTS WITH TOO MANY LINKED LISTS!
I've been reading about implementing advanced collections in Rust thanks to Alexis Beingessner's wonderful, if perhaps excessively opinionated, lecture on "exceptional" collections1. Rust's enforced memory correctness requires the programmer to make intelligent decisions about references and move semantics, stuff no one's ever really had to think about in C or C++. I'm having a hard time internalizing this stuff, so here's my stab at recording what I've learned. His first structure is the singly linked list.
In Lisp, a list is just a data structure, often called a CONS, with two slots, the CAR and the CDR. The CAR slot holds your data. The CDR slot holds a pointer to the next CONS, or NIL, which means "The end of the List." A CONS that is only NIL is the empty list.
In Rust, you have to be concerned with both the heap and the stack. So, in Rust, you need a data structure to point to the head of the list, a sum-type to differentiate between the NIL case and the CONS case, and a CONS structure to hold the CAR and the CDR in the non-NIL case. The head can go on the stack; the list can be on the heap after that.
In fact, the head has to go onto the heap, because otherwise your list has a recursive structure, and Rust doesn't know how to size it. By using a Box
, which is a handle to a heap object, you tell Rust the CDR is the size of a pointer, now Rust knows how to size and allocate it.
When doing a cons, there's a moment when you create the new node, attach existing list to it, then retuns the new node as the head. Because we have a separate object to manage pointing to the head, that's a two-step operation. Rust objects: at some point, the head object exists in two places, and Rust doesn't like that. To overcome that, Beingessner uses the mem::replace function to make sure that the head temporarily has an acceptable NIL value when building the new node, so that the head object can then be made to point to the new node later.
Popping them off the list (Beingessner is more building a stack here at the moment) has a similar problem, and a similar solution: so that we can work with the recently-popped node, we have set the head to NIL temporarily, then fix it once we've extracted the data. But because they're boxed, we need to move the node to the stack (and yes, this apparently results in a bitmap copy!) so that we have a local copy we can work with.
The last thing Beingessner shows us is the Drop
trait. Normally, this wouldn't be an issue; Rust is very good about cleaning up after itself, and this stack would clean up after itself when the head goes completely out of scope. But, Beingessner writes, that clean-up would be recursive. Since our list is on the heap2, it can grow to occupy all the memory available, which is fine. But the stack does have an upper limit, and a recursive function would blow the stack, especially since in this case there's a follow-on operation that can't be tail-call optimized: we need to drop the contents inside the Box first, then drop the Box. (Rust currently doesn't do tail-call optimization anyway.) Beingessner teaches how to write this function manually and iteratively so that the default recursive version doesn't blow you up.
So those are the things I've learned so far: how to read debugging output from the borrower, how to handle the stack and the heap, how to manage temporary objects to keep the borrower happy in multi-step memory management code, how to unbox items to get at their insides, and how to re-write recursive Drop
functions to make them iterative and efficient.
1 Beingessner apparently believes the only thing you need aside from the traditional scalars is arrays and dictionaries; trees, graphs, tries, lists, stacks, queues, priority heaps are all unusual and unneeded by most programmers. He's not wrong about that: 99% of my professional life I've never needed anything more advanced than a dictionary. But right now, I do need these weird things.
2 I avoided writing "... since our stack is on the heap," because that would just be confusing.