I have a job interview this week and one thing they wanted me to know was that they were into Uncle Bob, “Clean code,” and test driven development. They were really, really emphatic about Uncle Bob’s “Transformation Priority Premise,” his notion that you start with failing (broken) code that does nothing, then does one thing, then does a couple of things, each time writing a test that’s more and more specific, closer to the full specification of the desired outcome, and in each step making exactly one change to the code, in a prioritized list, to make the test pass.

Bob has a comic in which he illustrates the process. He claims that if you follow the TPP transformations as written you can automatically derive quicksort from a specification about sorting, but if you deviate and choose a later but “easier” transformation you’ll end up with bubble sort, the slowest of all the classic sorting algorithms.

Since I’ve been given a week to cram for this exam, I’ve been doing exactly that and I have had a number of reactions.

The first is that object-oriented programming as it is traditionally pursued in enterprise deployments is one of the greatest embarassements in human history. Looking at the examples, even the idealized ones, there’s so much clutter on the page for error handling that, even with Martin’s dictat that code should follow the “functional cores and imperative wrappers” pattern, and all objects should be as SOLID as possible, object oriened programming is noisy, cluttered, and cognitively overloading.

The second is that so many of the TPP “transformation” just don’t make sense in a functional programming context. For example, Bob says that you should go:

7. variable → array
8. array → collection

I’m just gonna stare at you in Functor. In functional programming, we learn about scalars and arrays, but very soon we also learn that these are specific instances of a simple idea, the value, and to say that one is more complex than another is a category error (in the ontological sense). It might make sense in practice to treat these as cognitively more complex objects, but in the end the object of functional programming is to abstract that all away and concentrate on the transformations of data qua data.

Transformation number 9 is just as curious:

9. statement → recursion 

The higher the number, the less you’re supposed to indulge in it, so let’s talk about this: why “recursion?” I’m a huge fan of recursion and prefer it to for loops and other imperative constructs, but I’m an even bigger fan of map / reduce / filter and the general power of folds to replace primitive recursion in almost every case. There’s an entire programming language built around this idea (Micrsoft’s Bosque).

I’ve spent this weekend in a long plunge back into object-oriented programming The Enterprise Way™, and I’ve came away feeling like not much has changed since the 1990s. The code is still too verbose, with too many error conditions being handled independently of each other instead of categorically, and with too much code being written about different code paths rather than different expressions. Too many developers of the sort Uncle Bob is trying to reach still think of their programs in Tinkertoy terms: fit the pieces together like a machine, stand the thing up, and hope it works.

Boggling 2: Parallelized Boogaloo!

When last we left our heroes, they’d solved Boggle! and done so with amazing speed. The only thing left to do was go faster.

There are two strategies, actually, for going faster. The first, mind-bogglingly enough, is to go the other way with your search space: building a graph of the board, for each word in the dictionary scan to see if that word could be made from the board.

There is a size of board for which this strategy is actually faster than the “search the board and check the word” strategy we’re using, but I’ll tell you this: that board is bigger than 12×12. Even with the two-phase bitmap used for boards bigger than 8×8, a 12×12 board solves in only a second or two.

I’m going to give much of the credit for this astounding speed to The Rust Programming Language itself. The allocator for Rust is wicked clever about allocating memory resources while at the same time preventing a whole host of errors common to C++. Having programmed mostly in Rust for the past nine months (with some Haskell and Racket, and my day-to-day tool of Hy), I think I’d rather be work at a coffee shop than go back to hacking C++ or Java.

Let’s stick with our solution, but let’s multithread it. This article will show you how I multithreaded the Boggle searcher using the low-level Crossbeam toolkit and it’s work queue extension, Deque. Cossbeam is an amazing library that takes Rust’s cleverness and applies it to allocating CPU resources.

First thing: except for the solve() function, nothing else in our program has to change. Everything else is already thread-safe. The board and the dictionary are both read-only: every call to solve launches the recursive solver solution for a specific starting point on the board, with a new, unique Scanner object to store both the successful matches (and prefix matches) and the bitmap to track the path taken.

So the trick is to call solveforpos() in a thread. However, the average consumer laptop has four or eight cores, but the average Boggle™ board has more positions than four or eight. We can’t just launch as many threads as we have positions to check.

It’s also not efficient to say, “If we have four cores, let’s just divvy up the 4×4 board into 4 work groups and give each group to a different thread.” This strategy is known as job batching, but it can lead to inefficiencies. One batch may be all small jobs, whereas another may be all big jobs. The small batches will be done and the cores idling while waiting for the big jobs to finish. We’ll address that later.

Crossbeam provides a mechanism for dealing with this: stealing. A collection of queues is shared among the threads, and when one thread finishes its local queue it can steal a job from another thread’s queue. If that doesn’t work, it can steal from a global queue.

For now, let’s not worry about stealing from peers. Let’s just have a global queue of Jobs to tackle. A Job encodes the starting position on the Board for a given scan and the Scanned object from the previous post that will collect the data. The system is done when the queue of jobs is empty.

How to use Crossbeam-Deque

We’re going to be using crossbeam and crossbeam-deque, Injector is the first object we’re going to be using from crossbeam. Injector is our deque (a first-in, first-out queue), and we’re loading our job descriptions into it. It’s a funny name: it implies that it pushes jobs, but most of the time our worker threads will actually be pulling jobs out of the Injector.

pub fn solve_mt(board: &Board, threads: usize) -> Vec<String> {
    struct Job(isize, isize, Scanned);

    let work = &{
        let mut work: Injector<Job> = Injector::new();
        for x in 0..board.mx {
            for y in 0..board.my {
                work.push(Job(x, y,
                    Scanned::new("".to_string(), Ledger::new(board.mx, board.my)),

The outer work variable is a reference to the work queue, which itself is now anonymous. This is important because it lets Rust know that work is going to be reaped at the end of this function while still allowing multiple immutable references to work to be used.

Crossbeam has a nifty thread launching system with one powerful abstraction I mentioned earlier: temporal scope. The scope() function allows for structured concurrency, guaranteeing that every thread launched within a scope has terminated before the parent thread is allowed to proceed. Inside a scope we’re free to spawn() as many threads as we like, but they will all end before we leave the scope.

So: we need a place to record the results of any one search, and then we need to launch the search, and at the end we need to collect all the results and return them. We also need to find the job we’re going to be processing, but I’m going to cheat and do that later. Let’s launch:

    let mut solutions: Vec<String> = vec![];
    crossbeam::scope(|spawner| {
        let handles: Vec<ScopedJoinHandle<Vec<String>>> = (0..threads)
            .map(|_| {
                spawner.spawn(move |_| {
                    let mut solutions: Vec<String> = vec![];
                    let mut queue: Worker<Job> = Worker::new_fifo();
                    loop {
                        match find_task(&mut queue, &work) {
                            Some(mut job) => {
                                solveforpos(&board, (job.0, job.1), &mut job.2, &mut solutions);
                            None => break,

A couple of things stand out here.

First, the topmost solutions collection is outside the scope. We don’t want to lose the result of our work when the scope ends and Rust reclaims the memory needed to do our processing. The temporary data generated during a search is ephemeral, but our final answers shouldn’t be.

Second, the handles collection is where our data actually ends up, and we need to extract the results of a thread from it. It’s a fairly complex type signature, but straightforward once you figure it out.

Third, we’ve got that Worker thing, which is also from crossbeam. Crossbeam-deque has its ‘job stealing’ mechanism, in which all threads in a scope can have multiple job queues and ‘steal’ work from each other (usually half the items in a queue) as each thread finishes its own queue.

We’re not going to use this feature. We’re just going to have a single scoped queue in Injector and pop jobs off. But crossbeam wants each thread to have a queue, so we’re going to have it, but at run-time that queue will have only one job in it, the current job. When a thread has completed one full search it will loop around and ask the Injector, for another job.

Fourth, we have another solutions vector of strings; this is the one that goes into the vector of ScopedJoinHandles, and in the end we have to merge and flatten them all.

And finally, a pro-tip: see that .collect() at the bottom? I used Range.map() to launch my threads, rather than a for loop. And .map() is lazy: it produces exactly one iteration, waits for it to complete, and then goes on to produce the next. Without .collect(), the spawner produces one child thread which does all the work… and then we’re actually a little slower than the the single-threaded version due to the overhead. Using .collect() there forces .map() to be eager, to try and collect all the threads at once and put their results into handles immediately.

All right, we’ve run the threads and collected the results. What’s next? Extraction:

        solutions = handles
            .map(|handle| handle.join().unwrap())


In the first line, when the parent scope hits the first handle.join(), it waits until the first thread launched terminates, and then collects its data. If the second and third thread are also done, it immediately collects those, otherwise it goes back into the waiting state until it’s merged the entire scope down. The .unwrap() is there to handle error conditions of the total concurrency handling process, but I’m going to ignore it for now.

We then sort and de-duplicate our collection and return it.

Much of the point of this exercise is to simplify the queue example from crossbeam. The version of find_task on that page is esoteric and difficult to read. I wanted a simpler version that I could understand. This is what mine looks like:

fn find_task<T>(local: &mut Worker<T>, global: &Injector<T>) -> Option<T> {
    match local.pop() {
        Some(job) => Some(job),
        None => loop {
            match global.steal() {
                Steal::Success(job) => break Some(job),
                Steal::Empty => break None,
                Steal::Retry => {}

A thread looking for work passes in both its local and the global queues. We attempt to pop the local job off, which should never exist; it’s just an empty container needed to pass this assertion. We then attempt to steal another job from the global queue. If we’re out of jobs, we return None. If we find a job, we return Some(job). And finally, if the global queue was locked (due to another thread being in the middle of getting work), we tell find_task to busy-wait and try again.

That’s it. That’s the whole of what that big mess on the crossbeam-deque page is trying to tell you, only using iter::repeat instead of loop and throwing in a whole bunch of stuff about stealing jobs from other threads. Which you will need someday if you write lots of concurrent code. For learning, however, you will not want it; you want the simplest level of code you can hold in your head. A loop does that for me.

And that’s it. Your Boggle solver is now multi-threaded. And the results are impressive: With four threads, it’s now about three times faster, solving the board in about 350ns. It’s not four times faster; there’s allocation and processing overhead involved in separating out the jobs. But it’s three times faster, which is a great improvement.

As always, the entire codebase is avaiable from the the boggle-solver project on my Github.

All right, so now I’ve got a bitmap for tracking the path I’ve taken on the board, and I’ve got a trie that can determine whether or not a sequence of letters is a word (or the prefix of a word) from a specified dictionary.

Let’s solve Boggle™.

A Boggle board is a 4×4 grid where each cell is filled with one of sixteen dice. Each die has a letter.  Our objective is to scan the board finding words without ever using the same die twice for a singe word.  The typical strategy is to start at a letter and scan its neighbors for patterns that might register in our brains as “a word.” We’ll use the same technique in our algorithm:

  1. Start with a letter
  2. Scan one neighbor, creating a two-letter word
  3. If that word is a whole word, add it to the solutions list
  4. If that word is a prefix of a word in the dictionary
    1. Pick a neighbor that we have not visited this scan
    2. If there is one, make a three-letter word, and recurse to 3, and so on.
  5. Terminate when the “word” is not a prefix of any word or there are no valid neighbors to visit

So we need a structure to contain the word we’ve scanned, and the Ledger. Let’s call it Scanned. The only special thing we need for Scanned is that for every step, it needs to clone the Ledger; that is, we get a new copy of Scanned for each neighboring letter, and recurse down that path with that assembled word. When the search terminates (due to exhaustion of the search space or the failure of the prefix), we will then use the copy of Scanned at this point in the stack to create a new search with the next neighbor, and so forth. This is a standard recursive strategy.

The result looks like this:

pub fn solve(&mut board) -> Vec<String> {
  let solutions = Vec<string>
  for x in 0..mx {
    for y in 0..my {
      let mut possibles = Scanned::new("".to_string(), Vec::new());
      solveforpos(board, x, y, &mut possibles, &mut solutions);

Step 2. is solveforpos(), where we implement the recursive strategy. But for one technicality, we could have implemented this as a single function, but for one little detail: The letter ‘Q’. There are special rules about Q, and we have to support them, as the kids say, “because English.”

pub(in crate) fn solveforpos(
  board: &Board, (x, y): (isize, isize),
  curr: &mut Scanned, solutions: &mut Vec<String>)
) {
  let c = board.board[x as usize][y as usize];
  innersolveforpos(c, board, (x, y), curr, solutions, false);
  if c == 'q' {
    innersolveforpos('u', board, (x, y), curr, solutions, true);

The innersolveforpos() function checks for word validity, prefix validity, and position validity. Not shown here (but you can find it in the source), the Scanned object actually has the responsibility for adding the letter if the position is valid, and returning a new, longer “maybe-word” (remember, we have to support backtracking in our recursion) and a new, updated path Ledger. So that’s where we pass the “skip position check” flag, which in turn lets us put both “Q” (so we catch words like ‘sheqel’ and ‘burqa’) and “Qu” letters into our candidate string.

Look above, and you’ll see that we add ‘qu’ blindly whenever we encounter ‘q’. This is important. We have to let that happen because we need to continue even if “qe” and “qa” aren’t in the candidate list. “Quota” is a real word.

Once we’ve added the letter(s) and determined that the string we have is the prefix of a word found in the dictionary, we then scan the neighbors and recurse, skipping the current cube. The Ledger makes sure that we don’t re-check a letter for a given single search, but by cloning the letter and the candidate we also ensure that the backtracking is done correctly.

fn innersolveforpos(c: char, board: &Board, (x, y): (isize, isize),
  curr: &mut Scanned, solutions: &mut Vec<String>, skip_pos_check: bool
) {
  match curr.add(c, (x, y), skip_pos_check) {
    None => return,
    Some(mut newcurr) => {
      if newcurr.0.len() > 2 && board.words.find(&mut newcurr.0.chars()) {
      if !board.words.pref(&mut newcurr.0.chars()) {

      for i in -1..=1 {
        for j in -1..=1 {
          if !(i == 0 && j == 0) {
            // Skip the current block!
            let (nx, ny): (isize, isize) = (x as isize + i, y as isize + j);
            if nx >= 0 && nx < board.mx && ny >= 0 && ny < board.my {
              solveforpos(board, (nx, ny), &mut newcurr, solutions)

And that’s pretty much how you solve Boggle. According to one source, the total number of Boggle boards out there in (nxm)! (that’s the factorial symbol there), or for 4×4 board, 16!, or it would take 20,922,789,888,000 visits to do absolutely every search of the board. Except for one thing: the English language is not random! It’s messy, but not random. The fact that many letter combinations cannot actually lead to a real word found in the candidate dictionary means that the vast majority of searches terminate early.

On my laptop, a 4×4 board with all ‘e’s and a dictionary of ‘eee’ through ‘eeeeeeeeeeeeeeee’ takes 5 minutes and 45 seconds to complete. But in practice, the average runtime of a boggle board with this algorithm is barely 1.5 milliseconds.

Which is not too damn bad at all.

Can we go faster? Yes we can.

One of the things I needed in my Boggle solver was a bitmap. The average Boggle! board is 4×4, or 16 squares, and one of the rules is that you may not use the same letter twice. I needed a data structure that would handle the "Have I seen this before?"

Now, for a 4×4 board, a 16-bit integer will work just fine. And, in fact, the Ledger structure in the solver is, by default, a 64-bit integer, capable of handling boards as big as 8×8. In those case, the "mark" and "check" operations are just indices. Since this is a two-dimensional structure, we need a way to convert two-dimensional constructs into a one-dimensional number.

We want to move forward through the bitmap, using the width of the bitmap to find which row we’re in, and then the position on that row as the final offset.

These functions use the Rust bitmap operators. << means "shift left", and for a lot of programming operations these are used to mean "multiply by two." An accompanying function >> means "shift right", and can mean "divide by two"; in assembly language, both of these functions have an associated register that will tell you if a zero or one "fell off", and there are are similar instructions that will actually "shift around"; if a ‘1’ falls off at the 0th position of a 64-bit register, a ‘1’ will be inserted into the 63rd position, and vice-versa.

For our purposes, though, we just care about "the bit at a given position," and we use the shift operator to shift that bit into that position. In this case, we just create a new u64 bitmap with that one bit set:

pub struct Ledger(isize, isize, u64);

impl Ledger {
    fn point(&self, x: isize, y: isize) -> u64 {
        1 << (self.1 * x + y)

And then marking the bitmap consists of ‘or’ing it against the representative bitmap.

   fn mark(&self, x: isize, y: isize) {
       self.2 |= self.point(x, y);

And checking the bitmap is to ‘and’ the pointed bitmap against the internal bitmap and checking that the result is not zero (a very fast operation in assembly language).

   fn check(&self, x: isize, y: isize) -> bool {
       self.2 &= self.point(x, y) != 0

As trivial as that is, what if the board is bigger than 8×8? At that point, we’re beyond the largest integer provided by mainstream CPUs, so we have to move to something simple: a map of maps. We’ll use a Vec<u8>, and kinda do what point() does up above, only in reverse: turn a single coordinate into a coordinate pair indicating which index in the vector to target, and then which bit in that u8 we want to set or change.

The only challenge here is that we need to know how big the vector will be ahead of time, and we need to ensure that the vector is pre-populated and that the entire bitmap starts out set to zero. In the event that our size isn’t evenly divisible by eight, we need one more block of bits for the overflow:

pub struct Bitmap(Vec<u8>, usize);

impl Bitmap{
  pub fn new(b: usize) -> FSBitmap {
        let s = b / 8 + { if (b % 8) == 0 { 0 } else { 1 } };
        FSBitmap((0..s).map(|_| 0 as u8).collect::<Vec<u8>>(), b)

The index of a point is then two numbers: which byte we want to examine, and then an offset into that byte. In many ways, it’s exactly what the Ledger does, only backward. The Ledger doesn’t care about machine sizes, just the board. The bitmap, though, cares about machine sizes.

  fn index(&self, p: usize) -> (usize, usize) {
    (p / 8, p % 8)

This format isn’t usable as a number, but as a way of tracking "in a modestly large atomic space, which units are in a which binary state?" it’s perfect. Marking a bit uses this pair:

  pub fn mark(&mut self, p: usize) {
    let (cell, byte) = self.index(p);
    self.0[cell] |= 1 << byte;

As does checking it. In this particular case, any bit referenced outside the size of the requested original size is assumed to be zero:

  pub fn check(&self, p: usize) -> bool {
    if p >= self.1 { return false; }
    let (cell, byte) = self.index(p);
    self.0[cell] & (1 << byte) != 0

Other operations such as flip and unmark can be written with just these operators.

Bitmaps aren’t the most thrilling data structures in the world, although there are some significant uses for very large, sparse bitmaps that have to be stored in something more esoteric than just a Vec<u8>. But for the purposes of Boggle! this was straightforward and not too difficult.

There was a research project named Wyvern (not related to the current programming language Wyvern) that was intended to invent what would probably be called a "lexically scoped paradigmatic programming language."

Programming languages are built on paradigms. In programming language theory, a "paradigm" is a set of ideas that form a mutually exclusive way of expressing the meaning of a program. The imperative paradigm treats programming mechanistically: "do this, then do this, then do this"; the functional paradigm treats programming mathematically: "if you have these different transformations of the data, eventually you get the answer you want."

The notion of lexically scoped paradigmatic programming is back, and I still believe it’s a bad idea. Jinnzest writes in favor of the idea: "[I]f a language, on the contrary, provides means to split code into separate paradigm we will have to examine only one of them at once abstracting away all other."

The idea of lexically scoped paradigmatic programming is simple: using some kind of syntax, we tell the compiler "Okay, now we’re gonna go all-functional, but we’re gonna glue it together with imperative," or "Everything in here should be assumed to be async and event-driven," and so on.

… and we’re back to Lisp. Lisp in Small Pieces, specifically. (Seriously, anyone who is working with a programming language who has not worked their way through LiSP is missing out.) We’re back to a world where reading someone else code is basically a "guess what’s going on here" and "your idea of the environment, stack, and so on are laughable." Lisp is a great language, but only if your people are super-disciplined about using it one way and one way only; otherwise you end up with all the paradigms all mixed up in a bucket.

This is one of the reasons I recommend Go for a commercial venture that’s trying to move away from Python and Node. Go has only one paradigm: imperative. It’s not even a very good event-driven language; goroutines and message passing devolve into imperative code very easily.

On the other hand, it’s a vile, ugly language, and that’s why I like to write in Haskell for learning, Rust for performance and Lisp for elegance. So, you know, not in production.


Data Structures: Tries in Rust

Posted by Elf Sternberg as Uncategorized

I recently went to a job interview (I’m looking for work, by the way; hire me!) and one of the questions put to me was “solve the Boggle™ board.” I don’t think I did very well on the question and it wasn’t until I got home that I remembered “Dammit, the fastest structure for checking if a word is in a dictionary is a trie!”

Whiteboard exercises are terrible. I don’t think I’ve ever actually programmed a trie, not even in Data Structures class I took way back in college in 19mumblemumble. I’m sure I’ve used them, but I’ve never had to write one.

So what is a trie?

A trie is a data structure that encodes every word in a dictionary as a string of nodes, one per letter. This trie encodes the dictionary containing the words “a”, “an,” “ant”, “art”, and “aunt”. If we’re looking up “ant,” we start at the root node and traverse to “a”, then “n”, and then “t”. The “t” is double-circled, that is, it’s marked as a terminal node, and thus “ant” is in this dictionary. If we were looking up the word “any”, we would get to “a,n,” and then fail, because “any” is not in this dictionary. Likewise, we might ask if “aun” is in the dictionary, and we would get to “a,u,n,” but that “n” is not double-circled, and thus “aun” is also not a complete word in this dictionary.

Like a regular expression, a trie search is successful when the string is exhausted, the trie is still active, and we find ourselves on a node marked as a word terminator. If the trie exhausts before the string, or if the string exhausts on a non-terminating node, the search fails to find a complete word.

The rules of Boggle™ are very simple: on a 4×4 grid, sixteen dice are shaken and dropped into place. Each die has six letters on it, probabilistically arranged. Players have three minutes to find every valid word on the board, where “valid” means “found in the dictionary the players have agreed to use,” modulo some rules like “no contractions or abbreviations.” The letters of the words must be adjacent in any of the eight cardinal directions (modulo the borders of the grid), and no letter may be used twice for the same word.

So, to solve for Boggle, we need a fast lookup dictionary, and thus we need a trie. Tries are extremely fast, Ο(n) where n is the length of the word. They’re memory-intensive, but these days memory is cheaper than time. Each node has two conditions: the list of letters that it leads to, and whether or not it is a terminal node. For the “list of letters,” we’re going to use a HashMap. The child nodes must be Boxed as we’re building this on the heap (it will be huge for a full-sized dictionary) and it must be RefCell‘d because we’ll be mutating child nodes repeatedly as we build the trie.

For a canonical dictionary lookup, this implementation is different the Wikipedia page’s description.  There’s no utility to storing the node’s value in itself; we know what the value is by the key we used to reach it!  We do need to know if the node is a terminator node.

There are strong feelings about tuples vs structs for values with multiple fields, with most people preferring structs.  For a value with two fields, I feel a tuple ought to be adequate.  A record with so many fields it has to name them, and therefore a struct, has a code smell and is worth examining closely for cognitive complexity.  I’ll address that in the next post.

pub struct Node(HashMap<char, Box<RefCell<Node>>, bool);

For this structure, we’re going to build it imperatively; that is, each word will be presented to the structure one at a time, and each word will be an iterator over its characters. The first thing we need to do is say, for a given node, if the word is exhausted, then we are a terminator node:

impl Node {
    pub fn insert(&mut self, word: &mut Iterator<Item = char>) {
        let c = match word.next() {
            None => {
                self.1 = true;
            Some(c) => c,

Note that we’re returning from this function if we’re a terminator. There is no character to insert into the next node.

If we’re not a terminator, we must then either access or create a child node, and then keep traversing the word until its exhausted. In the “create a new child” scenario, we create the full trie before inserting it.

        match self.0.get(&c) {
            None => {
                let mut newtrie = Node::new();
                self.0.insert(c, Box::new(RefCell::new(newtrie)));
            Some(node) => {

One tricky feature of Boggle™ is that we want to end a search for a word early if the “word” found isn’t really a word. That is, if on the board you find the sequence “tk”, no word starts with “tk” and you want the search to terminate early. But as in our example above, “aun” is also not a word, but we do not want to terminate the search, as there may be a handy ‘t’ nearby to finish the word.

So we want to be able to search the trie, but we have two different criteria: “is this a word” and “could this be the prefix of a word?” We want our search engine to be able to handle both.

How do we handle both? Let’s go back: what are the failure conditions? The trie gets exhausted or the string gets exhausted. If both are exhausted at the same time and we’re on a terminator, it’s a word. If the word is exhausted but the trie is not, this is a prefix, regardless of its terminator status.

So, our search feature will be a recursive function that, letter by letter, transits the trie node by node. If the word exhausts and we’re still in the trie, we check the end state (always true for a prefix, is a terminator for a word), otherwise we recurse down to the next letter and the next node.

First, let’s see what we do if we run out of word. For the endstate, we’re going to pass it a function that says what to do when we run out of word:

    fn search(&self, word: &mut Iterator<Item = char>, endstate: &Fn(&Node) -> bool) -> bool {
        let c = match word.next() {
            None => return endstate(self),
            Some(c) => c,

Note that it’s not pub! This is the search function, but it’s going to be invoked by the functions that make the distinction between finding a word and affirming a prefix. Flags are a code smell, and to the extent that you use them, they should never be accessible to client code.  Search, by the way, is completely immutable with respect to the structure being searched; only the word is mutating as we iterate through it, and the per-search recursion is wholly stacked based.  Once built, the trie could be safely used by multiple threads without the need for locking mechanisms.

Finally, if the trie is not exhausted, we try to get the child node, passing the endstate handler forward:

        match self.0.get(&c) {
            None => false,
            Some(n) => n.borrow().search(word, endstate),

And the two functions, find and prefix:

    pub fn find(&self, word: &mut Iterator<Item = char>) -> bool {
        self.search(word, &|s| s.1)

    pub fn is_prefix(&self, word: &mut Iterator<Item = char>) -> bool {
        self.search(word, &|_s| true)

And finally, the constructor. We create a new, empty node, and then we insert words into the trie, using that node as the root of the dictionary. The root node is never a terminator.

    pub fn new() -> Node {
        Node(HashMap::new(), false)

And that’s it. That’s the whole of a trie, just two functions: a builder and a searcher. I had one special need (whole word vs prefix), but that was handled by using a distinguishing function as a flag on the semantics of the search.

Tries are a fun data structure. This version is pretty memory-heavy; it might be possible, for those cases where a word prefix has only one child, to pack them into a denser structure. The cost of running the discriminator versus a win on space density might even be worth it.

But if you have a finite dictionary, even one as big as the Scrabble™ legal words list (178,960 words as of 2012), a trie is the fastest data structure for determining if a string of letters is a real word, or possibly the prefix of a real word.

All of this code is available on my Github, and licensed under the Mozilla Public License v. 2.0.  The code on Github is a little more generic, and will work with both char and byte strings.


“But what are they for?”

Posted by Elf Sternberg as Uncategorized

“But what are they for?”

Do you remember the days in pre-college when you were taking geometry, or algebra, or calculus, and wondering when in your long and varied life you would ever have a need to use the quadratic equation? I remember having that thought lots of time. I had that thought once more, the other day, when I stumbled across Julie Moronoki’s A Brief Guide to a Few Algebraic Structures.

The last six months of my life have been deeply engrossed in regular expression theory, and so it blew my mind a little when I stumbled upon this table:

Algebra Logic Types Regular Expressions
a + b a ∨ b Either a b (a|b)
a ⨯ b a ∧ b (a, b) ab
ba a ⇒ b a -> b ba
a = b a ⇔ b isomorphism a ≅ b
1 Unit ε
0 Void

This table is taken from Thinking with Types. That fourth column, “regular expressions,” isn’t in the original table, but it fits fairly well right there. The first two columns are known as the Curry-Howard Isomorphism, and is essentially a proof, in the mathematical sense, that every computer program is a mathematical equation and vice versa. Since every regular expression is a computer program and a mathematical equation, of course it fits in the Curry-Howard Isomorphism, but it does so with surprisingly direct elegance, and since type systems and regular expressions are set membership assertions with the exact same shape, every regular expression solver is a type system solver, and vice-versa.

This “algebraic structure” is known as a “semiring with closure,” and I’ve been head deep into regular expressions as semirings for the past couple of months.

Here’s the thing, though: I still wonder “What the hell are these things for?” I mean, I started to get part of it: if it fits into a semiring, you can replace it for a regular expression.

In fact, that’s what regexs do: as the regex consumes a string, it replaces that point in the regex and the string with a true/false value (see the “Logic” column above): “keep going” or “fail”. If you reach the end with “keep going” active, the you found the string you were looking for.

If instead of “keep going,” what if you (a) collected the letter being analyzed, and (b) did something smart with it, like, had a set of empty matches and then, for the | symbol, collected all the matches that passed, and for one expression following another, created the cartesian product set of matches. When you reached the end of the string, you’d either have an empty set (“fail”), or you’d have a set with all the legitimate matches — and that’s how capture groups work in classic regular expressions.

Categorically, we say that a regular expression is a function that takes a string and a semiring and returns a new semiring. The set of the semiring may contain a boolean about recognizing the string, or it may contain the number of valid paths through the expression for that string, or it may contain the set of strings found by that expression within the string. That’s it. That’s the whole story.

Great. But what are semirings for?

This, I think, is where mathematics breaks down for most people. It’s not enough to know “semirings” (or semigroups, or functors, or monads); you have to know not just what they are, but how they interact with other mathematical structures to produce results. It also requires a certain degree of cleverness, to be able to take those disparate elements and have that sudden insight.

Hao Huang recently did that with his Sensitivity Proof, in which he used an insight from a completely different section of mathematics to prove how sensitive a function might be, that is, how much change you would have to do to the inputs of a function in order to change the output. The problem has been well-known for 30 years, and Huang’s proof is considered ridiculously elegant, the hypothesis of which fits into a friggin’ tweet, and the total paper is only two pages long.

This is part of my love/hate relationship with Haskell. Haskell forces you to be a better programmer. You don’t have a choice. But at the same time, there needs to be a better way of explaining: “This is what this is for.”

Tech Republic has an article entitled Top priorities for professional developers: Why devs are always looking to learn new programming languages. It’s not until the bottom of the article that we get to any mention of programming languages versus programming skills in general. General skills are about things other than languages: Kubernetes and AWS are configuration skills, not languages. React is a skill and not a programming language. Using an IDE effectively is a skill and not a language.

"Opportunities for professional growth" and "Developing my skills as an engineer" are what are listed as the "top priorities," which includes but doesn’t specify programming languages.

Nonetheless, I think that there’s an important kernel of truth to the idea that it’s languages that developers want to learn, because programming languages are the only things that teach you how to think better.

Underlying React are two concepts. The first is the zipper. Zippers are fascinating and tie in, ultimately, to type systems, advanced data structures, and "regular languages," one of my areas of expertise. Zippers are a damn complex data structure best explored in one of two languages: Scheme, or Haskell. You can’t understand zippers until you’ve mastered one of those two languages, and understanding both allows you to understand both the mechanistic (Turing-machine) and theoretic (Curry-mathematic) interpretation and implementation of zippers. The other is the continuation, which is the fundamental data structure that allows programs to handle decision making. All decision making in a program, every if statement, every exception, every loop, is in some sense an implementation of a continuation. Scheme, again, is the fundamental programming language necessary to understand continuations. React uses both zippers and continuations, but as a developer you’d never need to know or learn about them.

But if you read enough Haskell and enough Scheme, you will. And that’s why, although programmers have to want to learn how to configure cloud deployments and networking and so forth, that’s just plumbing. Learning a new programming language gives your brain the tools necessary to do the exciting new stuff.

I don’t know a single Haskell developer who doesn’t believe the weak version of the Sapir-Whorf Hypothesis.

I once read a heartbraking story about a young woman who finished her MFA (Master of Fine Arts) degree in classical composition by listening to her masters thesis work, a full-length 40-minute symphony, played by a professional orchestra. It was the high point of her life at that point.

In her essay, she wrote about the profound sadness she felt that she’d never hear it played again onstage. She’d probably never write another one, and even if she did the only way she’d hear it is through a synthesizer, in the privacy of her own home. Perhaps shared on SoundCloud or YouTube or even Spotify.

But she’d never hear it again the way orchestral music is meant to be heard. She was a scholarship student; she didn’t have rich parents who could afford to put her onto the international composition circuit, sponsor her to expensive music festivals, and get her in front of critical symphony directors.

Software development is undergoing an equally problematic phase change. As a programming language dilettante enthusiant, I have the luxury of sitting around all day, trying out my ideas in Haskell or Racket, and then implementing them in some language that’s more tractable for others, like Rust or C. And that’s pretty much been true since P-system Pascal was released in the late 1970s. All you needed was a computer. Just one. The one on your desktop.

But today, if you want to do anything substantial in some of the fundamental up-and-coming fields, such as artificial intelligence, data science, or cloud computing, you need far more money. You need to be able to rent several farms worth of compute space, and if you’re doing anything at all interesting with AI, the cost of a single experiment could easily run into the hundreds, if not thousands of dollars. Nobody except the wealthy can afford to do that, or let their kids specialize in it.

Companies looking for that kind of experience just can’t get it off the street. The only way startups get people with cloud experience is by tempting someone away from an established company that put the time and effort into training someone. It’s not available in schools, and you’re certainly not going to be self-taught in systems reliability engineering or Kubernetes cloud deployments. Those skills come only through spending time with very large systems while someone else foots the bill.

What this means for the future of software development is anyone’s guess, but I’m starting to see already an awareness that there needs to be some kind of revamping of the educational pipeline to enable people to get these skills without having wealthy sponsors.


The Sterling Bullet

Posted by Elf Sternberg as Uncategorized

One of the best books on software development ever written is the 1986 No Silver Bullet. Brooks’s central argument is that software development in a team requires the constant, daily transfer of active knowledge of progress, and the cost overhead of this transfer exceeds the value of additional developers fairly quickly; adding more people to a failing software project will actually slow the project down.

The classic joke around this book is that, if your manager proposes hiring more people to get the job done, you should buy him two copies of the book so he can read it twice as fast.

Alex Mills recently wrote an essay that got my attention (and that of a lot of other developers), in which he crystallized an emotion that a lot of programming language enthusiasts feel: The World Might Be Missing A Programming Language. Mills’s impression is one I used to share. To some degree I still share it, and I still love hacking on programming languages, but Mills’s laundry list of features he wants in his perfect programming language is the wrong list.

Let’s talk about four programming languages that are awesome, first.

Racket (i.e. Lisp)

Lisp is the great granddaddy of all great programming languages. Functional-first, mutation-capable, and possessing the ur-construct of all special cases (exceptions, parallel programming, concurrent programming), call-with-current-continuation, Lisp isn’t really so much a programming language as a language for writing other languages. It’s so malleable and powerful it can do almost anything.

Lisp’s malleability comes at a price: every developer writes it differently. Every lisp quickly becomes a hard-to-read pile of abstractions because every developer’s brain works differently, and Lisp has absolutely no guard rails.

Lisp also has exactly one semantic construct: the parenthetical. Every Lisp function looks exactly like every other function. Only the names changed. It can be hard, in a deeply nested Lisp function, to discover what is doing what to what.


Haskell is the functional programming opposite of Lisp: it has Magical Guardrails. Haskell programs are strongly typed, but they use types not just as validation or as a description of memory, but as actual guidance to the compiler as to how the program should be assembled and run. Haskell is famous for the maxim "If it built, it will run," because types guarantee that the program will do something. It may not always be correct (it’s possible you wrote a plus where you meant minus somewhere, after all), but a huge set of possible errors simply don’t exist in Haskell.

Haskell’s syntax is clear and concise. Almost too concise; it can sometimes not be obvious where functionality is coming from, because the developer’s type-based request can lead to the compiler just going out into the library and, if there’s only one type match, automatically adding it. This is great if you know the library, not so great for beginners.

Haskell also uses some concepts from advanced topics in mathematics. Phrases like "structure-preserving transforms" and "monoids in the category of endofunctors" get thrown around, giving beginners a sense of terror at all the learning that’s about to happen.


Python isn’t either of those language; Python is a dynamically typed language with a strict syntax and a huge standard library. Python is a hit because of that library: it was the first interpreted language to bring massive, available functionality for many of the things programmers did on a daily basis with its initial install. Python’s whitespace-oriented syntax annoyed people when it came out; now it’s considered something of a stroke of genius, as you ought to be indenting your scopes anyway.

Python is easy to learn, easy to teach, and at first glance "does what you mean." Its lack of types can lead to lots of debugging pain and unit testing, but for small programs it’s a wonder of fast productivity.


C gets a deservedly bad rap. It’s a language which pretends to have guardrails, but they’re mostly made of spaghetti. On the other hand, C creates predictable assembly language, and it’s possible to extract every last erg of power out of your CPU using it. But it’s "types" are mostly there to describe the layout of memory rather than securely associate functions with objects, and the semantics of C let you change types on the fly. This can be useful for systems programming, where a blob of memory can be a database structure, and the exact same blob just a stream of bytes to be passed over a network, without having to copy or transform the data, but the human overhead of debugging and the notorious security headaches has the world searching for an alternative.

Mills’ language already exists, sort-of

Every time I read a bullet point in Mills list, I had the reaction, Oh, you want Python, or Oh, you mean Haskell.

When he says "I want a language that makes writing asynchronous code easy," my reaction is: Haskell. Haskell goes the full math and says "If you squint just right, everything looks like everything else." A list is a container for a number of things; an exception is just a container for a thing that might fail; an asynchronous operation is just a container for a thing that will happen eventually. By leveraging this insight, Haskell can glue everything together into "an operation that might deliver a list eventually." But Haskell requires exceptional levels of cleverness to know how to put together its n-dimensional Lego pieces. It’s a great language to learn, and it is general purpose, but it’s hard to master.

He wants a Haskell that doesn’t require all that thinking.

A lot of his requests made me think: Lisp. The closest thing we have is Clojure, a Lisp that runs on the Java VM. It resists the classic Lisp explosion by leveraging the Java Standard Library, thus requiring that every deviation from the stock language converges on "tools that still run on the JVM." It’s a nifty restriction and Rich Hickey’s secret weapon. But it’s still a Lisp.

He wants a Lisp that can’t blow up, but isn’t restricted to a VM-only platform.

Like a lot of developers, I gripe a lot about the languages I know best. I complain about Python, Javascript, and Go because I know those languages, and am good at them. I love Haskell and Lisp and Rust, but I also know where those warts are and, unlike the first three, I’ve never been asked to work in them professionally; I suppose if I were using them for something other than recreational purposes I’d get tired of their faults just as quickly.

What Mills (and I, and a lot of other developers) really want is smarter programmers. Self-aware developers who care more abou the code than the usual hired guns we run across in this industry, the fly-by-night, $15/hour code-camp kids who just know how to search Stack Overflow.

Subscribe to Feed



August 2019
« Jul