Pretty much the only way I get things to *stay* in my head is to do them, and then document what I did.

Task one: Doing the Multithreaded Server Exercise from the book.

Yesterday, I ripped through the Rust “Writing a multithreaded web server” exercise, as I don’t have much experience with writing network code with Rust. This is the last exercise in The Rust Book (the one from O’Reilly, and also the one that ships free with the language as a collecition of HTML pages), and it’s a pretty good exercise, but as I was working on it I discovered there are two bugs with the documentation.

The objective of the exercise is to creata simple session handler named handle_connection(), that takes a request and sends a response. The engine is so primitive it can only serve three pages: a hello world page, a hello world page with an active delay (to simulate slow networking), and a 404 page.

Bug 1: Streams must be shut down

The last line of handle_connection flushes the socket and then the connection ends. Theoretically, this means that the socket should shutdown, but it doesn’t seem to work that way. There were two ways of fixing it.

The first was to call shutdown manually:

use std::net::Shutdown;

fn handle_connection(mut stream: TcpStream) {

...

    stream
        .shutdown(Shutdown::Both)
        .expect("shutdown call failed");

Doing this worked best, as clients with “keepalive” active got the right message and shut the connection gracefully. Without this instruction, the socket would stay open and the server would wait for a timeout.

The second was to move the stream. In main, change the execute line to:

    pool.execute(move || handle_connection(stream));

This would force the stream to die when the handle function terminated, but it would result in a “Error: Connection reset by peer” message from older W3C HTTP libraries (like thosed use in the console applications links). Only by closing the socket gracefully could both problems be eliminated.

Bug 2: Ergonomics marches on!

This is less of a bug than it is an aesthetic point. I’m currently running the latest and greatest of Rust stable: 1.37. The book describes a trait called FnBox that works around an ownership issue:

trait FnBox {
    fn call_box(self: Box<Self>);
}

impl<F: FnOnce()> FnBox for F {
    fn call_box(self: Box<F>) {
        (*self)()
    }
}
...
    job.call_box();

This method is intended to replace this syntax, which prior to Rust 1.37 threw a compile-time error: (*job)(). Here, job is our closure produced by the pool.execute function above. The problem is that Rust doesn’t know how big job will be and so can’t allocate stack space for it. By using this trick, we tell Rust that we’re explicitly and finally taking ownership of the FnOnce() object in order to run it once.

This is no longer necessary. As of Rust 1.37, you can throw away the entire FnBox shenanigans and replace it with:

    job();

Task two: Draw the rest of the m11g* owl.


PingCAP has a wonderful tutorial designed to teach new programmers how to program in Rust. Starting from cargo new, you end with an implementation of a high-performance, caching key-value storage server and a client to go with it.

Their tutorial is very sparse. You go from having almost nothing to having a working k/v store very quickly. I implemented the cheapest in-memory one-shot variant I could to get through the first lesson and learned a ton of things about the command line management library, clap], how to use the #![deny(missing_docs)] syntax to guide my documentation efforts, and how to use the failure crate.

I’m afraid, however, that it’s a bit more than I can chew, and I may be putting the other lessons aside, in order to start wrapping up some other projects that I’d like to concentrate on.

Task three: The Internet is for Porn

I make no secret of the fact that I’m as fond of the smut as any other human being. I joke that “porn, video games, and anime” are the foundation of my career. With Tumblr gone, I’ve been trying to cobble together a couple of other streams to cater to my interests, and I just find it interesting that my toolset is pretty much as diverse as my tastes.

I won’t post the results here as they’re very idiosyncratic. I wrote my cataloging tool for Pixiv in Perl, the one for Twitter in Hy, the archiving tool for Tumblr in straight Python, and some rather smaller and more obscure ones in Bash. I haven’t written Perl in a while, and it was an interesting struggle but not a difficult one. Perl, being the first second language I was paid to work in professionally (truthfully: the first was COBOL w/DB2, but don’t tell anyone!) and the language that put food on the family for the first seven years of that professional life, is probably embedded in my DNA like a tick, never to be fully erased.


* The words “internationalization” and “localization,” and now “Kubernetes” are all commonly shortened to “i18n”, “l10n”, and “k8s”, where the number in the middle indicates the number of letters elided. You’re welcome to figure out what 11 letters go between ‘m11g’ on your own.

In my quest to become the world’s greatest programmer, I have been informed that I must reboot my React and Django skills. What better way to make that happen than to port my Boggle solver program to the web?

It’s not good enough to shell out to boggle-solve and just make things happen. I wanted something that required a little more learning. And therefore, we come to the next step in the evolution of the boggle solving library: A C foreign function interface.

The good part of this story is that we’re still going to be writing almost entirely in Rust. As long as we stay away from asynchronous Rust, the Rust runtime can sit comfortably next to the C runtime without a headache. (And yes, C has a runtime, but there are historical reasons for why that runtime is provided by the OS rather than by the compiler.)

There is one catch. The existing boggle solver takes a dictionary and at least one board. But we can solve many boards with the same dictionary. So we must answer this question: Do we want the dictionary to be reusable? Loading the dictionary takes a large amount of the overall processing time and being able to load the dictionary once and re-use it would be worthwhile. The dictionary, however, is a complex Rust structure with a type paramater.

We could answer the question with ‘No.’ In which case our function takes a path to a dictionary, and a memory blob that represents the board, and returns another memory blob representing the answers.

That is sad. We don’t do sad things. Yes, the dictionary needs to be reusable.

The first thing we’re going to do is lock down the dictionary’s type. Consistently, we’ve been using Node<char>. Node is a node in a trie structure. It has no specialized root node, so there’s no point in having a Trie<char> object. For the FFI, we want Rust to make sure the layout of our container is compatible with C. We’ll use the compiler directive repr(C) to do that, and then contain our root node in a new struct:

#[repr(C)]
pub struct Trie(boggle_solver::trie::Node<char>);

For the C FFI, the Rust compiler directive no_mangle tells Rust not to change the names of the functions in a Rusty way, but to leave them as-is so the C runtime, which is primitive and doesn’t know anything about name mangling, will be able to find them.

We’e about to wander into the world of Rust black magic here. The Rustinomican, the book about Rust’s darker world, has dire warnings about what I’m about to show you. This is C-land, where C-things happen and C-corruption is the order of the day.

In the original boggle-solver there is a function, dict(path: &str) -> Node<char> that takes a path to a dictionary file and returns a fully populated trie. We’re going to use that function, wrap the results in an opaque pointer (a pointer the C programmer shouldn’t examine internally), and return that pointer. That pointer has to be taken away from Rust’s type handler. Rust needs to not worry about it, not borrow check it. It is a sinful object lost to the world of C. The Rust function transmute does this:

#[no_mangle]
unsafe extern "C" fn dictionary_make(filepath: *const c_char) -> *const Trie {
    transmute(Box::new(Trie(dict(
        CStr::from_ptr(filepath).to_str().unwrap(),
    ))))
}

This function is marked unsafe, which is fine because only C-side programs will ever call it. Inside, we take the filepath, which is an array of bytes terminated by a null, and cast it to a Rust string, then pass that to our dictionary builder, wrap the newly made dictionary in the new Trie struct, Box it up, and then transmute that box into our C-like pointer. Phew!

Since we are back in the land of C we are responsible for freeing the dictionary when we’re done with it. Miraculously, we can get away with this by using transmute to hand it back to Rust, put it into a scope, and then… just let Rust drop it and recover the memory.

#[no_mangle]
unsafe extern "C" fn dictionary_destroy(trie: *const Trie) {
    let _drop_me: Box<Trie> = transmute(trie);
}

Now that is Rust black magic!

We can now solve a board for a given dictionary. I won’t copy the entire text of the boggle-solve binary here, which makes up the bulk of this function. At the end of that function, we had a vector of all the words found on the board, which was the return type of the function. In our current case, we need to return something C understands.

Traditionally, the way C does this is to allocate a buffer big enough to hold the results, pass that buffer to the function (as a pointer), and then expect the buffer to be full when the results come back.

Here, we take the solutions: Vec<String> and from it create a massive string of all the results separated by newlines (to make it printable). We cast that string into C, and copy the results into the buffer. This isn’t the world’s most efficient example; at one point, we have three copies of the solution in memory, and the "known highest-scoring board" returns a solution set that’s 4,604 bytes long (including newlines). Using a conservative allocation scheme means that at some point we’re using 24 kilobytes of memory. It’s only for a few seconds until it drops all but the 8KB used for the parent buffer. Even in these days of multi-gigabyte machines, that still seems like a lot:

#[no_mangle]
unsafe extern "C" fn solve_for_dictionary(
    board_text: *const c_char,
    dictionary: *const Trie,
    found_words: *mut c_char,
) {

...

    let mut result = String::new();

    if !solutions.is_empty() {
        for solution in solutions {
            result.push_str(&solution);
            result.push('\n');
        }
    }

    let s = CString::new(result).unwrap();
    libc::strcpy(found_words, s.as_ptr());
}

And finally, we can now solve given the path to the dictionary. All of these functions are C functions with C signatures. If we didn’t call dictionary_destroy() there, the trie would never be recovered, and that would be a memory leak. We’re in C land here, where memory management is once again completely and utterly our responsibility.

#[no_mangle]
unsafe extern "C" fn solve(
    board_text: *const c_char,
    dictionary_path: *const c_char,
    found_words: *mut c_char,
) {
    let trie = dictionary_make(dictionary_path);
    solve_for_dictionary(board_text, trie, found_words);
    dictionary_destroy(trie);
}

To make this work with C, we need to provide a C-style header file that tells the C compiler the names and signatures of the objects we’ve created. After all the no_mangle and our repr opaque structure, here’s everything C needs to know about:

#include <stdlib.h>

struct Trie;

struct Trie* dictionary_make(char* filepath);
void dictionary_destroy(struct Trie* trie);
void solve_for_dictionary(char* board_text, struct Trie* dictionary, char* buffer);
void solve(char* board_text, char* dictionary_filepath, char* buffer);

A quick (and very no-frills; I’ve removed ALL the error-handling regarding arguments and board reading) example of using this in C:

#include "boggle_solver.h"
#include <stdio.h>
#include <string.h>

#define MAXBOARDSIZE 64
#define MAXRETURNSIZE 8192

int main (int argc, char** argv) {
  char board[MAXBOARDSIZE];
  char buffer[MAXRETURNSIZE];
  FILE *fp;

  fp = fopen(argv[1], "r");
  size_t len = fread(board, sizeof(char), MAXBOARDSIZE, fp);
  board[len++] = '\0'; /* Just to be safe. */
  fclose(fp);

  solve(board, "/usr/share/dict/words", buffer);
  printf("%s\n", buffer);
  return 0;
}

To use this code, save this in a file name csolve.c in the root directory of the FFI library. We can then build the library and the binary with cargo and GCC:

cargo build --release
gcc ./csolve ./csolve.c -Isrc  -L. -l:target/release/libboggle_solver.so

We can then run this code against the example board:

$ ./example/csolve ../sample_board.txt
aerie
air
ape
...
wore
wren
wrier

So now we have boggle-solver, the C-land version. The binary isn’t even very large (about 8K on my Linux box), but everything is dynamically linked so, in the end, it’s a big executable, hauling the entire Rust runtime with it.

Still, this is a pretty good demonstration of what’s possible. It’s not a great demonstration; in that case, I’d replace the call to dict with something which could take arrays of strings, or maybe even a file pointer, and do magic with it inside of Rust. But this accomplishes the initial goal: get something that can be run from a C program and run the Boggle Solver.

Next: Go one step further, and make it work in Python.

I have a job interview this week and one thing they wanted me to know was that they were into Uncle Bob, “Clean code,” and test driven development. They were really, really emphatic about Uncle Bob’s “Transformation Priority Premise,” his notion that you start with failing (broken) code that does nothing, then does one thing, then does a couple of things, each time writing a test that’s more and more specific, closer to the full specification of the desired outcome, and in each step making exactly one change to the code, in a prioritized list, to make the test pass.

Bob has a comic in which he illustrates the process. He claims that if you follow the TPP transformations as written you can automatically derive quicksort from a specification about sorting, but if you deviate and choose a later but “easier” transformation you’ll end up with bubble sort, the slowest of all the classic sorting algorithms.

Since I’ve been given a week to cram for this exam, I’ve been doing exactly that and I have had a number of reactions.

The first is that object-oriented programming as it is traditionally pursued in enterprise deployments is one of the greatest embarassements in human history. Looking at the examples, even the idealized ones, there’s so much clutter on the page for error handling that, even with Martin’s dictat that code should follow the “functional cores and imperative wrappers” pattern, and all objects should be as SOLID as possible, object oriened programming is noisy, cluttered, and cognitively overloading.

The second is that so many of the TPP “transformation” just don’t make sense in a functional programming context. For example, Bob says that you should go:

7. variable → array
8. array → collection

I’m just gonna stare at you in Functor. In functional programming, we learn about scalars and arrays, but very soon we also learn that these are specific instances of a simple idea, the value, and to say that one is more complex than another is a category error (in the ontological sense). It might make sense in practice to treat these as cognitively more complex objects, but in the end the object of functional programming is to abstract that all away and concentrate on the transformations of data qua data.

Transformation number 9 is just as curious:

9. statement → recursion 

The higher the number, the less you’re supposed to indulge in it, so let’s talk about this: why “recursion?” I’m a huge fan of recursion and prefer it to for loops and other imperative constructs, but I’m an even bigger fan of map / reduce / filter and the general power of folds to replace primitive recursion in almost every case. There’s an entire programming language built around this idea (Micrsoft’s Bosque).

I’ve spent this weekend in a long plunge back into object-oriented programming The Enterprise Way™, and I’ve came away feeling like not much has changed since the 1990s. The code is still too verbose, with too many error conditions being handled independently of each other instead of categorically, and with too much code being written about different code paths rather than different expressions. Too many developers of the sort Uncle Bob is trying to reach still think of their programs in Tinkertoy terms: fit the pieces together like a machine, stand the thing up, and hope it works.

Boggling 2: Parallelized Boogaloo!

When last we left our heroes, they’d solved Boggle! and done so with amazing speed. The only thing left to do was go faster.

There are two strategies, actually, for going faster. The first, mind-bogglingly enough, is to go the other way with your search space: building a graph of the board, for each word in the dictionary scan to see if that word could be made from the board.

There is a size of board for which this strategy is actually faster than the “search the board and check the word” strategy we’re using, but I’ll tell you this: that board is bigger than 12×12. Even with the two-phase bitmap used for boards bigger than 8×8, a 12×12 board solves in only a second or two.

I’m going to give much of the credit for this astounding speed to The Rust Programming Language itself. The allocator for Rust is wicked clever about allocating memory resources while at the same time preventing a whole host of errors common to C++. Having programmed mostly in Rust for the past nine months (with some Haskell and Racket, and my day-to-day tool of Hy), I think I’d rather be work at a coffee shop than go back to hacking C++ or Java.

Let’s stick with our solution, but let’s multithread it. This article will show you how I multithreaded the Boggle searcher using the low-level Crossbeam toolkit and it’s work queue extension, Deque. Cossbeam is an amazing library that takes Rust’s cleverness and applies it to allocating CPU resources.

First thing: except for the solve() function, nothing else in our program has to change. Everything else is already thread-safe. The board and the dictionary are both read-only: every call to solve launches the recursive solver solution for a specific starting point on the board, with a new, unique Scanner object to store both the successful matches (and prefix matches) and the bitmap to track the path taken.

So the trick is to call solveforpos() in a thread. However, the average consumer laptop has four or eight cores, but the average Boggle™ board has more positions than four or eight. We can’t just launch as many threads as we have positions to check.

It’s also not efficient to say, “If we have four cores, let’s just divvy up the 4×4 board into 4 work groups and give each group to a different thread.” This strategy is known as job batching, but it can lead to inefficiencies. One batch may be all small jobs, whereas another may be all big jobs. The small batches will be done and the cores idling while waiting for the big jobs to finish. We’ll address that later.

Crossbeam provides a mechanism for dealing with this: stealing. A collection of queues is shared among the threads, and when one thread finishes its local queue it can steal a job from another thread’s queue. If that doesn’t work, it can steal from a global queue.

For now, let’s not worry about stealing from peers. Let’s just have a global queue of Jobs to tackle. A Job encodes the starting position on the Board for a given scan and the Scanned object from the previous post that will collect the data. The system is done when the queue of jobs is empty.

How to use Crossbeam-Deque

We’re going to be using crossbeam and crossbeam-deque, Injector is the first object we’re going to be using from crossbeam. Injector is our deque (a first-in, first-out queue), and we’re loading our job descriptions into it. It’s a funny name: it implies that it pushes jobs, but most of the time our worker threads will actually be pulling jobs out of the Injector.

pub fn solve_mt(board: &Board, threads: usize) -> Vec<String> {
    struct Job(isize, isize, Scanned);

    let work = &{
        let mut work: Injector<Job> = Injector::new();
        for x in 0..board.mx {
            for y in 0..board.my {
                work.push(Job(x, y,
                    Scanned::new("".to_string(), Ledger::new(board.mx, board.my)),
                ));
            }
        }
        work
    };

The outer work variable is a reference to the work queue, which itself is now anonymous. This is important because it lets Rust know that work is going to be reaped at the end of this function while still allowing multiple immutable references to work to be used.

Crossbeam has a nifty thread launching system with one powerful abstraction I mentioned earlier: temporal scope. The scope() function allows for structured concurrency, guaranteeing that every thread launched within a scope has terminated before the parent thread is allowed to proceed. Inside a scope we’re free to spawn() as many threads as we like, but they will all end before we leave the scope.

So: we need a place to record the results of any one search, and then we need to launch the search, and at the end we need to collect all the results and return them. We also need to find the job we’re going to be processing, but I’m going to cheat and do that later. Let’s launch:

    let mut solutions: Vec<String> = vec![];
    crossbeam::scope(|spawner| {
        let handles: Vec<ScopedJoinHandle<Vec<String>>> = (0..threads)
            .map(|_| {
                spawner.spawn(move |_| {
                    let mut solutions: Vec<String> = vec![];
                    let mut queue: Worker<Job> = Worker::new_fifo();
                    loop {
                        match find_task(&mut queue, &work) {
                            Some(mut job) => {
                                solveforpos(&board, (job.0, job.1), &mut job.2, &mut solutions);
                            }
                            None => break,
                        }
                    }
                    solutions
                })
            })
            .collect();

A couple of things stand out here.

First, the topmost solutions collection is outside the scope. We don’t want to lose the result of our work when the scope ends and Rust reclaims the memory needed to do our processing. The temporary data generated during a search is ephemeral, but our final answers shouldn’t be.

Second, the handles collection is where our data actually ends up, and we need to extract the results of a thread from it. It’s a fairly complex type signature, but straightforward once you figure it out.

Third, we’ve got that Worker thing, which is also from crossbeam. Crossbeam-deque has its ‘job stealing’ mechanism, in which all threads in a scope can have multiple job queues and ‘steal’ work from each other (usually half the items in a queue) as each thread finishes its own queue.

We’re not going to use this feature. We’re just going to have a single scoped queue in Injector and pop jobs off. But crossbeam wants each thread to have a queue, so we’re going to have it, but at run-time that queue will have only one job in it, the current job. When a thread has completed one full search it will loop around and ask the Injector, for another job.

Fourth, we have another solutions vector of strings; this is the one that goes into the vector of ScopedJoinHandles, and in the end we have to merge and flatten them all.

And finally, a pro-tip: see that .collect() at the bottom? I used Range.map() to launch my threads, rather than a for loop. And .map() is lazy: it produces exactly one iteration, waits for it to complete, and then goes on to produce the next. Without .collect(), the spawner produces one child thread which does all the work… and then we’re actually a little slower than the the single-threaded version due to the overhead. Using .collect() there forces .map() to be eager, to try and collect all the threads at once and put their results into handles immediately.

All right, we’ve run the threads and collected the results. What’s next? Extraction:

        solutions = handles
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .flatten()
            .collect()
    })
    .unwrap();

    solutions.sort();
    solutions.dedup();
    solutions
}

In the first line, when the parent scope hits the first handle.join(), it waits until the first thread launched terminates, and then collects its data. If the second and third thread are also done, it immediately collects those, otherwise it goes back into the waiting state until it’s merged the entire scope down. The .unwrap() is there to handle error conditions of the total concurrency handling process, but I’m going to ignore it for now.

We then sort and de-duplicate our collection and return it.

Much of the point of this exercise is to simplify the queue example from crossbeam. The version of find_task on that page is esoteric and difficult to read. I wanted a simpler version that I could understand. This is what mine looks like:

fn find_task<T>(local: &mut Worker<T>, global: &Injector<T>) -> Option<T> {
    match local.pop() {
        Some(job) => Some(job),
        None => loop {
            match global.steal() {
                Steal::Success(job) => break Some(job),
                Steal::Empty => break None,
                Steal::Retry => {}
            }
        },
    }
}

A thread looking for work passes in both its local and the global queues. We attempt to pop the local job off, which should never exist; it’s just an empty container needed to pass this assertion. We then attempt to steal another job from the global queue. If we’re out of jobs, we return None. If we find a job, we return Some(job). And finally, if the global queue was locked (due to another thread being in the middle of getting work), we tell find_task to busy-wait and try again.

That’s it. That’s the whole of what that big mess on the crossbeam-deque page is trying to tell you, only using iter::repeat instead of loop and throwing in a whole bunch of stuff about stealing jobs from other threads. Which you will need someday if you write lots of concurrent code. For learning, however, you will not want it; you want the simplest level of code you can hold in your head. A loop does that for me.

And that’s it. Your Boggle solver is now multi-threaded. And the results are impressive: With four threads, it’s now about three times faster, solving the board in about 350ns. It’s not four times faster; there’s allocation and processing overhead involved in separating out the jobs. But it’s three times faster, which is a great improvement.

As always, the entire codebase is avaiable from the the boggle-solver project on my Github.

All right, so now I’ve got a bitmap for tracking the path I’ve taken on the board, and I’ve got a trie that can determine whether or not a sequence of letters is a word (or the prefix of a word) from a specified dictionary.

Let’s solve Boggle™.

A Boggle board is a 4×4 grid where each cell is filled with one of sixteen dice. Each die has a letter.  Our objective is to scan the board finding words without ever using the same die twice for a singe word.  The typical strategy is to start at a letter and scan its neighbors for patterns that might register in our brains as “a word.” We’ll use the same technique in our algorithm:

  1. Start with a letter
  2. Scan one neighbor, creating a two-letter word
  3. If that word is a whole word, add it to the solutions list
  4. If that word is a prefix of a word in the dictionary
    1. Pick a neighbor that we have not visited this scan
    2. If there is one, make a three-letter word, and recurse to 3, and so on.
  5. Terminate when the “word” is not a prefix of any word or there are no valid neighbors to visit

So we need a structure to contain the word we’ve scanned, and the Ledger. Let’s call it Scanned. The only special thing we need for Scanned is that for every step, it needs to clone the Ledger; that is, we get a new copy of Scanned for each neighboring letter, and recurse down that path with that assembled word. When the search terminates (due to exhaustion of the search space or the failure of the prefix), we will then use the copy of Scanned at this point in the stack to create a new search with the next neighbor, and so forth. This is a standard recursive strategy.

The result looks like this:

pub fn solve(&mut board) -> Vec<String> {
  let solutions = Vec<string>
  for x in 0..mx {
    for y in 0..my {
      let mut possibles = Scanned::new("".to_string(), Vec::new());
      solveforpos(board, x, y, &mut possibles, &mut solutions);
    }
  }
  solutions.sort();
  solutions.dedup();
  solutions.to_vec()
}

Step 2. is solveforpos(), where we implement the recursive strategy. But for one technicality, we could have implemented this as a single function, but for one little detail: The letter ‘Q’. There are special rules about Q, and we have to support them, as the kids say, “because English.”

pub(in crate) fn solveforpos(
  board: &Board, (x, y): (isize, isize),
  curr: &mut Scanned, solutions: &mut Vec<String>)
) {
  let c = board.board[x as usize][y as usize];
  innersolveforpos(c, board, (x, y), curr, solutions, false);
  if c == 'q' {
    innersolveforpos('u', board, (x, y), curr, solutions, true);
  }
}

The innersolveforpos() function checks for word validity, prefix validity, and position validity. Not shown here (but you can find it in the source), the Scanned object actually has the responsibility for adding the letter if the position is valid, and returning a new, longer “maybe-word” (remember, we have to support backtracking in our recursion) and a new, updated path Ledger. So that’s where we pass the “skip position check” flag, which in turn lets us put both “Q” (so we catch words like ‘sheqel’ and ‘burqa’) and “Qu” letters into our candidate string.

Look above, and you’ll see that we add ‘qu’ blindly whenever we encounter ‘q’. This is important. We have to let that happen because we need to continue even if “qe” and “qa” aren’t in the candidate list. “Quota” is a real word.

Once we’ve added the letter(s) and determined that the string we have is the prefix of a word found in the dictionary, we then scan the neighbors and recurse, skipping the current cube. The Ledger makes sure that we don’t re-check a letter for a given single search, but by cloning the letter and the candidate we also ensure that the backtracking is done correctly.

fn innersolveforpos(c: char, board: &Board, (x, y): (isize, isize),
  curr: &mut Scanned, solutions: &mut Vec<String>, skip_pos_check: bool
) {
  match curr.add(c, (x, y), skip_pos_check) {
    None => return,
    Some(mut newcurr) => {
      if newcurr.0.len() > 2 && board.words.find(&mut newcurr.0.chars()) {
        solutions.push(newcurr.0.to_string());
      }
      if !board.words.pref(&mut newcurr.0.chars()) {
        return;
      }

      for i in -1..=1 {
        for j in -1..=1 {
          if !(i == 0 && j == 0) {
            // Skip the current block!
            let (nx, ny): (isize, isize) = (x as isize + i, y as isize + j);
            if nx >= 0 && nx < board.mx && ny >= 0 && ny < board.my {
              solveforpos(board, (nx, ny), &mut newcurr, solutions)
            }
          }
        }
      }
    }
  }
}

And that’s pretty much how you solve Boggle. According to one source, the total number of Boggle boards out there in (nxm)! (that’s the factorial symbol there), or for 4×4 board, 16!, or it would take 20,922,789,888,000 visits to do absolutely every search of the board. Except for one thing: the English language is not random! It’s messy, but not random. The fact that many letter combinations cannot actually lead to a real word found in the candidate dictionary means that the vast majority of searches terminate early.

On my laptop, a 4×4 board with all ‘e’s and a dictionary of ‘eee’ through ‘eeeeeeeeeeeeeeee’ takes 5 minutes and 45 seconds to complete. But in practice, the average runtime of a boggle board with this algorithm is barely 1.5 milliseconds.

Which is not too damn bad at all.

Can we go faster? Yes we can.

One of the things I needed in my Boggle solver was a bitmap. The average Boggle! board is 4×4, or 16 squares, and one of the rules is that you may not use the same letter twice. I needed a data structure that would handle the "Have I seen this before?"

Now, for a 4×4 board, a 16-bit integer will work just fine. And, in fact, the Ledger structure in the solver is, by default, a 64-bit integer, capable of handling boards as big as 8×8. In those case, the "mark" and "check" operations are just indices. Since this is a two-dimensional structure, we need a way to convert two-dimensional constructs into a one-dimensional number.

We want to move forward through the bitmap, using the width of the bitmap to find which row we’re in, and then the position on that row as the final offset.

These functions use the Rust bitmap operators. << means "shift left", and for a lot of programming operations these are used to mean "multiply by two." An accompanying function >> means "shift right", and can mean "divide by two"; in assembly language, both of these functions have an associated register that will tell you if a zero or one "fell off", and there are are similar instructions that will actually "shift around"; if a ‘1’ falls off at the 0th position of a 64-bit register, a ‘1’ will be inserted into the 63rd position, and vice-versa.

For our purposes, though, we just care about "the bit at a given position," and we use the shift operator to shift that bit into that position. In this case, we just create a new u64 bitmap with that one bit set:

pub struct Ledger(isize, isize, u64);

impl Ledger {
    fn point(&self, x: isize, y: isize) -> u64 {
        1 << (self.1 * x + y)
    }
}

And then marking the bitmap consists of ‘or’ing it against the representative bitmap.

   fn mark(&self, x: isize, y: isize) {
       self.2 |= self.point(x, y);
   }

And checking the bitmap is to ‘and’ the pointed bitmap against the internal bitmap and checking that the result is not zero (a very fast operation in assembly language).

   fn check(&self, x: isize, y: isize) -> bool {
       self.2 &= self.point(x, y) != 0
   }

As trivial as that is, what if the board is bigger than 8×8? At that point, we’re beyond the largest integer provided by mainstream CPUs, so we have to move to something simple: a map of maps. We’ll use a Vec<u8>, and kinda do what point() does up above, only in reverse: turn a single coordinate into a coordinate pair indicating which index in the vector to target, and then which bit in that u8 we want to set or change.

The only challenge here is that we need to know how big the vector will be ahead of time, and we need to ensure that the vector is pre-populated and that the entire bitmap starts out set to zero. In the event that our size isn’t evenly divisible by eight, we need one more block of bits for the overflow:

pub struct Bitmap(Vec<u8>, usize);

impl Bitmap{
  pub fn new(b: usize) -> FSBitmap {
        let s = b / 8 + { if (b % 8) == 0 { 0 } else { 1 } };
        FSBitmap((0..s).map(|_| 0 as u8).collect::<Vec<u8>>(), b)
  }
}

The index of a point is then two numbers: which byte we want to examine, and then an offset into that byte. In many ways, it’s exactly what the Ledger does, only backward. The Ledger doesn’t care about machine sizes, just the board. The bitmap, though, cares about machine sizes.

  fn index(&self, p: usize) -> (usize, usize) {
    (p / 8, p % 8)
  }

This format isn’t usable as a number, but as a way of tracking "in a modestly large atomic space, which units are in a which binary state?" it’s perfect. Marking a bit uses this pair:

  pub fn mark(&mut self, p: usize) {
    let (cell, byte) = self.index(p);
    self.0[cell] |= 1 << byte;
  }

As does checking it. In this particular case, any bit referenced outside the size of the requested original size is assumed to be zero:

  pub fn check(&self, p: usize) -> bool {
    if p >= self.1 { return false; }
    let (cell, byte) = self.index(p);
    self.0[cell] & (1 << byte) != 0
 }

Other operations such as flip and unmark can be written with just these operators.

Bitmaps aren’t the most thrilling data structures in the world, although there are some significant uses for very large, sparse bitmaps that have to be stored in something more esoteric than just a Vec<u8>. But for the purposes of Boggle! this was straightforward and not too difficult.

There was a research project named Wyvern (not related to the current programming language Wyvern) that was intended to invent what would probably be called a "lexically scoped paradigmatic programming language."

Programming languages are built on paradigms. In programming language theory, a "paradigm" is a set of ideas that form a mutually exclusive way of expressing the meaning of a program. The imperative paradigm treats programming mechanistically: "do this, then do this, then do this"; the functional paradigm treats programming mathematically: "if you have these different transformations of the data, eventually you get the answer you want."

The notion of lexically scoped paradigmatic programming is back, and I still believe it’s a bad idea. Jinnzest writes in favor of the idea: "[I]f a language, on the contrary, provides means to split code into separate paradigm we will have to examine only one of them at once abstracting away all other."

The idea of lexically scoped paradigmatic programming is simple: using some kind of syntax, we tell the compiler "Okay, now we’re gonna go all-functional, but we’re gonna glue it together with imperative," or "Everything in here should be assumed to be async and event-driven," and so on.

… and we’re back to Lisp. Lisp in Small Pieces, specifically. (Seriously, anyone who is working with a programming language who has not worked their way through LiSP is missing out.) We’re back to a world where reading someone else code is basically a "guess what’s going on here" and "your idea of the environment, stack, and so on are laughable." Lisp is a great language, but only if your people are super-disciplined about using it one way and one way only; otherwise you end up with all the paradigms all mixed up in a bucket.

This is one of the reasons I recommend Go for a commercial venture that’s trying to move away from Python and Node. Go has only one paradigm: imperative. It’s not even a very good event-driven language; goroutines and message passing devolve into imperative code very easily.

On the other hand, it’s a vile, ugly language, and that’s why I like to write in Haskell for learning, Rust for performance and Lisp for elegance. So, you know, not in production.

05Aug

Data Structures: Tries in Rust

Posted by Elf Sternberg as Uncategorized

I recently went to a job interview (I’m looking for work, by the way; hire me!) and one of the questions put to me was “solve the Boggle™ board.” I don’t think I did very well on the question and it wasn’t until I got home that I remembered “Dammit, the fastest structure for checking if a word is in a dictionary is a trie!”

Whiteboard exercises are terrible. I don’t think I’ve ever actually programmed a trie, not even in Data Structures class I took way back in college in 19mumblemumble. I’m sure I’ve used them, but I’ve never had to write one.

So what is a trie?

A trie is a data structure that encodes every word in a dictionary as a string of nodes, one per letter. This trie encodes the dictionary containing the words “a”, “an,” “ant”, “art”, and “aunt”. If we’re looking up “ant,” we start at the root node and traverse to “a”, then “n”, and then “t”. The “t” is double-circled, that is, it’s marked as a terminal node, and thus “ant” is in this dictionary. If we were looking up the word “any”, we would get to “a,n,” and then fail, because “any” is not in this dictionary. Likewise, we might ask if “aun” is in the dictionary, and we would get to “a,u,n,” but that “n” is not double-circled, and thus “aun” is also not a complete word in this dictionary.

Like a regular expression, a trie search is successful when the string is exhausted, the trie is still active, and we find ourselves on a node marked as a word terminator. If the trie exhausts before the string, or if the string exhausts on a non-terminating node, the search fails to find a complete word.

The rules of Boggle™ are very simple: on a 4×4 grid, sixteen dice are shaken and dropped into place. Each die has six letters on it, probabilistically arranged. Players have three minutes to find every valid word on the board, where “valid” means “found in the dictionary the players have agreed to use,” modulo some rules like “no contractions or abbreviations.” The letters of the words must be adjacent in any of the eight cardinal directions (modulo the borders of the grid), and no letter may be used twice for the same word.

So, to solve for Boggle, we need a fast lookup dictionary, and thus we need a trie. Tries are extremely fast, Ο(n) where n is the length of the word. They’re memory-intensive, but these days memory is cheaper than time. Each node has two conditions: the list of letters that it leads to, and whether or not it is a terminal node. For the “list of letters,” we’re going to use a HashMap. The child nodes must be Boxed as we’re building this on the heap (it will be huge for a full-sized dictionary) and it must be RefCell‘d because we’ll be mutating child nodes repeatedly as we build the trie.

For a canonical dictionary lookup, this implementation is different the Wikipedia page’s description.  There’s no utility to storing the node’s value in itself; we know what the value is by the key we used to reach it!  We do need to know if the node is a terminator node.

There are strong feelings about tuples vs structs for values with multiple fields, with most people preferring structs.  For a value with two fields, I feel a tuple ought to be adequate.  A record with so many fields it has to name them, and therefore a struct, has a code smell and is worth examining closely for cognitive complexity.  I’ll address that in the next post.

pub struct Node(HashMap<char, Box<RefCell<Node>>, bool);

For this structure, we’re going to build it imperatively; that is, each word will be presented to the structure one at a time, and each word will be an iterator over its characters. The first thing we need to do is say, for a given node, if the word is exhausted, then we are a terminator node:

impl Node {
    pub fn insert(&mut self, word: &mut Iterator<Item = char>) {
        let c = match word.next() {
            None => {
                self.1 = true;
                return;
            }
            Some(c) => c,
        };

Note that we’re returning from this function if we’re a terminator. There is no character to insert into the next node.

If we’re not a terminator, we must then either access or create a child node, and then keep traversing the word until its exhausted. In the “create a new child” scenario, we create the full trie before inserting it.

        match self.0.get(&c) {
            None => {
                let mut newtrie = Node::new();
                newtrie.insert(word);
                self.0.insert(c, Box::new(RefCell::new(newtrie)));
            }
            Some(node) => {
                node.borrow_mut().insert(word);
            }
        };
    }

One tricky feature of Boggle™ is that we want to end a search for a word early if the “word” found isn’t really a word. That is, if on the board you find the sequence “tk”, no word starts with “tk” and you want the search to terminate early. But as in our example above, “aun” is also not a word, but we do not want to terminate the search, as there may be a handy ‘t’ nearby to finish the word.

So we want to be able to search the trie, but we have two different criteria: “is this a word” and “could this be the prefix of a word?” We want our search engine to be able to handle both.

How do we handle both? Let’s go back: what are the failure conditions? The trie gets exhausted or the string gets exhausted. If both are exhausted at the same time and we’re on a terminator, it’s a word. If the word is exhausted but the trie is not, this is a prefix, regardless of its terminator status.

So, our search feature will be a recursive function that, letter by letter, transits the trie node by node. If the word exhausts and we’re still in the trie, we check the end state (always true for a prefix, is a terminator for a word), otherwise we recurse down to the next letter and the next node.

First, let’s see what we do if we run out of word. For the endstate, we’re going to pass it a function that says what to do when we run out of word:

    fn search(&self, word: &mut Iterator<Item = char>, endstate: &Fn(&Node) -> bool) -> bool {
        let c = match word.next() {
            None => return endstate(self),
            Some(c) => c,
        };

Note that it’s not pub! This is the search function, but it’s going to be invoked by the functions that make the distinction between finding a word and affirming a prefix. Flags are a code smell, and to the extent that you use them, they should never be accessible to client code.  Search, by the way, is completely immutable with respect to the structure being searched; only the word is mutating as we iterate through it, and the per-search recursion is wholly stacked based.  Once built, the trie could be safely used by multiple threads without the need for locking mechanisms.

Finally, if the trie is not exhausted, we try to get the child node, passing the endstate handler forward:

        match self.0.get(&c) {
            None => false,
            Some(n) => n.borrow().search(word, endstate),
        }
    }

And the two functions, find and prefix:

    pub fn find(&self, word: &mut Iterator<Item = char>) -> bool {
        self.search(word, &|s| s.1)
    }

    pub fn is_prefix(&self, word: &mut Iterator<Item = char>) -> bool {
        self.search(word, &|_s| true)
    }

And finally, the constructor. We create a new, empty node, and then we insert words into the trie, using that node as the root of the dictionary. The root node is never a terminator.

    pub fn new() -> Node {
        Node(HashMap::new(), false)
    }
}

And that’s it. That’s the whole of a trie, just two functions: a builder and a searcher. I had one special need (whole word vs prefix), but that was handled by using a distinguishing function as a flag on the semantics of the search.

Tries are a fun data structure. This version is pretty memory-heavy; it might be possible, for those cases where a word prefix has only one child, to pack them into a denser structure. The cost of running the discriminator versus a win on space density might even be worth it.

But if you have a finite dictionary, even one as big as the Scrabble™ legal words list (178,960 words as of 2012), a trie is the fastest data structure for determining if a string of letters is a real word, or possibly the prefix of a real word.

All of this code is available on my Github, and licensed under the Mozilla Public License v. 2.0.  The code on Github is a little more generic, and will work with both char and byte strings.

02Aug

“But what are they for?”

Posted by Elf Sternberg as Uncategorized

“But what are they for?”

Do you remember the days in pre-college when you were taking geometry, or algebra, or calculus, and wondering when in your long and varied life you would ever have a need to use the quadratic equation? I remember having that thought lots of time. I had that thought once more, the other day, when I stumbled across Julie Moronoki’s A Brief Guide to a Few Algebraic Structures.

The last six months of my life have been deeply engrossed in regular expression theory, and so it blew my mind a little when I stumbled upon this table:

Algebra Logic Types Regular Expressions
a + b a ∨ b Either a b (a|b)
a ⨯ b a ∧ b (a, b) ab
ba a ⇒ b a -> b ba
a = b a ⇔ b isomorphism a ≅ b
1 Unit ε
0 Void

This table is taken from Thinking with Types. That fourth column, “regular expressions,” isn’t in the original table, but it fits fairly well right there. The first two columns are known as the Curry-Howard Isomorphism, and is essentially a proof, in the mathematical sense, that every computer program is a mathematical equation and vice versa. Since every regular expression is a computer program and a mathematical equation, of course it fits in the Curry-Howard Isomorphism, but it does so with surprisingly direct elegance, and since type systems and regular expressions are set membership assertions with the exact same shape, every regular expression solver is a type system solver, and vice-versa.

This “algebraic structure” is known as a “semiring with closure,” and I’ve been head deep into regular expressions as semirings for the past couple of months.

Here’s the thing, though: I still wonder “What the hell are these things for?” I mean, I started to get part of it: if it fits into a semiring, you can replace it for a regular expression.

In fact, that’s what regexs do: as the regex consumes a string, it replaces that point in the regex and the string with a true/false value (see the “Logic” column above): “keep going” or “fail”. If you reach the end with “keep going” active, the you found the string you were looking for.

If instead of “keep going,” what if you (a) collected the letter being analyzed, and (b) did something smart with it, like, had a set of empty matches and then, for the | symbol, collected all the matches that passed, and for one expression following another, created the cartesian product set of matches. When you reached the end of the string, you’d either have an empty set (“fail”), or you’d have a set with all the legitimate matches — and that’s how capture groups work in classic regular expressions.

Categorically, we say that a regular expression is a function that takes a string and a semiring and returns a new semiring. The set of the semiring may contain a boolean about recognizing the string, or it may contain the number of valid paths through the expression for that string, or it may contain the set of strings found by that expression within the string. That’s it. That’s the whole story.

Great. But what are semirings for?

This, I think, is where mathematics breaks down for most people. It’s not enough to know “semirings” (or semigroups, or functors, or monads); you have to know not just what they are, but how they interact with other mathematical structures to produce results. It also requires a certain degree of cleverness, to be able to take those disparate elements and have that sudden insight.

Hao Huang recently did that with his Sensitivity Proof, in which he used an insight from a completely different section of mathematics to prove how sensitive a function might be, that is, how much change you would have to do to the inputs of a function in order to change the output. The problem has been well-known for 30 years, and Huang’s proof is considered ridiculously elegant, the hypothesis of which fits into a friggin’ tweet, and the total paper is only two pages long.

This is part of my love/hate relationship with Haskell. Haskell forces you to be a better programmer. You don’t have a choice. But at the same time, there needs to be a better way of explaining: “This is what this is for.”

Tech Republic has an article entitled Top priorities for professional developers: Why devs are always looking to learn new programming languages. It’s not until the bottom of the article that we get to any mention of programming languages versus programming skills in general. General skills are about things other than languages: Kubernetes and AWS are configuration skills, not languages. React is a skill and not a programming language. Using an IDE effectively is a skill and not a language.

"Opportunities for professional growth" and "Developing my skills as an engineer" are what are listed as the "top priorities," which includes but doesn’t specify programming languages.

Nonetheless, I think that there’s an important kernel of truth to the idea that it’s languages that developers want to learn, because programming languages are the only things that teach you how to think better.

Underlying React are two concepts. The first is the zipper. Zippers are fascinating and tie in, ultimately, to type systems, advanced data structures, and "regular languages," one of my areas of expertise. Zippers are a damn complex data structure best explored in one of two languages: Scheme, or Haskell. You can’t understand zippers until you’ve mastered one of those two languages, and understanding both allows you to understand both the mechanistic (Turing-machine) and theoretic (Curry-mathematic) interpretation and implementation of zippers. The other is the continuation, which is the fundamental data structure that allows programs to handle decision making. All decision making in a program, every if statement, every exception, every loop, is in some sense an implementation of a continuation. Scheme, again, is the fundamental programming language necessary to understand continuations. React uses both zippers and continuations, but as a developer you’d never need to know or learn about them.

But if you read enough Haskell and enough Scheme, you will. And that’s why, although programmers have to want to learn how to configure cloud deployments and networking and so forth, that’s just plumbing. Learning a new programming language gives your brain the tools necessary to do the exciting new stuff.

I don’t know a single Haskell developer who doesn’t believe the weak version of the Sapir-Whorf Hypothesis.

Subscribe to Feed

Categories

Calendar

November 2019
M T W T F S S
« Oct    
 123
45678910
11121314151617
18192021222324
252627282930