Status: Floating Point Escapes Me, and Carving Seams

This week I’ve been involved in two different things: the first is giving up on the Buddhabrot generator.

There were two implementations. The first is the "Naive" implementation, which works okay, but is, well, naive. Ultimately, we want to render an image, and in the naive implementation I assumed (incorrectly) that what I wanted was to find a pixel, map it to a point on the complex plane, iterate that, and then map the result back to the image. The problem is that the pixel plane is too crude to give me all the really useful points. I ended up with a cloud that was recognizably the Buddha, but had significant asymmetries due to the imprecision of the map/map pair.

The second was named the "Cupe" implementation, after Johann "Cupe" Korndoerfer’s lisp implementation. This one was terrible, and I think I’ve reached the point where I’ve stared at it so long the details are starting to gloss over in my head. It’s time to walk away from it for a while and let it rest, then come back with a fresh attempt at an implementation, perhaps some mash-up of the Naive and the Cupe.

One interesting detail of the Cupe version is that he asserts, and I’m not currently math-savvy enough to know why, that the only interesting "starting points" for Buddha-related orbits are those within a very small distance of the border of the Mandelbrot set, those pixels that live right on the edge of the black zone. I was able to generate a gorgeous image of the border (no, really, it was my best work), but could not turn that information into a working Buddhabrot implementation.

The good news is that for the last two days I’ve been knocking together a basic seam-carving algorithm based on Rust’s image library. I’m trying to keep the algorithm separate from the image-processing part of the library, as my hope is to eventually port this to C (yes, sucker: C) and then submit it to Netpbm as an alternative resizing feature. I’ve got the energy processor and the seam recognizer working; now I just need to proof-of-concept an image-to-image resizer.
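For the curious, the energy step is the easy half. Here’s a minimal sketch of the classic dual-gradient energy measure, assuming the image crate’s RgbImage; this is illustrative, and the project’s actual energy function may differ:

use image::RgbImage;

/// Energy of one pixel: the summed, squared RGB differences of its
/// horizontal and vertical neighbors (wrapping at the edges).
fn energy(img: &RgbImage, x: u32, y: u32) -> u32 {
    let (w, h) = (img.width(), img.height());
    let delta = |a: &image::Rgb<u8>, b: &image::Rgb<u8>| -> u32 {
        a.0.iter()
            .zip(b.0.iter())
            .map(|(&p, &q)| (p as i32 - q as i32).pow(2) as u32)
            .sum()
    };
    delta(img.get_pixel((x + w - 1) % w, y), img.get_pixel((x + 1) % w, y))
        + delta(img.get_pixel(x, (y + h - 1) % h), img.get_pixel(x, (y + 1) % h))
}

The seam recognizer is then a dynamic-programming pass over this energy grid, looking for the connected vertical path with the minimum total energy.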


The other thing that happened this week is that I went to the used bookstore closest to the Microsoft campus. Someone there had just unloaded all of their database development textbooks, and for some reason I’ve been obsessed with them. The ugliest part appears to be the transaction manager, and I’ve been kinda wondering something.

I believe, perhaps irrationally, that at some point every program should be written as a library. I write most of my Rust programs that way: a library with a ‘src/bin’ directory for the final executable, which is often little more than an implementation of a command-line parser and an invocation of one of the library’s main entry points.
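For concreteness, here’s the shape of that pattern; the names are placeholders, not any real project of mine:

// src/lib.rs holds all the real logic. src/bin/mycli.rs is the entire
// executable, and tends to look like little more than this:
fn main() {
    let args: Vec<String> = std::env::args().collect();
    // Parse the arguments, then hand everything to the library:
    mylib::run(&args).expect("run failed");
}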

I also believe that many things which are command-line programs ought to be client/server. That is, the library/executable pattern in which I write my binaries is just a strong variant of a basic client/server model. For example, the Unix commands ‘find’, ‘mlocate’, and ‘gdmap’ are all the same routine under the covers: a file-system tree-walker ruled by a collection of predicates. So I have basic questions, like "Why is there no library API for mlocate?" and "Why do you have to leave GDMap up and running to get the map?"

So now I intend to try and answer some of those questions. I don’t know if ‘find’, ‘mlocate’, and ‘gdmap’ need to be client-server, but it seems to me that a file-system tree-walking routine into which you inject the predicates, or perhaps from which you extract the results as an iterator, would be an ideal separation of concerns; then mlocate’s search becomes accessible as a library, and gdmap won’t need to be running in order to generate the map itself.
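Here’s a sketch of the separation I have in mind, leaning on the walkdir crate as a stand-in for the tree-walker; walk_filtered and Predicate are names I just made up:

use std::path::PathBuf;
use walkdir::WalkDir;

type Predicate = Box<dyn Fn(&PathBuf) -> bool>;

/// Walk a tree, yielding only the paths that satisfy every predicate.
fn walk_filtered(root: &str, preds: Vec<Predicate>) -> impl Iterator<Item = PathBuf> {
    WalkDir::new(root)
        .into_iter()
        .filter_map(|entry| entry.ok())
        .map(|entry| entry.into_path())
        .filter(move |path| preds.iter().all(|p| p(path)))
}

A find is then this iterator printed; a gdmap is this iterator folded into a size map; an mlocate is this iterator serialized into a database.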

I’ve been looking deep into the storage manager for Postgres, and as I read through it, looking at various announcements of features added and bugs fixed throughout the Postgres ecosystem, I’m seeing two things: there is a lot of Postgres code that basically exists because C cannot make the same promises as Rust, and many bug fixes involve touching many different parts of the code. I’m wondering how much smaller some parts of Postgres could be made using the native features of Rust and the better-built libraries around it, and how different its internals would be if, somehow, the relationship between all the different parts could be teased apart.

Like my belief in the client/server-ness (or library/cli-ness) of the programming universe, I’ve long suspected that a lot of programming comes down to just a few moving parts. Go, for example, is a runtime with a fast garbage collector and a smart concurrency model, on top of which is built a syntactically ugly language. There’s no reason it couldn’t be a library and we’d build other things on it, as if it were a VM. A traditional database is a query language with a fast optimizer built on top of a smart manager for moving data from warm storage (disk & spindles). A spreadsheet is a functional, reactive programming language with a funky display mechanism.

Anyway, I’m obsessing about transaction managers and storage engines right now. Not sure how that’ll pan out.

At the last interview I did, I was not given a take-home test. Instead, I was sat down with someone to do a pair-programming exercise where the objective was, "Given this collection of unit tests and this mess of a class, refactor it into something more maintainable." The class was in Python.

I quickly identified what the class did. It represented the relationship between a user and their role-based access, and the capabilities of that access, to an unidentified system. Refactoring consisted of identifying every method that had a real-world impact according to the unit tests, throwing everything else away, and then abstracting out the relationship between a role and its capabilities.

Funny thing about the roles: they were static. So I moved the role and capabilities into a table at the top, commented it to say "If you want to add a new role or capability, here is where you do it," and wrote a single function outside the class to represent the role/capability lookup.
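The exercise was in Python, but the shape of the idea translates anywhere; here’s a sketch in Rust, with stand-in roles and capabilities:

// If you want to add a new role or capability, here is where you do it.
const ROLE_CAPABILITIES: &[(&str, &[&str])] = &[
    ("admin", &["read", "write", "delete", "grant"]),
    ("editor", &["read", "write"]),
    ("viewer", &["read"]),
];

/// The entire role/capability "system": one lookup function.
fn has_capability(role: &str, capability: &str) -> bool {
    ROLE_CAPABILITIES
        .iter()
        .find(|(r, _)| *r == role)
        .map_or(false, |(_, caps)| caps.contains(&capability))
}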

The last interview of the cycle was a code review. They were mostly shocked that I had written a function instead of a class to represent the role/capability relationship. I pointed out that the function could be scoped at the module level, that in Python modules were themselves objects, and that this was a simple, static, binary relationship: role:[capability]. That was it.

John Carmack said it best:

Sometimes, the elegant implementation is just a function. Not a method. Not a class. Not a framework. Just a function.

Indeed it was. A class definition and so forth might have satisfied some arbitrary sense of encapsulation, but it was also clutter. Until and unless they needed a more dynamic system for describing their RBAC, mine was the clearest variant possible. And they agreed.

One asked, "What would you have done differently?"

"If I’d had my own environment, I wouldn’t have been struggling with an unfamiliar editor and a QWERTY keyboard. I could have gone a lot faster then." The lack of a Dvorak option was a serious struggle. And while that may sound facetious, it does point to something more important: how would this interview cycle have handled someone with a disability?

It occurred to me the other day that there’s more than one thing professional programmers can learn from kinky people: we can learn how to ask things of our peers.

The first thing the kink community taught software developers (and other groups) is the code of conduct. Kinky people have been dealing with offensive, intrusive, and disruptive conduct within our public spaces pretty much since the founding of the modern community in the late 1980s. We know how to manage that stuff. The very first codes of conduct for software conferences were written by a kinky person cribbing off an S&M dungeon’s rulesheet, and many of the codes of conduct for all manner of conferences and conventions since have descended from her work.

The other thing kink can teach software developers, the thing they can use every day, is the Consent Communication Pattern. The consent communication pattern is how kinky people ask for specific things. It’s extremely useful; you might even call it “manipulative,” in that you’re much more likely to get what you want if you use it versus the more common ways of approaching someone and making a request.

The consent communication pattern is so useful that my children’s school district teaches it to seventh graders in their sex education curriculum. It’s ridiculously simple, and has nothing to do with sex. Ready?

Concrete context. Concrete request.

That’s it. In sex ed classes, the pattern is used to help people negotiate relationships and specific activities that happen when dating. The examples from my school district’s workbook read:

  • “I really like when we X. Can we do more of that?”
  • “I really don’t like when you X. Could you not do it?”
  • “I really don’t like when you X. Could we do Y instead?”

The working examples are innocuous, things like “Hold hands in public,” or “Call me by that nickname,” or whatever. But you can see where the textbook really wants to go.

The listener to this request is actually free to negotiate. In the latter two examples, the person making the request is setting up a boundary, a limit on what he or she will accept. But in all cases the listener can make specific counter-proposals in order to get his or her desires met.

In my time as an engineer, I’ve been the recipient, and sometimes the deliverer, of an awful lot of bullshit passive-aggressive “asks.” Questions to the team of “What are we doing to hit the deadline?” are in this category; there’s an implicit aggressive “We’re not doing enough, but it’s not my fault we’re not doing enough” in the question. “We don’t know enough to move forward” is in the passive mode, in that the person saying it is implicitly asking the others for help, but hasn’t come out and said so.

In English, we learn about subject-verb-object relationships in sentences. But conversations don’t have subjects; they have topics, and often subtopics. The consent communication pattern establishes a topic, and then makes specific requests of people for information or action about that topic. It works with the human mind’s fast-and-slow thinking patterns, giving the listener time to accept the topic, switch their mind to deal with it, and then drives their ability to respond thoughtfully and coherently.

The user story template is an example of a bullshit ask. “As a [role], I want to [do something] so that [reason/benefit].” It has a template, but it has no concrete request. I know, we’re supposed to assume that the request is, “Can you do that?” But it’s still a bullshit, passive-aggressive request.

The next time you’re at a planning meeting, re-write them in a hyper-concrete form: “The customer needs to be able to do X. Do we know enough about X to talk about a solution?” And then, “I know estimates are only estimates, and these may be harder than they look, but can you give me initial estimates for how long each task will take?” Notice how the pattern sets a concrete topic and then makes a specific request.

One important aspect of the consent communication pattern is its concreteness. If you can’t establish a specific context and craft a request that’s attainable and relevant to the context, you don’t actually have a request: you have a vague feeling that something is wrong, and you want the other person to help you figure out what that something might be. And that’s fine! But you should back up and say it that way. “I’ve been reading over X, and I have a feeling that something is wrong with it. Can you help me understand it better?” Note that the request is concrete, attainable, relevant, and open-ended. While it does have a “yes” or “no,” it’s worded in a way that leads to further conversation. If the request were worded, “Do you feel the same way?” someone already pressured by time and responsibility might not give you the conversation you’re looking for.

Programmers are notorious for having ADHD and other neurodivergencies. The consent communication pattern works with the programmer’s brain. A request without a topic just… is. It exists without any surrounding context. People, especially those for whom context is difficult, are much more willing to jump in and respond to requests when their brains aren’t being racked with the effort of trying to make sense of the context.

Hanna Thomas recently wrote:

Agile is described as a mindset. But let’s call it what it is and skip the middleman. Successful organisations aren’t ones that adopt an ‘Agile mindset’. Successful organisations are ones that adopt a feminist, queer, anti-establishment, progressive mindset — one that is flexible, experimental, pushes boundaries, self-organises, and acts in service of community.

I’d add “kinky” to that list. Kink has a very strong ethos of respecting other people, both their wants and needs and their boundaries and limitations. Kink has a strong, evidence-based and experience-driven system for creating safety, enabling personal growth, and asking for potentially personally embarrassing or emotionally vulnerable moments in public spaces where deeply intimate and possibly dangerous things are happening. If we can do it where intimate or physically risky things are going on, then software developers can learn to do it in a professional space that (I hope) is not quite so intimate, vulnerable, or physically risky.


Status: Fixing The Buddha


I was cleaning up my Github repositories and trying to remove some things that aren’t relevant or are somewhat embarrassing. One of them is the "programming_rust" repository, which I made while I was going through the book but no longer need. (Technically, I didn’t need one but for the fact that I have a laptop and a desktop and like to use both at different times.) I do want to keep the Buddhabrot generator, but I also want to upgrade it a bit to Rust 1.37.

So imagine my horror when I discovered that it throws all kinds of overflow errors. I think there’s some weakness in my thinking here, something I did incorrectly, and that’s why I’m getting these exceptions and backtraces.

I started to decompose the program with better organization and separation of concerns. That’s all of my status right now, as my day was derailed by good news: I got a job! I can’t reveal too many details yet, but I’ll let you all know on September 9th what the work entails. I spent most of yesterday running around getting ready for the job, upgrading the office-based toolset that I use to commute and stay productive. I’m now drooling over a lovely electric bicycle that, unfortunately, will take two weeks to ship here. I spent my afternoon shopping for a power supply for my laptop and telling all the helpful recruiters thanks but I’m working.

Any evening work was derailed by a family member getting food poisoning bad enough that a run to the emergency room was merited. While they were sedated I managed to run through a couple iterations of the Buddhabrot command line handler, but wasn’t happy with any of them. Try again tomorrow.

I’ve been reading Gerald Weinberg’s "The Psychology of Computer Programming," written in 1971 (!), which is considered something of a classic. It’s written as a textbook and meant to be used in a higher-level programming course for both programmers and their managers.

Chapter one has some interesting passages. First, there are the potential HR violations. At a previous employer, the in-house code of conduct included this gem:

Examples of unacceptable behavior include:

  • The use of sexualized language or imagery and unwelcome sexual attention or advances,

Weinberg is big on learning by example. Even in 1971, he wants you to read other people’s code to get used to it. But his examples are laden with randy imagery: "Late at night, when the grizzled old-timer is curled up in bed with a sexy subroutine or a mystifying macro, the young blade is busily engaged in a dialogue with his terminal." In a passage about learning how to read other people’s code, he writes,

[Programs] are unlike novels, and the best way to read them is not always from beginning to end. They are not even like mysteries, where we can turn to the penultimate page for the best part — or like sexy books, which we can let fall open to the most creased pages in order to find the warmest passages.

Now, in 1971, Weinberg was definitely having a bit of a wink with his audience. The "professionalization" (read: the ejection of women) from the white-collar job of programming was more or less complete, and if there was the odd woman in the room (in many senses of the word "odd," as far as Weinberg’s cohort was concerned), well, she’d just have to suck it up, honey. Guys are like that.

And yet, I’d like to push back on the idea that no degree of carnal language is acceptable in the software development space. I mean, in neither case does Weinberg discuss the gender of the participants; there’s no unwelcome attention here. Besides, we talk about "slaying bugs" and "crushing the spec" and "burning down the backlog," and I find that kind of violent language crude and offensive, but it’s perfectly acceptable.

Secondly, Weinberg is hip to "slow learning." He bemoans the age of "the terminal," and that people don’t learn to read programs anymore, they just write them. He cites "a young novelist who claimed he never read as his work was too original to find inspiration from other books." His career disproved his claim. He talks a bit about how COBOL was meant to tell executives they could not only read code, but that they might someday write it, eliminating expensive and quirky programmers altogether.

I can’t help but wonder what Weinberg thinks of the modern IDE with all its bells and whistles. I find myself liking LSP-enabled editors, and writing in Rust or Python is much better now that we have these tools. Instant feedback that the code is syntactically incorrect or has type errors or, in the best case, that there might be a cleaner way to write something, is awesome.

Weinberg would probably have mixed feelings about GitHub. He absolutely wants you to read other people’s code, but sometimes you do so not out of an interest in learning, but as an exercise in mockery. The Daily WTF lives strictly to do the latter. But there is a lot of high-quality code on there, which brings me to the questions at the end of the chapter:

  • When was the last time you read someone else’s code? What took you so long?

The last time I read someone else’s code was two days ago. I was reading the inside of the Rustlang image library in order to better understand how those developers thought about image processing, so that the code I was writing could fit into their development process without too much difficulty. I didn’t read any yesterday; I had runaround appointments and other priorities, which in the evening were hijacked by my partner getting food poisoning.

  • When was the last time you read code you, yourself, wrote that was more than six months old?

Yesterday. While "upgrading" an old project, I discovered that the latest revision of the compiler throws a ton of maths warnings and, you know, maybe I should investigate those. What I learned was that, six months later, better decompositions than the ones I first used are now available to my mind, and I should think about a refactor that will save me grief and pain.

In all, my summary of chapter one is this: Time has moved on. Programming languages have gotten more ergonomic. When Weinberg was writing, GOTO was still a perfectly acceptable way of managing a subroutine. LSP and IDEs assist in maintaining an ambient awareness of the correctness of code. Repositories like GitHub provide an enormous library of code, some of which is highly readable, and it’s always good to point out when you find some code that makes great sense. The discipline of test-driven development means that, if followed well, you end up with highly readable code, because Step Three is always to "clean up the mess you just made" and make it fit the "Context, Container, Component, Code" model.

I’ll probably enjoy the other chapters just as much. There’s just something fascinating about delving into a book 48 years out of date about the work I do every day and seeing what’s still relevant, and what has changed.

Pretty much the only way I get things to *stay* in my head is to do them, and then document what I did.

Task one: Doing the Multithreaded Server Exercise from the book.

Yesterday, I ripped through the Rust “Writing a multithreaded web server” exercise, as I don’t have much experience with writing network code in Rust. This is the last exercise in The Rust Book (the one from No Starch Press, and also the one that ships free with the language as a collection of HTML pages), and it’s a pretty good exercise, but as I was working on it I discovered two bugs in the documentation.

The objective of the exercise is to create a simple session handler named handle_connection(), which takes a request and sends a response. The engine is so primitive it can only serve three pages: a hello world page, a hello world page with an active delay (to simulate slow networking), and a 404 page.

Bug 1: Streams must be shut down

The last line of handle_connection flushes the socket and then the connection ends. Theoretically, this means that the socket should shutdown, but it doesn’t seem to work that way. There were two ways of fixing it.

The first was to call shutdown manually:

use std::net::Shutdown;

fn handle_connection(mut stream: TcpStream) {

...

    stream
        .shutdown(Shutdown::Both)
        .expect("shutdown call failed");

Doing this worked best, as clients with “keepalive” active got the right message and shut the connection gracefully. Without this instruction, the socket would stay open and the server would wait for a timeout.

The second was to move the stream. In main, change the execute line to:

    pool.execute(move || handle_connection(stream));

This would force the stream to die when the handle function terminated, but it would result in an “Error: Connection reset by peer” message from older W3C HTTP libraries (like those used in console applications such as links). Only by closing the socket gracefully could both problems be eliminated.

Bug 2: Ergonomics marches on!

This is less of a bug than it is an aesthetic point. I’m currently running the latest and greatest of Rust stable: 1.37. The book describes a trait called FnBox that works around an ownership issue:

trait FnBox {
    fn call_box(self: Box<Self>);
}

impl<F: FnOnce()> FnBox for F {
    fn call_box(self: Box<F>) {
        (*self)()
    }
}
...
    job.call_box();

This method is intended to replace this syntax, which prior to Rust 1.37 threw a compile-time error: (*job)(). Here, job is our closure produced by the pool.execute function above. The problem is that Rust doesn’t know how big job will be and so can’t allocate stack space for it. By using this trick, we tell Rust that we’re explicitly and finally taking ownership of the FnOnce() object in order to run it once.

This is no longer necessary. As of Rust 1.37, you can throw away the entire FnBox shenanigans and replace it with:

    job();
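For completeness, here’s the minimal shape of the modern version; the Job alias is the one the book already uses, the rest is my sketch:

// In recent Rust, a boxed FnOnce is directly callable, so the worker
// can simply consume the box:
type Job = Box<dyn FnOnce() + Send + 'static>;

fn run_job(job: Job) {
    job(); // the box is consumed here; no FnBox, no call_box()
}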

Task two: Draw the rest of the m11g* owl.


PingCAP has a wonderful tutorial designed to teach new programmers how to program in Rust. Starting from cargo new, you end with an implementation of a high-performance, caching key-value storage server and a client to go with it.

Their tutorial is very sparse. You go from having almost nothing to having a working k/v store very quickly. I implemented the cheapest in-memory one-shot variant I could to get through the first lesson and learned a ton of things about the command-line management library, clap, how to use the #![deny(missing_docs)] syntax to guide my documentation efforts, and how to use the failure crate.
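That #![deny(missing_docs)] trick deserves a quick illustration, because it’s the cheapest documentation enforcement I know of. This toy crate is mine, not PingCAP’s:

#![deny(missing_docs)]
//! A toy crate. With the attribute above, every public item below must
//! carry a doc comment, or the build fails.

/// Look up a key. Deleting this doc comment is now a compile-time error.
pub fn get(key: &str) -> Option<String> {
    let _ = key; // a stub; a real store would consult its index here
    None
}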

I’m afraid, however, that it’s a bit more than I can chew, and I may be putting the other lessons aside, in order to start wrapping up some other projects that I’d like to concentrate on.

Task three: The Internet is for Porn

I make no secret of the fact that I’m as fond of the smut as any other human being. I joke that “porn, video games, and anime” are the foundation of my career. With Tumblr gone, I’ve been trying to cobble together a couple of other streams to cater to my interests, and I just find it interesting that my toolset is pretty much as diverse as my tastes.

I won’t post the results here as they’re very idiosyncratic. I wrote my cataloging tool for Pixiv in Perl, the one for Twitter in Hy, the archiving tool for Tumblr in straight Python, and some rather smaller and more obscure ones in Bash. I haven’t written Perl in a while, and it was an interesting struggle but not a difficult one. Perl, being the first language I was paid to work in professionally (truthfully: the first was COBOL w/DB2, but don’t tell anyone!) and the language that put food on the family for the first seven years of that professional life, is probably embedded in my DNA like a tick, never to be fully erased.


* The words “internationalization” and “localization,” and now “Kubernetes” are all commonly shortened to “i18n”, “l10n”, and “k8s”, where the number in the middle indicates the number of letters elided. You’re welcome to figure out what 11 letters go between ‘m11g’ on your own.

In my quest to become the world’s greatest programmer, I have been informed that I must reboot my React and Django skills. What better way to make that happen than to port my Boggle solver program to the web?

It’s not good enough to shell out to boggle-solve and just make things happen. I wanted something that required a little more learning. And therefore, we come to the next step in the evolution of the boggle solving library: A C foreign function interface.

The good part of this story is that we’re still going to be writing almost entirely in Rust. As long as we stay away from asynchronous Rust, the Rust runtime can sit comfortably next to the C runtime without a headache. (And yes, C has a runtime, but there are historical reasons for why that runtime is provided by the OS rather than by the compiler.)

There is one catch. The existing boggle solver takes a dictionary and at least one board. But we can solve many boards with the same dictionary. So we must answer this question: Do we want the dictionary to be reusable? Loading the dictionary takes a large amount of the overall processing time, and being able to load the dictionary once and re-use it would be worthwhile. The dictionary, however, is a complex Rust structure with a type parameter.

We could answer the question with ‘No,’ in which case our function takes a path to a dictionary and a memory blob that represents the board, and returns another memory blob representing the answers.

That is sad. We don’t do sad things. Yes, the dictionary needs to be reusable.

The first thing we’re going to do is lock down the dictionary’s type. Consistently, we’ve been using Node<char>. Node is a node in a trie structure. It has no specialized root node, so there’s no point in having a Trie<char> object. For the FFI, we want Rust to make sure the layout of our container is compatible with C. We’ll use the compiler directive repr(C) to do that, and then contain our root node in a new struct:

#[repr(C)]
pub struct Trie(boggle_solver::trie::Node<char>);

For the C FFI, the Rust compiler directive no_mangle tells Rust not to change the names of the functions in a Rusty way, but to leave them as-is so the C runtime, which is primitive and doesn’t know anything about name mangling, will be able to find them.

We’re about to wander into the world of Rust black magic here. The Rustonomicon, the book about Rust’s darker world, has dire warnings about what I’m about to show you. This is C-land, where C-things happen and C-corruption is the order of the day.

In the original boggle-solver there is a function, dict(path: &str) -> Node<char> that takes a path to a dictionary file and returns a fully populated trie. We’re going to use that function, wrap the results in an opaque pointer (a pointer the C programmer shouldn’t examine internally), and return that pointer. That pointer has to be taken away from Rust’s type handler. Rust needs to not worry about it, not borrow check it. It is a sinful object lost to the world of C. The Rust function transmute does this:

#[no_mangle]
unsafe extern "C" fn dictionary_make(filepath: *const c_char) -> *const Trie {
    transmute(Box::new(Trie(dict(
        CStr::from_ptr(filepath).to_str().unwrap(),
    ))))
}

This function is marked unsafe, which is fine because only C-side programs will ever call it. Inside, we take the filepath, which is an array of bytes terminated by a null, and cast it to a Rust string, then pass that to our dictionary builder, wrap the newly made dictionary in the new Trie struct, Box it up, and then transmute that box into our C-like pointer. Phew!

Since we are back in the land of C we are responsible for freeing the dictionary when we’re done with it. Miraculously, we can get away with this by using transmute to hand it back to Rust, put it into a scope, and then… just let Rust drop it and recover the memory.

#[no_mangle]
unsafe extern "C" fn dictionary_destroy(trie: *const Trie) {
    let _drop_me: Box<Trie> = transmute(trie);
}

Now that is Rust black magic!

We can now solve a board for a given dictionary. I won’t copy the entire text of the boggle-solve binary here, which makes up the bulk of this function. At the end of that function, we had a vector of all the words found on the board, which was the return type of the function. In our current case, we need to return something C understands.

Traditionally, the way C does this is to allocate a buffer big enough to hold the results, pass that buffer to the function (as a pointer), and then expect the buffer to be full when the results come back.

Here, we take the solutions: Vec<String> and from it create a massive string of all the results separated by newlines (to make it printable). We cast that string into C, and copy the results into the buffer. This isn’t the world’s most efficient example; at one point, we have three copies of the solution in memory, and the "known highest-scoring board" returns a solution set that’s 4,604 bytes long (including newlines). Using a conservative allocation scheme means that at some point we’re using 24 kilobytes of memory. It’s only for a few seconds until it drops all but the 8KB used for the parent buffer. Even in these days of multi-gigabyte machines, that still seems like a lot:

#[no_mangle]
unsafe extern "C" fn solve_for_dictionary(
    board_text: *const c_char,
    dictionary: *const Trie,
    found_words: *mut c_char,
) {

...

    let mut result = String::new();

    if !solutions.is_empty() {
        for solution in solutions {
            result.push_str(&solution);
            result.push('\n');
        }
    }

    let s = CString::new(result).unwrap();
    libc::strcpy(found_words, s.as_ptr());
}

And finally, we can now solve given the path to the dictionary. All of these functions are C functions with C signatures. If we didn’t call dictionary_destroy() there, the trie would never be recovered, and that would be a memory leak. We’re in C land here, where memory management is once again completely and utterly our responsibility.

#[no_mangle]
unsafe extern "C" fn solve(
    board_text: *const c_char,
    dictionary_path: *const c_char,
    found_words: *mut c_char,
) {
    let trie = dictionary_make(dictionary_path);
    solve_for_dictionary(board_text, trie, found_words);
    dictionary_destroy(trie);
}

To make this work with C, we need to provide a C-style header file that tells the C compiler the names and signatures of the objects we’ve created. After all the no_mangle and our repr opaque structure, here’s everything C needs to know about:

#include <stdlib.h>

struct Trie;

struct Trie* dictionary_make(char* filepath);
void dictionary_destroy(struct Trie* trie);
void solve_for_dictionary(char* board_text, struct Trie* dictionary, char* buffer);
void solve(char* board_text, char* dictionary_filepath, char* buffer);

A quick (and very no-frills; I’ve removed ALL the error-handling regarding arguments and board reading) example of using this in C:

#include "boggle_solver.h"
#include <stdio.h>
#include <string.h>

#define MAXBOARDSIZE 64
#define MAXRETURNSIZE 8192

int main (int argc, char** argv) {
  char board[MAXBOARDSIZE];
  char buffer[MAXRETURNSIZE];
  FILE *fp;

  fp = fopen(argv[1], "r");
  size_t len = fread(board, sizeof(char), MAXBOARDSIZE, fp);
  board[len++] = '\0'; /* Just to be safe. */
  fclose(fp);

  solve(board, "/usr/share/dict/words", buffer);
  printf("%s\n", buffer);
  return 0;
}

To use this code, save it in a file named csolve.c in the root directory of the FFI library. We can then build the library and the binary with cargo and GCC:

cargo build --release
gcc -o csolve ./csolve.c -Isrc  -L. -l:target/release/libboggle_solver.so

We can then run this code against the example board:

$ ./example/csolve ../sample_board.txt
aerie
air
ape
...
wore
wren
wrier

So now we have boggle-solver, the C-land version. The binary isn’t even very large (about 8K on my Linux box), but everything is dynamically linked so, in the end, it’s a big executable, hauling the entire Rust runtime with it.

Still, this is a pretty good demonstration of what’s possible. It’s not a great demonstration; in that case, I’d replace the call to dict with something which could take arrays of strings, or maybe even a file pointer, and do magic with it inside of Rust. But this accomplishes the initial goal: get something that can be run from a C program and run the Boggle Solver.

Next: Go one step further, and make it work in Python.

I have a job interview this week and one thing they wanted me to know was that they were into Uncle Bob, “Clean code,” and test driven development. They were really, really emphatic about Uncle Bob’s “Transformation Priority Premise,” his notion that you start with failing (broken) code that does nothing, then does one thing, then does a couple of things, each time writing a test that’s more and more specific, closer to the full specification of the desired outcome, and in each step making exactly one change to the code, in a prioritized list, to make the test pass.

Bob has a comic in which he illustrates the process. He claims that if you follow the TPP transformations as written you can automatically derive quicksort from a specification about sorting, but if you deviate and choose a later but “easier” transformation you’ll end up with bubble sort, the slowest of all the classic sorting algorithms.

Since I’ve been given a week to cram for this exam, I’ve been doing exactly that and I have had a number of reactions.

The first is that object-oriented programming as it is traditionally pursued in enterprise deployments is one of the greatest embarrassments in human history. Looking at the examples, even the idealized ones, there’s so much clutter on the page for error handling that, even with Martin’s dictum that code should follow the “functional cores and imperative wrappers” pattern, and all objects should be as SOLID as possible, object-oriented programming is noisy, cluttered, and cognitively overloading.

The second is that so many of the TPP “transformations” just don’t make sense in a functional programming context. For example, Bob says that you should go:

7. variable → array
8. array → collection

I’m just gonna stare at you in Functor. In functional programming, we learn about scalars and arrays, but very soon we also learn that these are specific instances of a simple idea, the value, and to say that one is more complex than another is a category error (in the ontological sense). It might make sense in practice to treat these as cognitively more complex objects, but in the end the object of functional programming is to abstract that all away and concentrate on the transformations of data qua data.
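To make the Functor point concrete in Rust terms: map is the same operation whether the “container” holds zero-or-one values or many, which is why “variable → array” isn’t a step up a complexity ladder so much as a change of context:

fn double(x: i32) -> i32 {
    x * 2
}

fn main() {
    // The same transformation, mapped over two different contexts:
    let maybe = Some(3).map(double); // Option<i32>
    let many: Vec<i32> = vec![1, 2, 3].into_iter().map(double).collect();
    println!("{:?} {:?}", maybe, many); // Some(6) [2, 4, 6]
}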

Transformation number 9 is just as curious:

9. statement → recursion 

The higher the number, the less you’re supposed to indulge in it, so let’s talk about this: why “recursion?” I’m a huge fan of recursion and prefer it to for loops and other imperative constructs, but I’m an even bigger fan of map / reduce / filter and the general power of folds to replace primitive recursion in almost every case. There’s an entire programming language built around this idea (Microsoft’s Bosque).
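Here’s the classic sum example, with the fold doing what the primitive recursion does, minus the ceremony:

// Primitive recursion: peel one element off, recurse on the rest.
fn sum_rec(xs: &[i32]) -> i32 {
    match xs.split_first() {
        None => 0,
        Some((head, tail)) => head + sum_rec(tail),
    }
}

// The same computation as a fold: the recursion pattern is abstracted away.
fn sum_fold(xs: &[i32]) -> i32 {
    xs.iter().fold(0, |acc, x| acc + x)
}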

I’ve spent this weekend in a long plunge back into object-oriented programming The Enterprise Way™, and I’ve come away feeling like not much has changed since the 1990s. The code is still too verbose, with too many error conditions being handled independently of each other instead of categorically, and with too much code being written about different code paths rather than different expressions. Too many developers of the sort Uncle Bob is trying to reach still think of their programs in Tinkertoy terms: fit the pieces together like a machine, stand the thing up, and hope it works.

Boggling 2: Parallelized Boogaloo!

When last we left our heroes, they’d solved Boggle! and done so with amazing speed. The only thing left to do was go faster.

There are two strategies, actually, for going faster. The first, mind-bogglingly enough, is to go the other way with your search space: building a graph of the board, for each word in the dictionary scan to see if that word could be made from the board.

There is a size of board for which this strategy is actually faster than the “search the board and check the word” strategy we’re using, but I’ll tell you this: that board is bigger than 12×12. Even with the two-phase bitmap used for boards bigger than 8×8, a 12×12 board solves in only a second or two.
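To make the road not taken concrete, here’s roughly what the word-first search would look like; can_spell is a hypothetical sketch, not code from the project:

// For each dictionary word, ask whether the board can spell it, walking
// neighbors and never reusing a cell. Assumes a non-empty word.
fn can_spell(board: &[Vec<char>], word: &[char]) -> bool {
    fn walk(
        board: &[Vec<char>],
        word: &[char],
        x: isize,
        y: isize,
        seen: &mut Vec<(isize, isize)>,
    ) -> bool {
        let (h, w) = (board.len() as isize, board[0].len() as isize);
        if x < 0 || y < 0 || x >= h || y >= w || seen.contains(&(x, y)) {
            return false;
        }
        if board[x as usize][y as usize] != word[0] {
            return false;
        }
        if word.len() == 1 {
            return true; // every letter matched
        }
        seen.push((x, y));
        for i in -1..=1 {
            for j in -1..=1 {
                if (i, j) != (0, 0) && walk(board, &word[1..], x + i, y + j, seen) {
                    seen.pop();
                    return true;
                }
            }
        }
        seen.pop();
        false
    }
    (0..board.len() as isize).any(|x| {
        (0..board[0].len() as isize).any(|y| walk(board, word, x, y, &mut Vec::new()))
    })
}

You run this once per dictionary word, which is exactly why it only starts to win when the board is enormous relative to the dictionary.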

I’m going to give much of the credit for this astounding speed to The Rust Programming Language itself. The allocator for Rust is wicked clever about allocating memory resources while at the same time preventing a whole host of errors common to C++. Having programmed mostly in Rust for the past nine months (with some Haskell and Racket, and my day-to-day tool of Hy), I think I’d rather work at a coffee shop than go back to hacking C++ or Java.

Let’s stick with our solution, but let’s multithread it. This article will show you how I multithreaded the Boggle searcher using the low-level Crossbeam toolkit and its work-queue extension, Deque. Crossbeam is an amazing library that takes Rust’s cleverness and applies it to allocating CPU resources.

First thing: except for the solve() function, nothing else in our program has to change. Everything else is already thread-safe. The board and the dictionary are both read-only: every call to solve launches the recursive solver solution for a specific starting point on the board, with a new, unique Scanner object to store both the successful matches (and prefix matches) and the bitmap to track the path taken.

So the trick is to call solveforpos() in a thread. However, the average consumer laptop has four or eight cores, but the average Boggle™ board has more positions than four or eight. We can’t just launch as many threads as we have positions to check.

It’s also not efficient to say, “If we have four cores, let’s just divvy up the 4×4 board into 4 work groups and give each group to a different thread.” This strategy is known as job batching, but it can lead to inefficiencies. One batch may be all small jobs, whereas another may be all big jobs. The small batches will be done and the cores idling while waiting for the big jobs to finish. We’ll address that later.

Crossbeam provides a mechanism for dealing with this: stealing. A collection of queues is shared among the threads, and when one thread finishes its local queue it can steal a job from another thread’s queue. If that doesn’t work, it can steal from a global queue.

For now, let’s not worry about stealing from peers. Let’s just have a global queue of Jobs to tackle. A Job encodes the starting position on the Board for a given scan and the Scanned object from the previous post that will collect the data. The system is done when the queue of jobs is empty.

How to use Crossbeam-Deque

We’re going to be using crossbeam and crossbeam-deque. Injector is the first object we’ll use: it’s our work queue (first-in, first-out), and we’re loading our job descriptions into it. It’s a funny name: it implies that it pushes jobs, but most of the time our worker threads will actually be pulling jobs out of the Injector.

pub fn solve_mt(board: &Board, threads: usize) -> Vec<String> {
    struct Job(isize, isize, Scanned);

    let work = &{
        let mut work: Injector<Job> = Injector::new();
        for x in 0..board.mx {
            for y in 0..board.my {
                work.push(Job(x, y,
                    Scanned::new("".to_string(), Ledger::new(board.mx, board.my)),
                ));
            }
        }
        work
    };

The outer work variable is a reference to the work queue, which itself is now anonymous. This is important because it lets Rust know that work is going to be reaped at the end of this function while still allowing multiple immutable references to work to be used.

Crossbeam has a nifty thread launching system with one powerful abstraction I mentioned earlier: temporal scope. The scope() function allows for structured concurrency, guaranteeing that every thread launched within a scope has terminated before the parent thread is allowed to proceed. Inside a scope we’re free to spawn() as many threads as we like, but they will all end before we leave the scope.

So: we need a place to record the results of any one search, and then we need to launch the search, and at the end we need to collect all the results and return them. We also need to find the job we’re going to be processing, but I’m going to cheat and do that later. Let’s launch:

    let mut solutions: Vec<String> = vec![];
    crossbeam::scope(|spawner| {
        let handles: Vec<ScopedJoinHandle<Vec<String>>> = (0..threads)
            .map(|_| {
                spawner.spawn(move |_| {
                    let mut solutions: Vec<String> = vec![];
                    let mut queue: Worker<Job> = Worker::new_fifo();
                    loop {
                        match find_task(&mut queue, &work) {
                            Some(mut job) => {
                                solveforpos(&board, (job.0, job.1), &mut job.2, &mut solutions);
                            }
                            None => break,
                        }
                    }
                    solutions
                })
            })
            .collect();

A couple of things stand out here.

First, the topmost solutions collection is outside the scope. We don’t want to lose the result of our work when the scope ends and Rust reclaims the memory needed to do our processing. The temporary data generated during a search is ephemeral, but our final answers shouldn’t be.

Second, the handles collection is where our data actually ends up, and we need to extract the results of a thread from it. It’s a fairly complex type signature, but straightforward once you figure it out.

Third, we’ve got that Worker thing, which is also from crossbeam. Crossbeam-deque has its ‘job stealing’ mechanism, in which all threads in a scope can have multiple job queues and ‘steal’ work from each other (usually half the items in a queue) as each thread finishes its own queue.

We’re not going to use this feature. We’re just going to have a single scoped queue in Injector and pop jobs off. But crossbeam wants each thread to have a queue, so we’re going to have one; at run-time that queue will only ever hold the current job. When a thread has completed one full search, it will loop around and ask the Injector for another job.

Fourth, we have another solutions vector of strings; this is the one that goes into the vector of ScopedJoinHandles, and in the end we have to merge and flatten them all.

And finally, a pro-tip: see that .collect() at the bottom? I used Range.map() to launch my threads, rather than a for loop. And .map() is lazy: it produces exactly one iteration, waits for it to complete, and then goes on to produce the next. Without .collect(), the spawner produces one child thread which does all the work… and then we’re actually a little slower than the single-threaded version due to the overhead. Using .collect() there forces .map() to be eager, to try and collect all the threads at once and put their results into handles immediately.
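If you want to see the trap in isolation, here’s laziness in miniature; nothing runs until something consumes the iterator:

fn main() {
    // Building the iterator does no work at all:
    let lazy = (0..4).map(|i| println!("spawned {}", i));
    println!("nothing has printed yet");
    // Consuming it is what actually runs the closures:
    let _forced: Vec<()> = lazy.collect();
}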

All right, we’ve run the threads and collected the results. What’s next? Extraction:

        solutions = handles
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .flatten()
            .collect()
    })
    .unwrap();

    solutions.sort();
    solutions.dedup();
    solutions
}

In the first line, when the parent scope hits the first handle.join(), it waits until the first thread launched terminates, and then collects its data. If the second and third thread are also done, it immediately collects those, otherwise it goes back into the waiting state until it’s merged the entire scope down. The .unwrap() is there to handle error conditions of the total concurrency handling process, but I’m going to ignore it for now.

We then sort and de-duplicate our collection and return it.

Much of the point of this exercise is to simplify the queue example from crossbeam. The version of find_task on that page is esoteric and difficult to read. I wanted a simpler version that I could understand. This is what mine looks like:

fn find_task<T>(local: &mut Worker<T>, global: &Injector<T>) -> Option<T> {
    match local.pop() {
        Some(job) => Some(job),
        None => loop {
            match global.steal() {
                Steal::Success(job) => break Some(job),
                Steal::Empty => break None,
                Steal::Retry => {}
            }
        },
    }
}

A thread looking for work passes in both its local queue and the global one. We attempt to pop a job off the local queue, which should always be empty; it’s just a container the API requires. We then attempt to steal a job from the global queue. If we’re out of jobs, we return None. If we find a job, we return Some(job). And finally, if the global queue was locked (due to another thread being in the middle of getting work), we tell find_task to busy-wait and try again.

That’s it. That’s the whole of what that big mess on the crossbeam-deque page is trying to tell you, only using iter::repeat instead of loop and throwing in a whole bunch of stuff about stealing jobs from other threads. Which you will need someday if you write lots of concurrent code. For learning, however, you will not want it; you want the simplest level of code you can hold in your head. A loop does that for me.

And that’s it. Your Boggle solver is now multi-threaded. And the results are impressive: with four threads, it’s now about three times faster, solving the board in about 350µs. It’s not four times faster; there’s allocation and processing overhead involved in separating out the jobs. But it’s three times faster, which is a great improvement.

As always, the entire codebase is available from the boggle-solver project on my Github.

All right, so now I’ve got a bitmap for tracking the path I’ve taken on the board, and I’ve got a trie that can determine whether or not a sequence of letters is a word (or the prefix of a word) from a specified dictionary.

Let’s solve Boggle™.

A Boggle board is a 4×4 grid where each cell is filled with one of sixteen dice. Each die face shows a letter. Our objective is to scan the board finding words without ever using the same die twice for a single word. The typical strategy is to start at a letter and scan its neighbors for patterns that might register in our brains as “a word.” We’ll use the same technique in our algorithm:

  1. Start with a letter
  2. Scan one neighbor, creating a two-letter word
  3. If that word is a whole word, add it to the solutions list
  4. If that word is a prefix of a word in the dictionary
    1. Pick a neighbor that we have not visited this scan
    2. If there is one, make a three-letter word, and recurse to 3, and so on.
  5. Terminate when the “word” is not a prefix of any word or there are no valid neighbors to visit

So we need a structure to contain the word we’ve scanned, and the Ledger. Let’s call it Scanned. The only special thing we need for Scanned is that for every step, it needs to clone the Ledger; that is, we get a new copy of Scanned for each neighboring letter, and recurse down that path with that assembled word. When the search terminates (due to exhaustion of the search space or the failure of the prefix), we will then use the copy of Scanned at this point in the stack to create a new search with the next neighbor, and so forth. This is a standard recursive strategy.

The result looks like this:

pub fn solve(board: &Board) -> Vec<String> {
  let mut solutions: Vec<String> = Vec::new();
  for x in 0..board.mx {
    for y in 0..board.my {
      let mut possibles = Scanned::new("".to_string(), Ledger::new(board.mx, board.my));
      solveforpos(board, (x, y), &mut possibles, &mut solutions);
    }
  }
  solutions.sort();
  solutions.dedup();
  solutions
}

Step 2 is solveforpos(), where we implement the recursive strategy. We could have implemented this as a single function, but for one little detail: the letter ‘Q’. There are special rules about Q, and we have to support them, as the kids say, “because English.”

pub(in crate) fn solveforpos(
  board: &Board, (x, y): (isize, isize),
  curr: &mut Scanned, solutions: &mut Vec<String>
) {
  let c = board.board[x as usize][y as usize];
  innersolveforpos(c, board, (x, y), curr, solutions, false);
  if c == 'q' {
    innersolveforpos('u', board, (x, y), curr, solutions, true);
  }
}

The innersolveforpos() function checks for word validity, prefix validity, and position validity. Not shown here (but you can find it in the source), the Scanned object actually has the responsibility for adding the letter if the position is valid, and returning a new, longer “maybe-word” (remember, we have to support backtracking in our recursion) and a new, updated path Ledger. So that’s where we pass the “skip position check” flag, which in turn lets us put both “Q” (so we catch words like ‘sheqel’ and ‘burqa’) and “Qu” letters into our candidate string.

Look above, and you’ll see that we add ‘qu’ blindly whenever we encounter ‘q’. This is important. We have to let that happen because we need to continue even if “qe” and “qa” aren’t in the candidate list. “Quota” is a real word.

Once we’ve added the letter(s) and determined that the string we have is the prefix of a word found in the dictionary, we then scan the neighbors and recurse, skipping the current cube. The Ledger makes sure that we don’t re-check a letter for a given single search, but by cloning the letter and the candidate we also ensure that the backtracking is done correctly.

fn innersolveforpos(c: char, board: &Board, (x, y): (isize, isize),
  curr: &mut Scanned, solutions: &mut Vec<String>, skip_pos_check: bool
) {
  match curr.add(c, (x, y), skip_pos_check) {
    None => return,
    Some(mut newcurr) => {
      if newcurr.0.len() > 2 && board.words.find(&mut newcurr.0.chars()) {
        solutions.push(newcurr.0.to_string());
      }
      if !board.words.pref(&mut newcurr.0.chars()) {
        return;
      }

      for i in -1..=1 {
        for j in -1..=1 {
          if !(i == 0 && j == 0) {
            // Skip the current block!
            let (nx, ny): (isize, isize) = (x as isize + i, y as isize + j);
            if nx >= 0 && nx < board.mx && ny >= 0 && ny < board.my {
              solveforpos(board, (nx, ny), &mut newcurr, solutions)
            }
          }
        }
      }
    }
  }
}

And that’s pretty much how you solve Boggle. According to one source, the total search space of an n×m board is (n×m)! (that’s the factorial symbol there); for a 4×4 board, that’s 16!, or 20,922,789,888,000 visits to do absolutely every search of the board. Except for one thing: the English language is not random! It’s messy, but not random. The fact that many letter combinations cannot actually lead to a real word found in the candidate dictionary means that the vast majority of searches terminate early.

On my laptop, a 4×4 board with all ‘e’s and a dictionary of ‘eee’ through ‘eeeeeeeeeeeeeeee’ takes 5 minutes and 45 seconds to complete. But in practice, the average runtime of a boggle board with this algorithm is barely 1.5 milliseconds.

Which is not too damn bad at all.

Can we go faster? Yes we can.
