Lab notes for the first two weeks of January.

I’ve been fiddling with a branch of the PamSeam project named Lattice in an effort to streamline the codebase and speed up the runtime. Three algorithms are implemented, and all of them follow similar processes. The basic engine is also very slow, with lots of large allocations and repeated copying of the image.

The basic idea of the Lattice project is to turn the image into a graph, one node for each pixel, with eight pointers to the neighboring nodes.

The Lattice graph is a generic container; we can put anything at all into it, so the data structure the Lattice holds can contain the image, any metadata maps (such as an energy or flow map), the accumulative energy or flow data from processing, and the upward pointers that calculate the final seam.
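As a concrete sketch of what I mean (the names here are mine and nothing is settled), each node would carry its payload plus eight offsets into the backing vector:

```rust
// A sketch, not the actual Lattice code. `NONE` (u32::MAX) marks a
// missing neighbor at the image edge.
const NONE: u32 = u32::MAX;

struct Node<T> {
    data: T,             // pixel, energy-map entries, seam pointers, etc.
    neighbors: [u32; 8], // N, NE, E, SE, S, SW, W, NW as offsets into `nodes`
}

struct Lattice<T> {
    nodes: Vec<Node<T>>,
    width: usize,
    height: usize,
}
```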

The Lattice can remove a seam without copying; that is, the pointers from left-to-right or top-to-bottom can be set to “jump” the removed seam without having to copy the data. This can be done all at once, removing the current need to “seam and copy” both the energy map and the original image. The resulting image can be extracted from the Lattice data at the end.
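Here is roughly what that “jump” might look like for a vertical seam, building on the sketch above. (A real version would also have to re-link the diagonal neighbors; this shows only the east/west pair.)

```rust
impl<T> Lattice<T> {
    // Re-link each seam node's east and west neighbors to point at each
    // other, so traversal skips the removed node without copying data.
    fn jump_vertical_seam(&mut self, seam: &[u32]) {
        const E: usize = 2; // east slot in the layout above
        const W: usize = 6; // west slot
        for &idx in seam {
            let e = self.nodes[idx as usize].neighbors[E];
            let w = self.nodes[idx as usize].neighbors[W];
            if w != NONE {
                self.nodes[w as usize].neighbors[E] = e;
            }
            if e != NONE {
                self.nodes[e as usize].neighbors[W] = w;
            }
        }
    }
}
```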

I believe this process will make carving a single seam slower, but will significantly speed up removing multiple seams. Also, a known problem with the energy map is that removing a seam “damages” not just the pixels immediately adjacent to the seam, but also the pixels of any residual seams the removed seam traversed. With the lattice, before a seam is removed, the energy-map regions to the left and right of (or above and below) the scheduled seam can be recalculated, and the algorithm can detect when the recalculation is no longer producing changes and halt any further processing. It might even be possible to limit recalculating the seams themselves to the damaged region.

There are problems with this solution. The first is that it takes significantly more memory to store a single image for processing. While we may be able to eliminate the multiple-planes problem for pixels, energy, and total energy, we’ve replaced those with a [u32; 8] array of neighbor offsets in every node, and that’s a lot of memory.

Next, while the graph itself is represented by a vector and a node’s neighbors are stored as offsets into that vector (a common representation technique in Rust), the graph is not coherently “chunkable” the way a plain vector would be. This makes multi-threaded processing problematic, as Rust absolutely does NOT want you to have multiple threads writing to the graph simultaneously. Having atomic sub-graphs doesn’t seem to be an answer, as even linear processing relies on access to neighboring nodes to which, regardless of which algorithm is used, at least one other thread must also have read access. This could theoretically be solved by having one graph for reading and another for writing, but the memory usage quickly becomes burdensome.

Third, the “graph walking” algorithms become fairly complex. The current solution is to create a Step and Walker; the Step knows where you are and the direction you are headed, and as the Walker takes Steps, it returns the desired contents of the node at that point. This solution allows the Step to modify its direction, which is needed for the meandering path a seam may take, and it allows the Step to terminate early, which is needed for the damaged-seam management pass. The biggest headache here is lifetime management, as the Step and Walker need similar lifetimes, but they’ll also be taking the Lattice as an argument, and I have yet to figure out the lifetime relationships between the two.
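For what it’s worth, here is one shape the lifetimes might take. This is a sketch only, with invented names, since the real code doesn’t exist yet; the idea is that the Walker and the node references it yields both borrow the Lattice for the same lifetime:

```rust
enum Walk {
    Continue(u32), // offset of the next node; the Step may have changed direction
    Done,          // terminate early, e.g. once past the damaged region
}

trait Step<T> {
    // Given the lattice and the current position, pick the next node.
    fn step(&mut self, lattice: &Lattice<T>, at: u32) -> Walk;
}

struct Walker<'a, T, S: Step<T>> {
    lattice: &'a Lattice<T>,
    stepper: S,
    at: u32,
}

impl<'a, T, S: Step<T>> Iterator for Walker<'a, T, S> {
    type Item = &'a T;
    fn next(&mut self) -> Option<&'a T> {
        match self.stepper.step(self.lattice, self.at) {
            Walk::Continue(next) => {
                self.at = next;
                Some(&self.lattice.nodes[next as usize].data)
            }
            Walk::Done => None,
        }
    }
}
```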

I’m confident I can get this to work, but as I now have a full-time job, it’s going much slower than I’d like. It’s not exactly something they need at the office, so I’m restricted to working on it only in my spare time, of which I have very little these days.


Math is no Harder than Drawing


I recently read an article on the economics of ancient Rome that suggested that, while the written arts, especially those that involved education or erudition, were highly valued, the visual and performance arts were not. The visual arts, especially, were regarded as the work of the lowly and demeaned, as almost all the arts we see from Rome, Pompeii, and Herculaneum, all of the frescoes and mosaics that have survived to this day, were made by slaves.

Skilled slaves, but slaves nonetheless. The visual arts could be bought and sold just as readily as a loaf of bread, and unlike bread, a good mosaic would last for decades.

This mentality is still at play, and I don’t think it’s going to go away anytime soon. The reason artists struggle is that a lot of people look at visual art and say, “I could do that. With enough time and study, I could do that.” Textbooks remind us time and again that the illustrative arts “can be learned by anyone,” that we could all learn to draw what we see, or even what we imagine, with just a few dozen hours to get the basics down. People don’t pay much for art because there’s just so much of it, and only so many walls to hang it on.

Here’s the thing about computer programming: it’s exactly the same. Somehow, because it involves math (although really, most programmers only use arithmetic; I only recently started to use actual maths, and I’ve been doing this for 30 years), a lot of people went through school, made the decision that they “didn’t have a head for math,” and so decided that “no matter how much time and study I give it, I couldn’t learn that.”

It’s not true. But as long as the majority of the population believes it to be true, and continues to be opposed to learning it, they’re going to keep paying computer programmers a lot more than they are artists.

Which is a shame. I rarely feel like I’ve contributed much that’s great to the world, but I love art and artists, and have a lot of paid-for art that, sadly, I haven’t yet had the time to mount and display. Artists consistently make me happy in ways coders only sometimes do.

I realized the other day that my role in my current job requires that I do something very, very strange, as far as I’m concerned. I realized there are some things I have to avoid learning, and I have to avoid them quite strenuously. I have to know they exist, but I have to not know any more than that.

One of my tasks is to help software engineers write their own tests and documentation. To be good at that, I have to help them focus on the kind of documentation they’re writing, and at the moment that documentation is “pager duty” how-tos: short instructions for how human beings must respond to problems and issues with the running system.

To that end, I have them focusing on “What are the symptoms we’ve seen? What are the correct responses? If the response can be automated, why hasn’t it been?” (Answers such as “It would take too long” or “It’s too rare to justify the development cost” can be debated; “No one knows how” is not acceptable.) And then “Who knows how to fix the issue?”

I have to not know the answers to these questions. Very deliberately, I must avoid knowing the names and faces associated with the answers. Because that way, when I’m proofreading the documentation, these issues jump out easily. Questions like “Who would I go talk to?” come easily to mind, and I can pass them back to the engineer.

The best part is how good they’ve been about all this. I really appreciate and like the people I’ve been working with. People diss millennials all the time for their odd, new work-ethic, but I like it: it’s very emotionally aware and deliberate. These people know that emotional labor is work, but the work has a great pay-off. I just have to work hard to keep up and keep participating.

I had an insight the other day, while working with one of the younger colleagues at work, about why I come up with answers and write code so much faster than he does. It’s not that he’s not as smart or as talented as I am, it’s that we look at programming problems in a completely different way.

In a paper from 1974 entitled How Big Is a Chunk?, Herbert Simon attempted to codify what makes beginning chess players different from grand masters. The answer, Simon believed, is that beginners see pieces and masters see “chunks”: collections of pieces that form a strategic unit on the board, and that can be exploited against other strategic units. Simon built on George A. Miller’s notion of “the magical number seven plus or minus two”– the biggest “chunk” that a human brain can handle. Miller argued that we remember a phrase of “seven words,” which could have as many as two dozen syllables, so the “word” was a piece of the chunk, not the syllable, and so on. Simon believed that a “chunk” of chess could have seven “pieces in position” in it, and that a grand master of chess could have anywhere between 25,000 and 100,000 chunks in his head, ready to deploy at a moment’s notice. This is why a grand master can play dozens of games of chess against beginners simultaneously; he doesn’t have to memorize the state of every board, he just has to look at the board, identify its riskiest chunks, and respond accordingly.

My colleague is a huge fan of test-driven development (TDD), but what we were developing as a pair-programming exercise was a view: there was minimal logic to be written, only some dynamic HTML (well, React) to take data that was already available and present it to the end user in a highly digestible format. Our goal was to remove the repetitive legends automatically generated for the many graphs on a page, and replace them with a single deliberately generated legend for all the graphs.

“Let’s do this TDD style. Put an empty object on the page,” I suggested. “Okay, make it show the wrong thing. Now, we know the legend was there before with the right labels. We had to supply those labels, let’s go find where they’re generated. How are they cross indexed with the colors? Write that down. Okay, let’s change the wrong thing to show the names. Now add a rectangle with the colors.” When we had that, I said, “Okay, I think you know how to use Flex to lay out the legend, add a title, all that, right?” He agreed that he did. “Oh, and people print this out. Remember that thing we did where we made sure that there were only two graphs per page, and used the CSS ‘no page break’ thing to make sure?” He nodded. “Each page will need a legend, but you’ll have to hide all but the last one in display, and show all of them in print. You know how to do that too, right?”

It all took us about two hours. When it was done, he shook his head and said, “I would never have gotten there that fast. This would have taken me days.”

I assured him he’ll get there. But I realized that the reason I got us there so quickly is that I had a chunk in my head about how HTML works, and another chunk about where the data came from, and a third chunk about how Javascript merges the two. He doesn’t have that experience yet, so he sees rows and <divs> and arrays and classNames and color codes. But that’s the underlying stuff, the pawns and rooks of the game. Those aren’t the chunks.

The only question left is how to teach these people how to chunk stuff, and what to chunk, and why.

Human beings approach every new experience with a set of expectations, a mental model of what that experience will be about, and the more experienced we are, the more concrete that mental model is going to be. Going to a superhero movie, we expect lots of brawling, and when test driving a new car we expect the steering wheel to be in the same place.

Fiction writers call this mental model the premise, and fiction readers come to every book with a premise in mind. You violate the reader’s expectations at your peril: the reader came expecting one thing and if you don’t deliver, you’d better have a damn good hook that gives them a reason to read on, finish the book, and recommend it to friends.

The same thing is true of open-source software. Both the user’s experience and the code layout should map to something the user is already familiar with. Too much similarity, with just a few “minor” changes, can be frustrating: the Rust versions of the venerable Unix ‘find’ command are brilliant, but they’re just close enough to the original in functionality that it’s hard to justify keeping both sets of command-line rules in your head– and everyone who uses ‘find’ is someone who does this for a living, has to work with many different machines, and knows ‘find’ will be the only choice on most of them. They don’t want to learn a different model. They just want better tools.

I’ve recently had this experience with Django. I used to be a Django expert, a hired gun who customized Django administration engines and crafted high-performance query systems. I haven’t worked much with Django 2.0, and now that I have I feel completely lost at sea. Despite my love of the declarative power of Haskell, I have not yet found a mental model for how Django, Django/Rest, and Django/Serializer interact. Part of the problem may be that the project I’m addressing was written by junior engineers who may have simply pasted what they found from Stack Overflow and messed with it until it worked. But my mental models of how Django works do not map well to this project.

So here’s my recommendation: If you’re re-implementing something “in Rust” because “Rust is better than C” (and it is, and I won’t argue with people who want to claim otherwise), support the existing functionality 100%. Support everything; if you find a bug, report it to the original maintainers. Add functionality only after you’ve achieved parity. Users often don’t want something different; they want a better version of the same.

(That said, there is a time and a place for doing something different. Nobody knew we wanted the automobile; we just knew we wanted horses that didn’t poop all over the city streets. On the other hand, I question the eternal impulse for bigger, faster database engines when most people barely need MySQL circa 2007.)

More importantly, when you’re laying out the project, lay it out in a way that’s familiar to other users. Use a similar layout and even possibly a similar naming scheme for the basic components. Rust does some of this, with its ‘lib.rs’ and ‘/bin’ conventions, but to really meet other people where they live, you have to go further. Anticipate and meet their mental models somewhere in the middle, and you’ll have them hooked into downloading, understanding, and maybe even participating in your project’s vision.

And when you can’t, when what you’re doing is so new that few popular models exist, it’s incumbent upon you to document both the why and the win of your method. It’s not enough to say why you’re doing something, you have to explain the win of doing it your way so clearly that people are willing to struggle with the code to understand the new thing that you’re doing and wrap their heads around your new ideas.

Like any story, your code base has a premise, both a reason for existing and a plot by which you brought it into the world. Make those premises explicit and familiar, and your project will have an easier time finding users and contributors.

I’m 53 years old and still consider myself a software engineer whose professional job isn’t writing code, it’s producing value using coding skills. I recently took a year off to go back to school, and trying to find a job at 53 turned out to be a challenge. I do think I encountered a lot of ageism in the interview process, but I also think that taking a year off left me vulnerable to skill rot.

Fortunately, I did find a job, and my current employer looked past the rustiness to see that I could do the job he needed me to do. And this week I had an experience that, well, frankly, has me going, "To hell with the ones who didn’t hire me."

In my last job at Splunk, I wrote a lot of Go, a lot of OpenAPI, a little bit of Docker, a little bit of Kubernetes, a little bit of AWS, and a modest amount of React. In previous jobs, I had written tens of thousands of lines of Python, Javascript, CSS, and HTML, as well as shell scripts, Perl, C/C++, even some Java, along with all the maintenance headaches of SSH and so forth.

In my year off I spent my time taking classes in low-level systems programming: Programming Languages and Compilers, and I have a half-finished implementation of a PEG parser based on an obscure algorithm for parsing and lexing that nobody has ever fully implemented in a systems language. I chose Rust as my language of choice because its performance is close to C++’s and its compiler choices are sensible and helpful.

In my new job, part of my task involves helping the team "level up." They’re mostly SDET-1s (Software Development Engineer in Test, level 1) producing tooling for the Quality Assurance teams, who spend a lot of their time playing games (it is a gaming company) and then reporting on any problems they found. The QA folks aren’t the sort to be working in JIRA or Bugzilla; they needed a straightforward tool that understands their needs.

The tool they’ve built is in Django, which is great for prototyping, but it’s now hitting its limits. Meanwhile, we’re all "moving to the cloud," as companies seem to like to say these days, and my current big side project is to write a demo on what a cloud-based solution looks like. I decided to take two pages from the existing app (the login page, and the list of existing reports) and make them work.

I spent two to four hours a day over eight days, and in those eight days here’s what I have learned (and hopefully will document soon):

  • React Hooks
  • React Context
  • Material UI
  • Material themes
  • Golang’s LDAP interface
  • Golang’s Postgres interface
  • CORS
  • JWT & PASETO
  • Multi-stage Docker builds
  • Docker swarm
  • Docker stack

I knew a little bit about each of these. I did the demo yesterday, using live data, and jaws dropped. "That was fast" (It’s not Django) and "Have you seen our tooling? That looks better than anything else! It looks like us." (I found the company style guide and applied the colors and fonts to the Material Theme engine). Then the reveal: The Django app container is 2.85GB, but my project fit into 37MB; an automated security scanner screams in horror when examining the existing project, but mine had only one error: it’s a demo, so I didn’t have a "good" SSL certificate.

Some of these things (like multi-stage builds) I’d only ever heard about. But the simple fact is that none of this is particularly new. Hooks just wrap ordinary React; React.Context is just React.State writ large. Material is just a facade, themes are just well-organized CSS styles, Golang’s interfaces are a lot like Python’s or Rust’s, LDAP is just a query engine with an annoying call sequence, JWT is just Web Cookies with some rules. Swarm and Stack are variations on Kubernetes and Compose. Multi-stage builds were… Makefiles, "but, like, for the cloud, dude."

In almost every case I already had a working mental model for each feature, I just needed to test and see if my model was accurate and adjust accordingly. Some things are still fuzzy– docker networking is fun and seemingly sensible, but I haven’t memorized the rules yet.

I can contrast this with my academic pursuits, like Rigged Regexes and Seam Carving, which are both taking forever. Part of the reason they’re taking forever is that they involve genuinely new ground: no one has produced a systems-language implementation of Brzozowski’s Algorithm, and I have yet to find a multi-threaded, caching implementation of the Seam Carving algorithm for a low-level Unix environment like ImageMagick or NetPBM. (Another part now is that, hey, I’m working full time; I get like 30 minutes every day to work on my Github projects, usually during the commute.)

In short, the people who didn’t hire me didn’t hire me because, if I were a construction worker, they were an all-Makita worksite and I owned a DeWalt drill.

Which is absolutely nuts. But that’s the way this industry works.

At the ${NEW_JOB}, we’re going through the process of leveling up, and one of the steps of “leveling up” as a software company is watching Robert Martin’s (aka Uncle Bob) Clean Code video series, in which he presents his ideas about how code should be thought about, organized, and written. A lot of the material is okay, and there’s definitely something to be said for his ideas about separation of concerns.

In the latest one, for example, he said that when reviewing the design of a system, it should shout what it’s about. A payroll system should shout “Payroll!” Frequently, though, it doesn’t; the organization of a project in a modern framework is often about the framework. Django apps are organized around the needs of the deployment scheme, and so are Rails apps and so forth. (WebObjects, the original web framework from 1996, is said to have been modeled directly on Trygve Reenskaug’s 1979 paper on model-view-controller, so I wonder if it’s different.)

Bob’s organizational scheme is that there are Entities, which contain knowledge; Interactions, which handle the events that cause entities to change; and Boundaries, which present the permissible set of events to end users. “The web stuff” should talk only to boundaries and should be separable from the rest of the system; it should be possible to swap out the web-based presentation mechanism entirely, and to scale and price the two parts separately.
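To make that concrete for myself, here’s a tiny sketch in Rust. The names and the payroll example are mine, not Bob’s; the point is only that “the web stuff” would see the boundary trait and nothing else:

```rust
// Entity: holds knowledge, knows nothing about delivery.
struct Employee {
    id: u64,
    hourly_rate_cents: u64,
}

// Interaction: handles an event that causes an entity to change.
fn give_raise(employee: &mut Employee, raise_cents: u64) {
    employee.hourly_rate_cents += raise_cents;
}

// Boundary: presents the permissible set of events to the outside.
// The web layer talks to this trait, and only to this trait.
trait PayrollBoundary {
    fn give_raise(&mut self, employee_id: u64, raise_cents: u64) -> Result<(), String>;
}
```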

This is a great idea, and one I subscribe to whole-heartedly. (I also love that it’s inherent in the way Rust does CLI programs as “libraries first”; that’s not in the compiler, but every example did it that way and the community has learned to do it that way.)

But there’s one thing about Bob’s scheme that’s been bugging me. And I think I know what it is. Take a look at this:

See that blue arrow pointing at the connector between the two objects? What is that?

“Well, it’s a function call. Duh.”

Great. What is it? What is a function call?

You see, in Bob’s world, a function call is something he never has to worry about. Compilers just do all that stuff for you. A compiler sets up space on the stack for the arguments to the composed object, as well as empty space for the returned value. All the complicated underlying stuff about allocating memory, protecting it from overwrites, reclaiming it when you’re done with it, all that stuff has been elided from Bob’s reasoning, and from yours and mine.

And that’s fantastic. With the exception of some extreme edge cases, usually suffered by the people who write database cores and operating systems for a living, we no longer need to worry about a lot of machine details.

Until we move to The Cloud.

Look at that blue arrow again. Now imagine that, instead of a function call where all the details about the ABI (application binary interface) are hidden under a warm comforting blanket called the CLR or the JVM or dot-DLL or dot-SO or whatever, they’re REST or gRPC or whatever network interface you want to imagine. They need to be protected from outside prying eyes by TLS-hardened pipes, walled off by private networks, secured with JWT or PASETO, operated inside Docker containers, and managed by Kubernetes. Each service needs to be “stateless,” meaning that it needs to be able to die and restart and recover immediately where its previous incarnation left off.

You can still write the core of your system using Uncle Bob’s UML-inflected notations about the relationships between objects. (That said, I find object orientation to sometimes be a fetish. Sometimes the elegant implementation is just a function.) But the lines between Entities, Interactions, and Boundaries are no longer handled for you by friendly compilers. You have responsibility for them. Each object in the system, or some cluster of objects, needs to be wrapped in layers and layers of security (since “the cloud” is just other people’s computers you happen to be renting), performance monitoring, and recovery management. And you have to manually specify the interfaces between those objects, usually using something like Swagger or OpenAPI or some hand-turned REST thing for which no documentation exists.

Sometimes I think this is where cloud-oriented programming has gone terribly wrong. We dove into this highly performant and redundant system without thinking harder about how we could achieve the ease of use toward which that blue arrow once pointed.

David J Prokopetz asks, “What’s the most asinine technical requirement you’ve ever had to deal with?” I don’t know if it qualifies as “asinine,” but one of the most idiotic requirements I ever ran into during an assignment was simply one about contracts, money, and paranoia about the GPL.

It came down to this: in 1997, the Seattle-based company I was working at had been “acquired” by CompuServe (CIS) to be CIS’s “Internet division,” and as part of the move we were required to move to RADIUS, the Remote Access Dial-In User Service, an authentication protocol for people who dialed into an ISP using their landlines, so that our customers could dial in through CompuServe’s banks of modem closets. That was fine.

What wasn’t fine was that, at the time, CompuServe’s central Network Operations Center (NOC) in Columbus, Ohio, was 100% Microsoft NT, and we were a Sun house. The acquisition required a waiver from Microsoft because CIS was getting huge discounts from Microsoft for being a pure MS play. We were told that, if we had to run on Solaris, then we also had to run a pair of RADIUS servers written for NT and ported to Solaris, and we also had to run a pair of Oracle servers (CIS had a lot of contractual obligations about who they purchased software from as a result of their NT centricity), and in order to make them line up we also had to buy ODBC-on-Solaris shims that would let our ODBC-based RADIUS servers talk to Oracle, despite all of this running on Solaris.

So we had four machines in the rack, two running this RADIUS hack and the ODBC drivers, and two running Oracle. Four machines and the software alone was $300,000.

And it crashed every night.

“Yeah, it’s a memory leak,” the RADIUS server vendor told us. “We’re aware of it. It happens to NT too. We’ll get around to fixing it, but in the meantime, just reboot it every night. That’s what the NT people do.”

Now, at the time, there was a bit of pride among Unix programmers: we don’t reboot machines unless lightning strikes them. We could refresh and upgrade our computers without trauma and without having to engage in therapeutic reboots. We had uptimes measured in years.

The counterpoint is that there was a GPL-licensed RADIUS server. We were allowed to use GPL-licensed code, but only under extremely strict circumstances, and in no case could we link the GPL-licensed RADIUS server to Oracle. That was a definitive ‘no.’ We had to use the ones CompuServe ordered for us.

So Brad, my programming buddy, and I came in one weekend and wrote a shim for the RADIUS server that used a pair of shared memory queues as a full-duplex communications channel: it would drop authentication requests into one, and pick up authentication responses in the other. We then wrote another server that found the same queues, and forwarded the details to Oracle over a Solaris-friendly channel using Oracle Pro*C, which was more performant and could be monitored more closely.

We published the full-duplex-queue for the RADIUS server, which was completely legit, and legal let it go without wondering why we had written it.

A couple of months later my boss calls us in. In his fine Scottish brogue he says, “I haven’t seen any case reports coming out of the RADIUS server in a while. I used to get one a week. What did you do?”

Brad and I hemmed and hawed, but finally we explained that we’d taken the GPL RADIUS servers and put them on yet another pair of Solaris boxes, in front of the corporate ones. We showed him the pass from legal, and how we’d kept our own protocol handler in-house and CIS IP separate (he was quite technically savvy), and how it was ticking over without a problem and had been for all this time.

“But we’re using the corporate servers, right?” he asked.

“Oh, sure,” I said. “If ours ever stops serving up messages, the failover will trigger and the corporate RADIUS boxes will pick up the slack.”

He nodded and said, “Well done. Just don’t tell Columbus.”

Ultimately, we had to tell Columbus. A few months later CIS experienced a massive systemic failure in their RADIUS “wall,” a collection of sixty NT machines that served ten times as many customers as our foursome. Brad and I were flown to Columbus to give our presentation on why our system was so error-free.

After we gave our presentation, the response was, “Thanks. We’d love to be able to implement that here, but our contract with Microsoft means we can’t.”

There are many reasons CompuServe isn’t around anymore. This was just one of them.

One of my biggest roles at my new job is mentor, and I’ve quickly learned a five-word phrase that my peer-students have come to understand and respect: "I don’t know that yet."

The team I’m mentoring consists mostly of QA people who are moving into SDET roles, with a healthy side of SDET tooling development. I did a lot of mentoring at Isilon, and I did a lot of SDET tooling at Splunk, but I don’t think I did both at the same time so this has been interesting. There are two groups inside this team: one is developing reporting tools using Django and React, the other is developing DevOps tools using Docker and Python.

I know both of these worlds very well, but having been out of both of them for three years, two because Splunk needed me to write a lot of Go and Kubernetes and one because I wanted to learn Rust and systems programming, I’m having to reboot those skills.

The team knows this, but they’ve learned that "I don’t know that yet" from me doesn’t mean I can’t know the answer. What it means is that I will come up with it quicker than they will. Often I go back to my desk, type in some magical search term, and walk back saying, "Search for this. We’ll find the answer in these keywords."

Experience at my stage isn’t about knowing things; it’s about knowing how to know. This is what I’m trying to teach them, more than anything else. How do you search for the answer, and how will you incorporate it into your code? I’ve been a little more harsh on the Docker people; they’re in the habit of cutting and pasting from Stack Overflow without really understanding the underlying mechanisms, and so I ask a lot of questions about "I’m curious, how do you expect this works?" and "I’m curious, why did you choose this?"

By the way, that phrase, "I’m curious," is super useful; it creates a consensual context for a question that has no "bad" answers, only ones with differing degrees of correctness, and sets the person answering it up for success. Learn to use it.

I spend a lot of time walking around the work “pod,” answering questions. I do a lot of code reviews where I say things like “The next person to look at this, who might even be you, might have no idea why this is like this. Please add a comment explaining why you chose to do it this way” and “Could you please include the JIRA ticket in the pull request?”

This is the biggest lesson I’ve learned in my second major mentorship role: it’s not just okay to say "I don’t know," it’s necessary. I seem to know everything: from kernel modules to the basics of graphic design and CSS. I quote Captain Kirk a lot: "You have to learn why things work on a starship." (We will ignore the fact that the cryptographic key to every Federation Starship’s command console is a five-digit PIN with no repeating digits; although I would hope that was just the password to the real key and requires executive biomarkers just to type in!)

"I don’t know that yet." indicates vulnerability. I don’t know everything. They respect that. Better, they trust when I give them answers, because they trust that I do know what I’m talking about at those times. The "yet" is also a huge part of the lesson: "I don’t know, but I’m going to find out." And when I do, I’ll share it, document it, and keep it for other people.

So be fearless about admitting ignorance. If you’re a greybeard developer like me, your number one skill should be learning faster than anyone else. That’s it. You know how to know. You know how to learn. When someone on the team is lost, you know how to find the path, and how to show them where it is. That’s what they’re paying you for: the experience needed to know how to pull other people up to your level.

The big news

So, the big news: I’ve been hired! This is a huge relief. I was worried for months that, having taken a year off for schooling, I was going to be out-of-phase and out-of-date with respect to the industry and of no damn use to anybody. Being over fifty also means that I’ve got that “aged out of the industry” vibe all over me, and while that’s a real thing, I know plenty of greybeards in this industry who manage to work hard and keep up.

It turns out, there are plenty of places that need someone with my skills. And so I’ve landed at Big Fish Games, where my job is to take all the enterprise-level experience I’ve developed over the past twenty years and teach their QA Infrastructure team a few simple but not easy lessons:

  • There’s more than one way to do this
  • Think more broadly
  • Be more daring
  • Taste and judgement matter

A lot of that has involved showing rather than telling. I’ve been doing a lot of code pairing exercises where I end up asking questions like, “What does the system really care about?” and “Is nesting components a recommended solution in this case?” and “Is there a linter for this?” Every little step makes them better and faster and, frankly, it’s one hell of a lot of fun to blow the rust off my web-development skills. In the two weeks I’ve been there I’ve written code in Perl, Sed, Haskell, Javascript, Python, and Bash.

Project Status

Obviously, I’m not going to be as productive as I was during my time off. Even with the classes and family obligations, I managed to get a few things done in the nine months I was taking classes.

I’m still hacking on the PamSeam thing, so let me recap and maybe I can explain to myself how the AviShaTwo algorithm works.

A seam, in this context, is a wandering path of pixels either from top to bottom or left to right, in which each pixel is connected to one of the three pixels below it (or to its right), and the total path is judged by some algorithm to be the least destructive to the visual quality of the image.

In the image here, we say that the surfer and the white foam are highly “energetic”; removing them would damage the image. So we try to find seams of calm sea and remove them first. The two images underneath represent the results from the paper, and my results implementing AviShaOne.

AviShaOne

AviShaOne (Avidan & Shamir) is pretty easy to understand. For every pixel, we calculate the differences in hue, saturation, and value between its left and right neighbors and between its top and bottom neighbors, and come up with an “energy” for every pixel. (Pixels on the edge just use themselves instead of the missing neighbor.)
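A sketch of that energy function, assuming an HSV image stored as a flat, row-major vector (the types and names are mine, not PamSeam’s):

```rust
struct Image {
    width: u32,
    height: u32,
    hsv: Vec<(f32, f32, f32)>, // (hue, saturation, value) per pixel
}

impl Image {
    fn pixel(&self, x: u32, y: u32) -> (f32, f32, f32) {
        self.hsv[(y * self.width + x) as usize]
    }
}

fn energy_at(img: &Image, x: u32, y: u32) -> f32 {
    // Edge pixels substitute themselves for the missing neighbor.
    let l = img.pixel(x.saturating_sub(1), y);
    let r = img.pixel((x + 1).min(img.width - 1), y);
    let u = img.pixel(x, y.saturating_sub(1));
    let d = img.pixel(x, (y + 1).min(img.height - 1));
    let dx = (l.0 - r.0).abs() + (l.1 - r.1).abs() + (l.2 - r.2).abs();
    let dy = (u.0 - d.0).abs() + (u.1 - d.1).abs() + (u.2 - d.2).abs();
    dx + dy
}
```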

The clever part is realizing that, after the first row, every pixel on a row must have been reached from one of the three pixels above it. So rather than calculate every possible path through the image, AviShaOne says, “For this pixel, which of the pixels above contributes least to the total energy of the seam this pixel belongs to? Record that total energy and which pixel above contributed it,” because every pixel has to belong to a seam that includes one of those three. We end up at the bottom with an array of seams and their total energies, pick the lowest-energy one, and then trace that seam’s progress back up to the top.
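In code, that dynamic program might look like this sketch (my names; `energy` is a row-major grid of the per-pixel energies from above):

```rust
fn cheapest_vertical_seam(energy: &[Vec<f32>]) -> Vec<usize> {
    let (h, w) = (energy.len(), energy[0].len());
    // total[y][x]: energy of the cheapest seam ending at (x, y);
    // parent[y][x]: which pixel in the row above that seam came through.
    let mut total = energy.to_vec();
    let mut parent = vec![vec![0usize; w]; h];
    for y in 1..h {
        for x in 0..w {
            let (lo, hi) = (x.saturating_sub(1), (x + 1).min(w - 1));
            let (px, pe) = (lo..=hi)
                .map(|px| (px, total[y - 1][px]))
                .min_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
                .unwrap();
            total[y][x] = energy[y][x] + pe;
            parent[y][x] = px;
        }
    }
    // Pick the cheapest bottom-row pixel, then walk the parents back up.
    let mut x = (0..w)
        .min_by(|&a, &b| total[h - 1][a].partial_cmp(&total[h - 1][b]).unwrap())
        .unwrap();
    let mut seam = vec![0usize; h];
    for y in (0..h).rev() {
        seam[y] = x;
        x = parent[y][x];
    }
    seam
}
```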

Now you have a column of row offsets that can be used to carve that seam out, reducing the image’s width by one.

Interestingly enough, the YLLK (Yoon, Lee, Lee & Kang) algorithm proposes modifying AviShaOne to not use all the energy of the pixel, but to bias the energy calculation along the perpendicular of the seam, that is, for a vertical seam to make the horizontal differences more significant than the vertical ones. YLLK demonstrate that AviSha’s algorithm, using all the energy around the pixel as the metric of comparison, can cause serious horizontal distortion when removing vertical seams from an image with strong vertical components (such as a picket fence). Removing the vertical information biases the preservation of vertical image elements by giving the algorithm only the horizontal changes. I may have to experiment with this.

AviShaTwo

AviShaTwo is a little different. In AviShaOne, we calculate the seam that has the lowest energy, the lowest difference between its neighbors, before removing it. AviShaTwo asks us to consider, “after we remove it, will the new pixels pushed together be compatible with each other?” Since we’re doing this backwards, we look at the three pixels above and ask, “If we removed that and the current pixel, what would the resulting energy between the new neighbors be?” We then pick the parent that creates the least energy change, as that creates the seam that does the least damage. This solution is called the “forward energy” algorithm because it looks forward to the results, rather than backward from the expectations.
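As a sketch, the three candidate costs for a pixel (x, y) look like this, using a hypothetical grayscale lookup `b(x, y)` and ignoring the literal edge cases:

```rust
// Every choice pays for the new left/right neighbors meeting; coming in
// from the upper-left or upper-right parent also changes one vertical
// adjacency, so those candidates pay an extra term.
fn forward_costs(b: impl Fn(usize, usize) -> f32, x: usize, y: usize) -> (f32, f32, f32) {
    let new_neighbors = (b(x + 1, y) - b(x - 1, y)).abs();
    let from_upper_left = new_neighbors + (b(x, y - 1) - b(x - 1, y)).abs();
    let from_above = new_neighbors;
    let from_upper_right = new_neighbors + (b(x, y - 1) - b(x + 1, y)).abs();
    (from_upper_left, from_above, from_upper_right)
}
```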

The edge cases remain a serious pain in the neck to manage.

Challenges

Directionality

This is a strange problem. The basic interface for carving a seam is two functions: carveVerticalSeam :: Image -> Image and carveHorizontalSeam :: Image -> Image. Internally, these functions and their supporters look so damn close to one another that I can’t imagine why I need two of them, other than that I’m just not clever enough to come up with an algorithm that maps one directionality to the other.

Speed

The algorithm is slow. The energy map has to be recalculated every single time since the seam you removed overlapped with other seams, meaning the energy of a large part of the image has changed.

There are two possible solutions:

Caching

It might be possible to cache some of the results. Starting from the bottom of the removed seam, we could spider up every left and right seam in the derived seam map, creating a left-to-right or top-to-bottom range of seams that we know are in the damaged list. We could terminate a single scan early for those scans that we know cannot possibly reach beyond the current range; it’s entirely possible that the upper half of an image results in no scans at all. We then remove the seam from the energy and seam maps, move the remaining content upward or leftward, and record the range of seams that needs recalculation in case the client asks for another seam.

Whether this is actually a win or not remains a subject of investigation.

One big problem with caching, though, is reducing an image in two dimensions simultaneously. We’d need to maintain two maps: one representing the vertical seams, and one representing the horizontal seams. Invalidating and updating both of those after each carve might end up costing more in processing time and memory than it’s worth.

Threading

Threading could speed up the algorithm by a linear amount in terms of how many CPU cores you happen to have lying around (or are willing to let the algorithm use). But threading has several problems.

The first is edge cases… again, literally. Let’s say we’re looking for the best vertical seam. We break the image up into columns, and then each thread gets one column. But each row has to be completed before the next row can be addressed, because each row is dependent upon the previous row for its potential parent energy values. That’s a lot of building up and tearing down threads. I wonder if I can build a thread pool that works on a row-by-row basis, and a queue generator that builds out “work a row” jobs in chunks.
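If I go that route, something like rayon (an assumption; I haven’t committed to a threading library) would keep a persistent worker pool, so each row’s columns could be computed in parallel with one synchronization point per row instead of freshly built threads:

```rust
use rayon::prelude::*;

// Sketch: each row of the cumulative-energy map depends only on the row
// above it, so the columns within a row can be split across the pool;
// the per-row `collect` is the synchronization point.
fn cumulative_energy_parallel(energy: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let (h, w) = (energy.len(), energy[0].len());
    let mut total: Vec<Vec<f32>> = vec![energy[0].clone()];
    for y in 1..h {
        let prev = &total[y - 1];
        let row: Vec<f32> = (0..w)
            .into_par_iter()
            .map(|x| {
                let (lo, hi) = (x.saturating_sub(1), (x + 1).min(w - 1));
                let best = (lo..=hi).map(|px| prev[px]).fold(f32::INFINITY, f32::min);
                energy[y][x] + best
            })
            .collect();
        total.push(row);
    }
    total
}
```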

There’s also a weird internal representation issue associated with directionality. Computer memory is only one-dimensional; a two-dimensional image is a big block of memory in which each row is streamed one after another. We can break a target array up into read/write chunks for row-based processing; we can’t do the same thing for column-based processing, as the in-memory representation of columns is “a stream of bytes from here, and then from over there, and then from over there.”

If I could solve the directionality part of the problem, then the working maps (the local energy map and the seam/parent map) could represent the columns as rows by rotating the working map 90 degrees. That still doesn’t obviate the need to cost the cache management operations.

Extensions

There are a number of alternative algorithms that may create better seams. The YLLK paper introduces more sensitive energy algorithms to preserve directional features, and there’s a paper with a different energy algorithm that supposedly creates better averaging seams for upscaling, although that one’s in Chinese (but there’s a PowerPoint presentation in English with the bulk of the algorithm explained). There are machine-learning extensions to absolutely score seams that will pass through faces to remove them. And there are lower-resolution but much faster (like, real-time, video-editing faster) algorithms that I’m probably not all that interested in.
