The big news
So, the big news: I've been hired!. This is a huge relief. I was worried for months that, having taken a year off for schooling, I was going to be out-of-phase and out-of-date with respect to the industry and of no damn used to anybody. Being over fifty also means that I've got that "aged out of the industry" vibe all over me, and while that's a real thing I know plenty of greybeards in this industry who manage to work hard and keep up.
It turns out, there are plenty of places that need someone with my skills. And so I've landed at Big Fish Games, where my job is to take all the enterprise-level experience I've developed over the past twenty years and teach their QA Infrastructure team a few simple but not easy lessons:
* There's more than one way to do this * Think more broadly * Be more daring * Taste and judgement matter
I'm still hacking on the PamSeam thing, so let me recap and maybe I can explain to myself how the AviShaTwo algorithm works.
A seam, in this context, is a wandering path of pixels either from top to bottom or left to right, in which each pixel is connected to one of the three pixels below it, and the total path is judged by some algorithm to be the least destructive to the visual quality of the image.
In the image here, we say that the surfer and the white foam are highly "energetic"; removing them would damage the image. So we try to find seams of calm sea and remove them first. The two images underneath represent the results from the paper, and my results implimenting AviShaOne.
AviShaOne (Avidan & Shamir) is pretty easy to understand. For every pixel, we calculate the difference between the left and right and top and bottom hue, saturation, and value, and come up with an "energy" for every pixel. (Pixels on the edge just use themselves instead of a neighbor).
The clever part is realizing that, after the first row, for every pixel on a row it had to be reached by one of the three pixels above. So rather than calculate every possible path through the image, AviShaOne says, "For this pixel, which of the above pixels contributes least to the total energy of the seam to which this pixel belongs? Record the total energy of the seam and the pixel above that contributed." Because that pixel has to belong to a seam including one of those three. We end up at the bottom with an array of seams and total energies, pick the lowest energy, and then trace that seam's progress back up to the top.
Now you have a column of row offsets that can be used to carve that seam out, reducing the image's width by one.
Interestingly enough, the YLLK (Yoon, Lee, Lee & Kang) algorithm proposes modifying AviShaOne to not use all the energy of the pixel, but to bias the energy calculation along the perpendicular of the seam, that is, for a vertical seam to make the horizontal differences more signficant than the vertical ones. YLLK demonstrate that AviSha's algorithm using all the energy around the pixel as the metric of comparison can cause serious horizontal distortion when removing vertical seams from an image with strong vertical components (such as a picket fence). Removing the vertical information biases the preservation of vertical image elements by giving the algorithm only the horizontal changes. I may have to experiment with this.
AviShaTwo is a little different. In AviShaOne, we calculate the seam that has the lowest energy, the lowest difference between its neighbors, before removing it. AviShaTwo asks us to consider, "after we remove it, will the new pixels pushed together be compatible with each other?" Since we're doing this backwards, we look at the three pixels above and ask, "If we removed that and the current pixel, what would the resulting energy between the new neighbors be?" We then pick the parent that creates the least energy change, as that creates the seam that does the least damage. This solution is called the "forward energy" algorithm because it looks forward to the results, rather than backward from the expectations.
The edge cases remain a serious pain in the neck to manage.
This is a strange problem. The basic interface for carving a seam is two functions:
carveVerticalSeam:: Image -> Image and
carveVerticalSeam:: Image -> Image Internally, these functions and their supporters look so damn close to one another that I can't imagine why I need two of them other than that I'm just not clever enough to come up with an algorithm that maps one directionality to another.
The algorithm is slow. The energy map has to be recalculated every single time since the seam you removed overlapped with other seams, meaning the energy of a large part of the image has changed.
There are two possible solutions:
It might be possible to cache some of the results. Starting from the bottom of the removed seam, we could spider up every left and right seam in the derived seam map, creating a left-to-right or top-to-bottom range of seams that we know are in the damaged list. We could terminate a single scan early for those scans that we know cannot possibly reach beyond the current range; it's entirely possible that the upper half of an image results in no scans at all. We then remove the seam from the energy and seam maps, move the remaining content upward or leftward, and record the range of seams that needs recalculation in case the client asks for another seam.
Whether this is actually a win or not remains a subject of investigation.
One big problem with caching, though, is about reducing an image in two dimensions simultaneously. We'd need to maintain two maps: one representing the vertical seams, and one representing the horizontal seams. Negating and mutating both of those after each carve might end up costing more in processing time and memory than it was worth.
Threading could speed up the algorithm by a linear amount in terms of how many CPU cores you happen to have lying around (or are willing to let the algorithm use). But threading has several problems.
The first is edge cases... again, literally. Let's say we're looking for the best vertical seam. We break the image up into columns, and then each thread gets one column. But each row has to be completed before the next row can be addressed, because each row is dependent upon the previous row for its potential parent energy values. That's a lot of building up and tearing down threads. I wonder if I can build a thread pool that works on a row-by-row basis, and a queue generator that builds out "work a row" jobs in chunks.
There's also a weird internal representation issue associated with directionality. Computer memory is only one-dimensional; a two-dimensional image is a big block of memory in which each row is streamed one after another. We can break a target array up into read/write chunks for row-based processing; we can't do the same thing for column-based processing, as the in-memory representation of columns is "a stream of bytes from here, and then from over there, and then from over there."
If I could solve the directionality part of the problem, then the working maps (the local energy map and the seam/parent map) could represent the columns as rows by rotating the working map 90 degrees. That still doesn't obviate the need to cost the cache management operations.
There are a number of alternative algorithms that may create better seams. The YLLK paper introduces more sensitive energy algorithms to preserve directional features, and there's a paper that has a different energy algorithm that supposedly creates better averaging seams for upscaling, although that one's in Chinese (but there's a power point presentation in English with the bulk of the algorithm explained). There are machine-learning extensions to absolutely score seams that will pass through faces to remove them. And there are lower-resolution but much faster (like, real-time, video-editing faster) algorithms that I'm probably not all that interested in.