Image Composition Algorithm Trade-Offs
Over the past week, I've been rather lost in the weeds trying out different alpha mask and image composition algorithms. It all came to a flash point when I formalized the code that builds the final textures for my jigsaw puzzle pieces.
For the most accurate-looking results, it was taking about a half-second for each piece using the algorithms I'd built up to that point. This wasn't going to be acceptable -- for a 100-piece puzzle, this would mean asking the player to wait for about a minute while it prepares the puzzle.
I went through several steps to try to improve performance. I thought I'd review my experience here.
Pixel Erosion as a Morphological Algorithm
Pixel erosion is the process of taking an existing image or alpha mask, finding its edges, and either collapsing them inward (deflation) or outward (inflation).
My original algorithm was a naive approach, which involves applying one pixel's worth of erosion against the entire image at a time. It scans the entire image, using either a box-shaped or cross-shaped "stencil" to find out if each pixel should be included in the final result. This can be fine if you only plan to erode one or two pixels, but its performance will degrade quickly if you need to erode many pixels. It's effectively O(I * W * H), where I is the number of pixels you want to erode.
The choice of the naive pixel erosion algorithm was the biggest bottleneck, partially because I was originally performing it in super-sampled space in order to get the best-looking anti-aliasing results.
I went ahead and explored an alternate pixel erosion algorithm using a chamfer distance field. This algorithm requires some extra memory, as you have to first compute the distance to the nearest edge for every pixel. But it allows you to perform the erosion pass against any number of pixels in a single pass, making the algorithm O(W * H).
Using a chamfer distance field reflected a significant performance enhancement, going from about a half-second down to a tenth of a second per puzzle piece. But this still isn't great, as I'd still be asking players to sit and wait for a good 10 - 15 seconds while the puzzle "loads".
A big part of the performance bottleneck comes from applying these algorithms in super-sampled space. You can go from 4x4 super-sampled space down to 2x2, with some loss in quality for anti-aliasing. The performance gain here is significant, but I still found it didn't quite meet my expectations.
Pixel Erosion Approximations
At this point, I decided to try some alternative ways to approximate pixel erosion:
- Geometry deformation: Take the original outline of my polygon, and collapse it inward. The most accurate approach is to use Offsetting. This gets complicated with complex polygons, so I approximated this by just calculating the normal vector for each vertex, and pushing them all inward. The results here can either be surprisingly good, or can totally mangle your geometry. Libraries like Clipper will handle all the edge cases to prevent mangled geometry.
- Raster scan line deformation: Take the results of fill rasterization, and just offset it in both the horizontal and vertical directions.
Raster scan line deformation turned out to be the solution to my pixel erosion performance problems. The idea is pretty simple: fill rasterization already involves turning geometry into sets of "pixel spans" for either every row, or every column, of a final image. This is an intermediary representation before turning it into an alpha mask. So you just take all of those pixel spans and make them N pixels smaller.
This changes the algorithm from operating against a million pixels, to operating against a few hundred pixel spans, for every puzzle piece. The visual quality can be a little bit degraded, but is well within reason for what I'm trying to achieve.
Thoughts for the Future
All of this took me from creating a jigsaw puzzle game, into the territory of building a general-purpose image compositing library for a few weeks. It's been a bit of a side-quest, but it has also given me an appreciation for what image processing applications like Gimp and Photoshop have to go through.
It's also allowed me to build up a few algorithms that I might use for other purposes, even if they've effectively been sidelined for this jigsaw puzzle game. I mentioned in a previous post that I plan on eventually building a C++ indie games library. One of the goals of that library will be to provide a wide cross-section of algorithms for achieving similar results:
- If you're building a tool where you want to create high-quality textures with a variety of effects applied, you have high-quality but slower algorithms available.
- If you have well-behaved geometry, you have simple geometric deformations available.
- If you're doing something in-between, like my puzzle game where I want to bake textures dynamically at the start of each puzzle, you have algorithms that aren't quite real-time, but fast enough for an initial load.