Intel® Accelerate Your Code Summer 2013
In this contest we were given sample source code on which we could base our solution. This sample code had a number of issues:
1. The program is not fast enough
2. Only pixel-exact pattern transformations are found
3. The program is not taking advantage of parallelism
We will go through these issues one by one, describing how they were solved in our solution.
Issue 1. The program is not fast enough
One of the first and simplest improvements I made to the given approach was to switch from upscaling the pattern to downscaling the image. The sample code upscaled each pattern and tried to match it against a region of the image. The same match can be found by downscaling the image instead, and this is much faster because there are fewer pixels to compare. But do we lose any precision by downscaling the image? Sure, we lose some information about the original image, but if we upscale the pattern we lose much more, since we’re “making up” information that was never there. So, if you think about it, comparing a downscaled image with the pattern is even more precise than comparing an upscaled pattern with the original image.
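To make the downscaling step concrete, here is a minimal sketch. It is not the contest code: the `GrayImage` type, the `downscale` function, the restriction to integer factors, and the choice of a simple box (averaging) filter are all assumptions made for illustration, and the real images were color bitmaps rather than grayscale.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical grayscale image type, used only for this illustration.
struct GrayImage {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels;  // row-major, width * height values

    std::uint8_t at(int x, int y) const { return pixels[y * width + x]; }
};

// Shrinks `src` by an integer `factor`, averaging each factor x factor block.
// Trailing rows and columns that do not fill a whole block are dropped.
GrayImage downscale(const GrayImage& src, int factor) {
    assert(factor >= 1);
    GrayImage dst;
    dst.width = src.width / factor;
    dst.height = src.height / factor;
    dst.pixels.resize(static_cast<std::size_t>(dst.width) * dst.height);
    for (int y = 0; y < dst.height; ++y) {
        for (int x = 0; x < dst.width; ++x) {
            unsigned sum = 0;
            for (int dy = 0; dy < factor; ++dy)
                for (int dx = 0; dx < factor; ++dx)
                    sum += src.at(x * factor + dx, y * factor + dy);
            dst.pixels[y * dst.width + x] =
                static_cast<std::uint8_t>(sum / (factor * factor));
        }
    }
    return dst;
}
```

Every output pixel is an average of real input pixels, which is why nothing has to be invented; a pattern that appears at 2x in the image can then be compared at its native size against `downscale(image, 2)`.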
Issue 2. Only pixel-exact pattern transformations are found
Now this part is tricky. The biggest issue is that we were not given a mathematical definition of the accepted pattern transformations, and it’s pretty obvious that the fully general problem cannot be solved. For my solution I decided to design an algorithm that would work on the provided tests and on similar ones. The transformations found in the tests are scaling and rotation. Both operations have a floating-point parameter (the scale can be 1.33x, and the rotation can be 27 deg), so we can’t just go through all the possible values. And this means we can’t use pixel-wise pattern matching.
For my solution I decided to use a form of color histogram. More precisely, I use a set of color histograms, one for each fixed-size part of the pattern (let’s say 5x5 pixels). This approach allowed me to find patterns at nearby scales (say, 2.1x can match 2x) and nearby angles (say, 40 deg can match 45 deg).
Naturally, this is much slower than just comparing pixels, so a number of optimizations were needed. The first big one is using an integral image for histogram calculation, and the second is quickly rejecting some regions before comparing all of their histograms.
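The idea of a per-cell color histogram can be sketched as follows. This is only an illustration under assumptions: the names `Rgb`, `binOf`, `cellHistogram` and `histogramDistance`, the choice of 2 bins per channel, and the L1 distance are all hypothetical and not taken from the attached source.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };

constexpr int kBinsPerChannel = 2;  // assumed quantization, not from the article
constexpr int kBins = kBinsPerChannel * kBinsPerChannel * kBinsPerChannel;
using Histogram = std::array<int, kBins>;

// Maps a color to one of kBins coarse color bins.
int binOf(Rgb p) {
    int rq = p.r * kBinsPerChannel / 256;
    int gq = p.g * kBinsPerChannel / 256;
    int bq = p.b * kBinsPerChannel / 256;
    return (rq * kBinsPerChannel + gq) * kBinsPerChannel + bq;
}

// Histogram of the cell x cell block whose top-left corner is (x0, y0).
Histogram cellHistogram(const std::vector<Rgb>& img, int width,
                        int x0, int y0, int cell) {
    assert(width > 0 && cell > 0);
    Histogram h{};  // all bins start at zero
    for (int y = y0; y < y0 + cell; ++y)
        for (int x = x0; x < x0 + cell; ++x)
            ++h[binOf(img[y * width + x])];
    return h;
}

// L1 distance between two histograms; 0 means identical color content.
int histogramDistance(const Histogram& a, const Histogram& b) {
    int d = 0;
    for (int i = 0; i < kBins; ++i) d += std::abs(a[i] - b[i]);
    return d;
}
```

Because a histogram ignores where pixels sit inside the cell, a slightly rotated or rescaled cell still yields a small distance, which is what makes nearby scales and angles match. A candidate region would be accepted when the distances of all its cells stay below some tolerance.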
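The integral image trick can be sketched like this, covering only the histogram part and not the fast rejection. The class name `IntegralHistogram`, its layout (one summed-area table per color bin), and the input format (one precomputed bin index per pixel) are assumptions for illustration, not the attached implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Summed-area table per color bin: after an O(width * height * binCount)
// build, the histogram of any rectangle costs four lookups per bin.
class IntegralHistogram {
public:
    // `bins` holds one bin index in [0, binCount) per pixel, row-major.
    IntegralHistogram(const std::vector<int>& bins, int width, int height,
                      int binCount)
        : width_(width), binCount_(binCount),
          table_(static_cast<std::size_t>(width + 1) * (height + 1) * binCount, 0) {
        assert(bins.size() == static_cast<std::size_t>(width) * height);
        for (int y = 0; y < height; ++y) {
            for (int x = 0; x < width; ++x) {
                int pixelBin = bins[y * width + x];
                assert(pixelBin >= 0 && pixelBin < binCount);
                for (int b = 0; b < binCount; ++b) {
                    cell(x + 1, y + 1, b) = cell(x, y + 1, b) + cell(x + 1, y, b)
                                          - cell(x, y, b) + (b == pixelBin ? 1 : 0);
                }
            }
        }
    }

    // Number of pixels in bin `b` inside the half-open rectangle
    // [x0, x1) x [y0, y1).
    int count(int x0, int y0, int x1, int y1, int b) const {
        return cell(x1, y1, b) - cell(x0, y1, b) - cell(x1, y0, b) + cell(x0, y0, b);
    }

private:
    std::size_t index(int x, int y, int b) const {
        return (static_cast<std::size_t>(y) * (width_ + 1) + x) * binCount_ + b;
    }
    int& cell(int x, int y, int b) { return table_[index(x, y, b)]; }
    int cell(int x, int y, int b) const { return table_[index(x, y, b)]; }

    int width_;
    int binCount_;
    std::vector<int> table_;  // (width + 1) * (height + 1) * binCount entries
};
```

With such a table built once per image, the histogram of each candidate cell no longer depends on the cell size, so sliding a window over the whole image stays affordable.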
Issue 3. The program is not taking advantage of parallelism
Well, this one was pretty easy for me, since I had some experience with Intel TBB, and Xeon Phi allows you to use it. I made all the time-consuming sections parallel, and that allowed me to pass all the provided tests on the server!
You can find all the source code attached.