"Vanilla is the Absence of Chocolate" and Other Lies My Father Told Me

By Clay Breshears (Intel) (124 posts) on July 22, 2008 at 2:52 pm

If you've ever taken a course on algorithms, you will recall that the goal of this course is to explore different techniques to perform certain computations and to be able to evaluate an algorithm's efficiency in a logical and mathematically rigorous way. During my CS studies I took this course and I've taught the algorithms courses while at the University of Southern Mississippi (both a serial [undergraduate] and a parallel version [graduate]). 

In these courses, we indoctrinate our students with the idea that the better choice for algorithms is the one with the lowest asymptotic upper bound (the big-O notation) on the growth of the number of algorithmic steps executed. Certainly, on average, Quicksort, O(n log n), will be faster than BubbleSort, O(n^2); binary search, O(log n) is better than linear search, O(n), and a hash table search, O(1), is better than either. 

Now, however, with parallel programming, we may need to have our students rethink what the best (serial) algorithm is to be used for parallel execution. It may not be the one with the best asymptotic behavior, but the one that is easiet to code in parallel. For example, consider matrix multiplication.  One of the best algorithms devised to date is Strassen's Algorithm. This algorithm quarters the matrices and recursively computes 7 different multiplications of these pieces and combines the results into the final answer. The algorithm uses O(n^2.81) operations where the traditional triple-nested loop algorithm is O(n^3). In parallel, Strassen's algorithm would require some tricky synchronization and concurrency management issues. To parallelize the traditional algorithm I only need to add a single OpenMP pragma. 

Better serial algorithms may make use of some shared state to cut the number of steps needed to perform a computation. As the number of cores increases, the cost of computation goes down whereas the cost of synchronization goes up. Thus, utilizing the shared information of a better algorithm is going to be a major cause of lower parallel performance. Alternately, running a less than optimal algorithm on a myriad of cores with minimal synchronization would be best. Another example is to perform redundant work on each core in order to compute shared values locally rather than stalling execution while one core does the computation before sharing the results with all the other cores.

I'm not predicting the end of Algorithms courses as we know them today. They are still a great place to teach how, given an algorithm not seen before, to estimate the execution time and a place where new algorithmic ideas can be presented in order to equip future programmers for when they need to write code. No, I'm just trying to give a gentle "heads up" to the time when a student gives you a blank stare as you proceed to use a sub-optimal algorithm in some parallel coding example and doesn't immediately see why. She may suspect that the whole Algorithms course is a sham and just an excuse to get more tuition from her. When that happens, you might suggest she go think it over while sipping a chocolate shake. 

It's what my father would do. And then he'd tell me that vines would grow from my stomach and out my ears if I swallowed a watermelon seed.

Categories: Academic, Parallel Programming, Software Engineering

Comments (3)

July 22, 2008 3:56 PM PDT

Clay Breshears (Intel)
Total Points:
15,225
Status Points:
15,225
Black Belt
The above supports Rule 8 ("Don't be afraid to change the algorithm for a better chance of concurrency") from my "8 Simple Rules for Designing Threaded Applications". Click http://software.intel.com/en-us/videos/clay-breshears-eight-..... aded-apps/ for a shameless video plug.
August 3, 2008 10:19 AM PDT


Carole Malvern
The vines _will_ grow out of your gut and shoot out your ears. Your dear old dad was right as rain. Don't be teaching the young children any different. Received wisdom is _still_ wisdom. ;-)
August 7, 2008 9:44 AM PDT

Clay Breshears (Intel)
Total Points:
15,225
Status Points:
15,225
Black Belt
Yes, and I wish I had had children to wisdom over. I've only got my nieces and nephew, but don't see them enough to pass along the nuggets needed.

Trackbacks (0)


Leave a comment  

To obtain technical support, please go to Software Support.
Name (required)*

Email (required; will not be displayed on this page)*

Your URL (optional)


Comment*