Clay's blog http://softwarecommunity.intel.com/ISN/Community/en-us/blogs/multi-core-thredmonkey/archive/2006/12/18/30228042.aspx asks if Intel® Threading Building Blocks [Intel® TBB] is a solution looking for a problem. OpenMP is great if you have Fortran code, or C code that looks like Fortran, or C++ that looks like Fortran. In other words, flat do-loop centric parallelism.
I admit I just don't get it. I'm not a C++ programmer, so the preface to any response you might want to make to my criticisms and questions posed here could be "Clay, you ignorant slut!" I can accept that; I'm speaking, to some extent, from a position of ignorance.
While my wife and I were watching House, M.D. the other night, we were doing shots of Basil Hayden's every time one of the characters said "Hi, Bob." (I must confess, we got a lot more hammered playing this game when Bob Newhart was still on the air.*) We've both been big Hugh Laurie fans from his work in Blackadder, A Bit of Fry and Laurie, and
I recently updated my video game Frequon Invaders. It's a free download from http://home.comcast.net/~arch.robison/frequon.html , which is strictly my own product, not Intel's. In doing the update, I optimized it for the Intel® Core™2 Duo processor, and ran into a tale of dependence breaking that I'll tell here.
Welcome to the first installment of "What's Not Parallel!"
(You were supposed to yell out the name as you were reading, like they do at the opening of "Wheel of Fortune". Please go back and try again. Thanks.)
The first example of something that cannot be made parallel is any algorithm, function, or procedure that contains state. That is, something that is kept around from one execution to the next. For example, the seed of a random number generator or the file pointer for I/O would be the state.
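To make the random number example concrete, here is a minimal sketch of a generator that carries a seed from one call to the next. The `Lcg` type and `demo` function are hypothetical names, and the multiplier and increment are just commonly quoted linear congruential constants picked for illustration; nothing here comes from a particular library.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical linear congruential generator, for illustration only.
struct Lcg {
    std::uint32_t seed;  // the state kept around between calls

    std::uint32_t next() {
        // Each new value is computed from the seed the previous call left behind.
        seed = 1664525u * seed + 1013904223u;  // unsigned arithmetic wraps mod 2^32
        return seed;
    }
};

// The second draw cannot start until the first has updated the seed.
void demo() {
    Lcg gen{42u};
    std::uint32_t first = gen.next();
    std::uint32_t second = gen.next();
    assert(second == 1664525u * first + 1013904223u);
}
```

Because every call reads and then overwrites `seed`, two threads sharing one `Lcg` would either race on it or have to take turns, which is the sense in which the state gets in the way.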
The first thing that popped into my head as I sat watching the denouement of War Games was "Why is it taking so long for WOPR to exhaust all of those tic-tac-toe combinations?" I mean, there are only 362,880 (9 factorial) different games that could be played, and that counts all the games that are just the reflection or rotation of another game.
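For anyone who wants to check the arithmetic, the sketch below computes 9 factorial and also walks every game by brute force. It assumes a game stops at the first three-in-a-row or when the board fills up, so 9! works as an upper bound (it counts every order in which nine squares can be filled) and the enumerated count comes out smaller. All of the names (`Board`, `wins`, `count_games`, `count_all_games`, `factorial`) are hypothetical.

```cpp
#include <array>
#include <cassert>

using Board = std::array<char, 9>;  // ' ' for empty, otherwise 'X' or 'O'

// True if player p holds any of the eight winning lines.
bool wins(const Board& b, char p) {
    static const int lines[8][3] = {{0, 1, 2}, {3, 4, 5}, {6, 7, 8}, {0, 3, 6},
                                    {1, 4, 7}, {2, 5, 8}, {0, 4, 8}, {2, 4, 6}};
    for (const auto& l : lines)
        if (b[l[0]] == p && b[l[1]] == p && b[l[2]] == p) return true;
    return false;
}

// Count the complete games reachable from board b with `player` to move.
// `filled` is the number of squares already taken.
long count_games(Board& b, char player, int filled) {
    long total = 0;
    for (int i = 0; i < 9; ++i) {
        if (b[i] != ' ') continue;
        b[i] = player;
        if (wins(b, player) || filled + 1 == 9)
            total += 1;  // the game ends here: a win or a full board
        else
            total += count_games(b, player == 'X' ? 'O' : 'X', filled + 1);
        b[i] = ' ';  // undo the move before trying the next square
    }
    return total;
}

long count_all_games() {
    Board b;
    b.fill(' ');
    return count_games(b, 'X', 0);
}

long factorial(int n) {
    assert(n >= 0);
    return n <= 1 ? 1 : n * factorial(n - 1);
}
```

Here `factorial(9)` gives the 362,880 figure, while `count_all_games()` gives the smaller count of games that end as soon as somebody wins. Either way, it is a tiny search space.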
The parallel loop templates in Intel® TBB require a grainsize parameter. Ideally, we'd have some sort of profile-guided optimization. But that's tough to do within TBB's goal of working with standard-issue compilers.
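To give a feel for what the parameter controls, here is a small sketch in plain C++ rather than TBB code. It assumes the loop template keeps halving an index range until each piece is no bigger than the grainsize, and it only records the pieces instead of running them in parallel. `Range`, `split`, and `chunks` are hypothetical names, not TBB identifiers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open index range [begin, end).
struct Range {
    std::size_t begin, end;
    std::size_t size() const { return end - begin; }
};

// Recursively halve r until every piece holds at most `grainsize` iterations.
void split(Range r, std::size_t grainsize, std::vector<Range>& leaves) {
    assert(grainsize > 0);
    if (r.size() <= grainsize) {
        leaves.push_back(r);  // small enough: this piece would run as one task
        return;
    }
    std::size_t mid = r.begin + r.size() / 2;
    split(Range{r.begin, mid}, grainsize, leaves);
    split(Range{mid, r.end}, grainsize, leaves);
}

// Convenience wrapper: the pieces produced for the range [0, n).
std::vector<Range> chunks(std::size_t n, std::size_t grainsize) {
    std::vector<Range> leaves;
    split(Range{0, n}, grainsize, leaves);
    return leaves;
}
```

Under that assumption, `chunks(1000, 100)` yields 16 pieces of 62 or 63 iterations each, `chunks(1000, 1000)` yields a single piece with no parallelism to exploit, and `chunks(1000, 1)` yields 1000 pieces whose per-task overhead would likely swamp the work. Picking a grainsize is a trade between those two extremes.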
It's really not that difficult to understand and set. I had this analogy in a draft of the Tutorial, but it ended up on the cutting room floor because it depended too much on understanding Western culture's lifestyle:
At times it seems that Intel is playing the part of Chicken Little (from the fable, not the movie) by running around endlessly pushing the "Multithreading is necessary!" mantra.
The other day, Henny Penny, Cocky Locky, and I were rushing around chatting up all the advantages of threading applications to Turkey Lurkey, Goosey Poosey, and anyone else who would listen. When we passed by the Little Red Hen's house, I stopped to smell the freshly baking bread. As I paused there, I remembered the old saying "If all you have is a hammer, everything looks like a nail."