I was wondering if anyone could suggest compiler options, or anything else, that would improve the performance of TBB on Mac OS.
I have a very simple test case that I've been using as a benchmark for my cross-platform thread development. I have a large array (it can be as large as 128 MB); I loop through it, take the square root of each element, and store the result in a separate array.
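A minimal sketch of the kind of benchmark described above, for anyone who wants to reproduce it. It is not the original test code: it assumes `float` elements and uses plain `std::thread` in place of TBB so that it stays self-contained, and the names `sqrt_serial` and `sqrt_parallel` are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Serial baseline: out[i] = sqrt(in[i]) for every element.
void sqrt_serial(const std::vector<float>& in, std::vector<float>& out) {
    assert(in.size() == out.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = std::sqrt(in[i]);
    }
}

// Threaded version (std::thread stand-in, NOT TBB): the index range is
// split into contiguous chunks, one per worker thread.
void sqrt_parallel(const std::vector<float>& in, std::vector<float>& out,
                   unsigned num_threads) {
    assert(in.size() == out.size());
    num_threads = std::max(1u, num_threads);
    const std::size_t n = in.size();
    // Round up so the chunks cover the whole array.
    const std::size_t chunk = (n + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // no elements left for this worker
        workers.emplace_back([&in, &out, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                out[i] = std::sqrt(in[i]);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
}
```

To compare the two, time each function on the same input array with the same array size, and build with optimization enabled on every platform so the numbers are comparable.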
I have tested this on an AMD dual-core machine running Windows (with MSVC as my compiler), and I get nearly 2x speedup over serial with 2 (or more) threads, which is what I'd expect. The same code on a dual-processor Linux/gcc system gives me less speedup, maybe 1.5x or so. But the disaster occurs on Mac OS/gcc. I have both a dual-core and a quad-core machine to test on, and on both systems I see maybe a 10% speedup over serial if I'm lucky; it's often less than that.
I assume that I've got to be doing something wrong. I can't imagine that the Linux system, or especially the Mac system, is so bad at multithreading that the overhead wipes out all of the advantage. Any hints on what I should be doing to fix this?