I have a multi-threaded application that runs 20% slower on my MacBook Pro with two threads than with one. I checked for blocking conditions and found that they are not the problem. The application is huge and accesses a huge in-memory database, so the cache doesn't have much effect on performance. So I figure the problem is that this machine does not have enough memory bandwidth to support two threads that each access a lot of memory.
An example would be where each thread accesses a 2 GB array with reads and writes to random locations. It finishes after 32 gig writes to the array, so logically two threads would finish in half the time of one if there were sufficient bandwidth for the required memory accesses. Are there Intel-based machines where this is true? How about with more threads and cores?
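
A minimal sketch of such a test, assuming C++ with `std::thread` and assuming each thread gets its own private array (the description above does not say whether the array is shared). The names `xorshift64`, `random_updates`, `run_benchmark`, and `Result` are made up for illustration, and the work is a fixed total number of random read-modify-write updates split evenly across the threads. It is not the real application, only a way to compare one thread against two on a given machine.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Cheap xorshift64 generator, so random-number generation is unlikely
// to dominate the time spent on memory accesses.
inline std::uint64_t xorshift64(std::uint64_t& s) {
    s ^= s << 13;
    s ^= s >> 7;
    s ^= s << 17;
    return s;
}

// Performs `updates` read-modify-write operations at random slots of `data`.
void random_updates(std::vector<std::uint64_t>& data,
                    std::uint64_t updates, std::uint64_t seed) {
    std::uint64_t state = seed | 1;  // xorshift state must be nonzero
    const std::size_t n = data.size();
    for (std::uint64_t i = 0; i < updates; ++i) {
        data[xorshift64(state) % n] += 1;  // one read and one write
    }
}

struct Result {
    double seconds;       // wall-clock time for the update phase only
    std::uint64_t total;  // sum of all slots, i.e. updates actually done
};

// Assumption: every thread works on its own array of `elems_per_thread`
// 64-bit slots. `total_updates` is divided evenly between the threads
// (any remainder is dropped).
Result run_benchmark(unsigned num_threads, std::size_t elems_per_thread,
                     std::uint64_t total_updates) {
    assert(num_threads > 0 && elems_per_thread > 0);
    // Allocate and zero the arrays before the clock starts.
    std::vector<std::vector<std::uint64_t>> arrays(
        num_threads, std::vector<std::uint64_t>(elems_per_thread, 0));

    const std::uint64_t per_thread = total_updates / num_threads;
    std::vector<std::thread> workers;
    auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back(random_updates, std::ref(arrays[t]), per_thread,
                             0x9E3779B97F4A7C15ULL * (t + 1));
    }
    for (auto& w : workers) w.join();
    auto stop = std::chrono::steady_clock::now();

    // Summing the arrays keeps the compiler from discarding the updates
    // and doubles as a sanity check on the amount of work done.
    std::uint64_t total = 0;
    for (const auto& a : arrays)
        for (std::uint64_t v : a) total += v;
    return {std::chrono::duration<double>(stop - start).count(), total};
}
```

To mirror the numbers above, one would call `run_benchmark(1, (2ULL << 30) / 8, 32ULL << 30)` and then the same with `2` as the first argument, and compare the two `seconds` values. That needs 2 GB of RAM per thread and may run for a long time, so smaller parameters are worth trying first. If memory bandwidth or latency is the limit, the two-thread time would be expected to land well above half of the one-thread time.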