I wrote a realtime raytracer sometime last year, and at its core is, naturally, a sphere/ray test. The one function that does these tests (4 at a time) runs MUCH slower on the Core i7 (965) than on my Core 2 (Q6600). To narrow things down, I've clocked both machines at 2700 MHz and reduced the raytracer to 1 thread. HT makes no difference to this issue on the Core i7 (I tried it both ways), so it was left on for these tests.
The output looks like this:
Here's the code and the VTune output.
It's worth noting that if I simply return NULL from my intersect function, the Core i7 runs at 75 fps and the Core 2 at 40 fps, so the Core i7 really does seem to run all the other code faster; it's just down to this one problem area. My raytracer's "shading" code runs at about the same speed on both machines, with the i7 winning by a bit. Also worth noting: every benchmark I've run on the two machines shows the i7 killing the Core 2, so it's not a problem with my PC.
Also, if I expand the source out into ASM view, the branch mispredictions are of course on the multiple "jbe" instructions within that loop.
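For comparison, a 4-at-a-time sphere test can be written so that the per-lane hit decisions become SSE compare masks instead of conditional jumps. The following is only a minimal sketch of that idea, not the function profiled here: `SpherePacket` and `intersect4` are hypothetical names, and it assumes a normalized ray direction and spheres stored in structure-of-arrays form.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical SoA packet of four spheres (illustrative names only).
struct SpherePacket {
    __m128 cx, cy, cz;  // sphere centers, one sphere per lane
    __m128 r2;          // squared radii
};

// Tests one ray (origin o, unit-length direction d) against four spheres.
// Returns a 4-bit mask (bit i set = sphere i is hit in front of the origin)
// and writes the entry distance of each lane to tOut. Lanes whose bit is
// not set hold meaningless distances and must be ignored by the caller.
inline int intersect4(const SpherePacket& s,
                      float ox, float oy, float oz,
                      float dx, float dy, float dz,
                      float tOut[4])
{
    // Vector from ray origin to each sphere center.
    const __m128 lx = _mm_sub_ps(s.cx, _mm_set1_ps(ox));
    const __m128 ly = _mm_sub_ps(s.cy, _mm_set1_ps(oy));
    const __m128 lz = _mm_sub_ps(s.cz, _mm_set1_ps(oz));

    // tca = dot(l, d): distance along the ray to the closest approach.
    const __m128 tca = _mm_add_ps(
        _mm_add_ps(_mm_mul_ps(lx, _mm_set1_ps(dx)),
                   _mm_mul_ps(ly, _mm_set1_ps(dy))),
        _mm_mul_ps(lz, _mm_set1_ps(dz)));

    // l2 = dot(l, l); disc = r^2 - (l2 - tca^2).
    const __m128 l2 = _mm_add_ps(
        _mm_add_ps(_mm_mul_ps(lx, lx), _mm_mul_ps(ly, ly)),
        _mm_mul_ps(lz, lz));
    const __m128 d2   = _mm_sub_ps(l2, _mm_mul_ps(tca, tca));
    const __m128 disc = _mm_sub_ps(s.r2, d2);

    // Clamp before the square root so missed lanes do not produce NaN.
    const __m128 zero = _mm_setzero_ps();
    const __m128 thc  = _mm_sqrt_ps(_mm_max_ps(disc, zero));
    const __m128 t    = _mm_sub_ps(tca, thc);

    // Hit = ray line touches the sphere AND the entry point is in front.
    const __m128 hit = _mm_and_ps(_mm_cmpge_ps(disc, zero),
                                  _mm_cmpgt_ps(t, zero));

    _mm_storeu_ps(tOut, t);
    return _mm_movemask_ps(hit);  // one integer to branch on per packet
}
```

With this shape the caller branches once on the returned mask per packet of four spheres, rather than once per comparison inside the loop. Whether that would help with the `jbe` mispredictions seen here is untested.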