I use Windows 7 SP1 on an i7-2600K, with Hyper-Threading and Turbo Boost both disabled.
I compiled the application with VS 2010 using the latest version of Intel Composer.
I run a piece of AVX code on a 1280x960 image.
Using a single thread bound to a single core, I get a run time of 10 ms.
Splitting the image into two equal halves and running each half on a separate thread and core (I bind each thread to a different core), I get a run time of 8.8 ms.
Moving to three threads, each processing a third of the image, I get a run time of 8 ms.
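For illustration, here is a minimal sketch of the splitting scheme described above, assuming an 8-bit single-channel image stored row by row. It uses C++11 `std::thread`, which VS 2010 does not provide, so treat it as a portable approximation of the setup rather than the original code. `process_rows` and `process_image_parallel` are hypothetical names, and the per-pixel work is a scalar placeholder standing in for the real AVX kernel. Thread-to-core binding is not shown; on Windows it could be added with `SetThreadAffinityMask` on each thread's native handle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical placeholder kernel (NOT the original AVX code):
// inverts every 8-bit pixel in rows [row_begin, row_end).
void process_rows(std::uint8_t* image, std::size_t width,
                  std::size_t row_begin, std::size_t row_end) {
    for (std::size_t y = row_begin; y < row_end; ++y) {
        std::uint8_t* row = image + y * width;
        for (std::size_t x = 0; x < width; ++x) {
            row[x] = static_cast<std::uint8_t>(255 - row[x]);
        }
    }
}

// Splits the image into num_threads horizontal bands of (nearly) equal
// height and processes each band on its own thread, then waits for all.
// Affinity (one core per thread) is intentionally omitted to stay portable.
void process_image_parallel(std::uint8_t* image, std::size_t width,
                            std::size_t height, unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (unsigned t = 0; t < num_threads; ++t) {
        // Integer band boundaries; the last band absorbs any remainder.
        std::size_t begin = height * t / num_threads;
        std::size_t end = height * (t + 1) / num_threads;
        workers.emplace_back(process_rows, image, width, begin, end);
    }
    for (std::thread& w : workers) {
        w.join();
    }
}
```

Calling `process_image_parallel(img.data(), 1280, 960, 3)` gives each thread a band of 320 rows; the bands do not overlap, so no synchronization is needed beyond the final `join`.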
Using the same technique with SSE4 code and with non-SSE code, the speedup factors I get are usually close to the number of cores.
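To make the comparison concrete, the usual definitions are speedup S(n) = T(1) / T(n) and parallel efficiency E(n) = S(n) / n, where 1.0 means perfect linear scaling. The small helpers below are only a sketch of that arithmetic; the function names are hypothetical.

```cpp
#include <cassert>

// Speedup of an n-thread run relative to the single-thread run: T(1) / T(n).
double speedup(double t1_ms, double tn_ms) {
    assert(tn_ms > 0.0);
    return t1_ms / tn_ms;
}

// Parallel efficiency: speedup divided by the number of threads/cores.
double efficiency(double t1_ms, double tn_ms, unsigned n) {
    assert(n > 0);
    return speedup(t1_ms, tn_ms) / n;
}
```

Plugging in the AVX timings above gives S(2) = 10 / 8.8 ≈ 1.14 (E ≈ 0.57) and S(3) = 10 / 8 = 1.25 (E ≈ 0.42), whereas a speedup close to the number of cores, as seen with the SSE4 and non-SSE versions, corresponds to E ≈ 1.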
Is there something I am missing?