I am using the Intel MKL routine zgemm() to multiply two complex matrices
on a 2-core machine with a clock speed of 2.79 GHz.
When I run the program with neither OMP_NUM_THREADS nor KMP_AFFINITY
set, I get approximately 2700 MFLOPS. When I set OMP_NUM_THREADS=2
and KMP_AFFINITY= (null), the rate drops to 1390 MFLOPS. When I then
unset KMP_AFFINITY, it drops even further, to 1000 MFLOPS.
Why does the single-threaded run perform better than when I specify two threads?
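For comparing the MFLOPS figures above, ZGEMM rates are conventionally computed by counting 8·M·N·K real floating-point operations for C = A·B (each complex multiply-add is 4 real multiplies plus 4 real adds). The following is a minimal sketch assuming that convention. It uses a naive pure-Python complex product as a stand-in for zgemm() with alpha = 1 and beta = 0. The names `zgemm_flops`, `mflops` and `naive_zgemm` are hypothetical helpers, not part of the MKL API.

```python
import time


def zgemm_flops(m: int, n: int, k: int) -> int:
    """Real flops for C = A*B with complex A (m x k) and B (k x n).

    Assumes the usual convention: one complex multiply-add
    = 4 real multiplies + 4 real adds = 8 flops.
    """
    return 8 * m * n * k


def mflops(m: int, n: int, k: int, seconds: float) -> float:
    """Convert an elapsed wall-clock time into MFLOPS (8*m*n*k convention)."""
    return zgemm_flops(m, n, k) / (seconds * 1e6)


def naive_zgemm(a, b):
    """Naive complex matrix product; a stand-in for zgemm() with alpha=1, beta=0."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0j] * n for _ in range(m)]
    for i in range(m):
        row_c = c[i]
        for p in range(k):
            aip = a[i][p]
            row_b = b[p]
            for j in range(n):
                row_c[j] += aip * row_b[j]
    return c


if __name__ == "__main__":
    n = 64
    a = [[complex(i + j, i - j) for j in range(n)] for i in range(n)]
    b = [[complex(1, 0) if i == j else 0j for j in range(n)] for i in range(n)]
    start = time.perf_counter()
    c = naive_zgemm(a, b)
    elapsed = time.perf_counter() - start
    print(f"{mflops(n, n, n, elapsed):.1f} MFLOPS")
```

Timing several repetitions of a large product usually gives more stable numbers than a single small call.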