I have a Windows application that can spawn a varying number of threads, each of which puts a heavy load on the CPU (vector and matrix operations). I tested it on two boxes: one with a quad-core Xeon and one with two dual-core Xeons.
On both boxes:
With 2 threads there is some speedup (about 30-40%) over 1 thread, and CPU utilization is fairly high (above 90%) on the CPUs where the app is running. With 4 threads the application runs about 5% slower than with 2 threads, and CPU utilization jumps from the high 60s to 99%.
Why does this happen?
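To make the measurement concrete, here is a minimal sketch of how this kind of scaling test could be reproduced. It is not the actual application code: `vector_work` and `time_with_threads` are hypothetical names, and the loop is only a stand-in for the real vector and matrix operations. It assumes a fixed total amount of work split evenly across the threads.

```cpp
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in workload: repeatedly update a vector and
// accumulate a checksum so the compiler cannot discard the loop.
double vector_work(std::size_t n, int passes) {
    std::vector<double> a(n, 1.0), b(n, 2.0);
    double sum = 0.0;
    for (int p = 0; p < passes; ++p) {
        for (std::size_t i = 0; i < n; ++i) {
            a[i] = a[i] * 0.5 + b[i];
            sum += a[i];
        }
    }
    return sum;
}

// Split total_elems evenly across `threads` worker threads, run the
// workload on each, and return the elapsed wall-clock time in seconds.
double time_with_threads(unsigned threads, std::size_t total_elems, int passes) {
    std::vector<double> results(threads, 0.0);
    std::vector<std::thread> pool;
    const auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&results, t, threads, total_elems, passes] {
            results[t] = vector_work(total_elems / threads, passes);
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}
```

Calling `time_with_threads(1, n, passes)`, `time_with_threads(2, n, passes)` and `time_with_threads(4, n, passes)` with the same `n` and `passes` gives the wall-clock times to compare. Whether this sketch shows the same slowdown depends on the working-set size and the hardware.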
I tried the trial version of Thread Profiler, but after the application finishes it crashes while displaying the results, with a message like "unknown error or not enough memory". I have 4 GB of RAM, so an unknown error seems more likely.
Thanks in advance.
4 threads are slower than 2 on quadcore