A procedure in my program is multi-threaded.
Although the elapsed time of the procedure itself went down, the total elapsed time of the program got longer.
When I ran my program under numactl --physcpubind=..., the total elapsed time in multi-threaded mode really was shorter.
When I analyzed the program with Amplifier, I found that the number of LLC misses increased,
but this happens even if the multi-threaded procedure does nothing.
(I replaced the multi-threaded procedure with a dummy one.)
My machine is equipped with two Xeon E5640 CPUs, and my OS is Red Hat Enterprise Linux 5.5 (Linux kernel 2.6.18-194.el5).
The machine is isolated, so I am sure that no other heavy jobs are running concurrently.
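To make the CPU-binding experiment concrete, here is a minimal sketch of doing roughly the same thing from inside a program instead of through numactl. It is only an illustration in Python, not my actual code: the helper name pin_to_cpus and the choice of CPU are hypothetical, and which CPU ids share a socket depends on the machine.

```python
import os

def pin_to_cpus(cpus):
    """Restrict the calling thread to the given CPU ids and return the new mask.

    Passing 0 as the pid means "the caller"; threads started afterwards
    inherit the same affinity mask. Linux only.
    """
    os.sched_setaffinity(0, cpus)
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    # CPUs this process is currently allowed to run on.
    allowed = sorted(os.sched_getaffinity(0))
    # Hypothetical choice for illustration: keep only the first allowed CPU.
    # On a real two-socket machine you would pick the ids of one socket.
    chosen = {allowed[0]}
    print("now restricted to CPUs:", pin_to_cpus(chosen))
```

With the binding in place the threads cannot migrate between the two sockets, which is the effect I was testing with numactl.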
Any ideas about what could cause this?
Thanks a lot.