I have built a 64-bit program in C (compiled with gcc) using OpenMP. The program calculates matrix-related results. It runs well, but I find that when the number of threads goes above 16, the time taken to process the job increases. The simple results below show the matrix size processed in the vertical direction and the number of threads allocated in the horizontal; the data values are the processing times. Note the increase in processing time for 32 threads compared to 16 threads. I imagine that this is a result of how the memory is configured with respect to the processors. Similar SMP machines with 32 cores (SUN) do not show this increase in processing time from 16 to 32 threads.
Threads:        2      4      8     16     32
2000x2000     140     73     40     58     62
4000x4000     502    255    133    109    135
6000x6000    1066    568    292    241    297
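One common cause worth ruling out on a NUMA machine is memory placement. Linux normally places a page on the memory node of the CPU that first writes to it ("first touch"), so if the matrices are filled by a single thread, every page can end up on one socket and threads running on the other sockets pay for remote accesses. The original program is not shown, so the sketch below is a hypothetical stand-in: a naive row-major matrix multiply, with the names `alloc_first_touch`, `matmul` and `time_matmul` invented for illustration. It initialises each matrix in parallel with the same `schedule(static)` split as the compute loop, so rows of `a` and `c` are first touched by the thread that later uses them. Note that `b` is read in full by every thread, so it cannot be local to all of them. Compile with something like `gcc -O2 -fopenmp`.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime */
#include <stdlib.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock time in seconds from a monotonic clock. */
static double wall_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

/* Hypothetical helper: allocate an n x n matrix and fill it in parallel using
   the same static row split as matmul(), so that under a first-touch policy
   each row's pages are placed near the thread that will work on that row. */
static double *alloc_first_touch(int n, double value) {
    double *m = malloc((size_t)n * n * sizeof *m);
    if (!m) return NULL;
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            m[(size_t)i * n + j] = value;
    return m;
}

/* c = a * b for row-major n x n matrices; rows are split statically. */
static void matmul(int n, const double *a, const double *b, double *c) {
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++)
            c[(size_t)i * n + j] = 0.0;
        for (int k = 0; k < n; k++) {
            double aik = a[(size_t)i * n + k];
            for (int j = 0; j < n; j++)
                c[(size_t)i * n + j] += aik * b[(size_t)k * n + j];
        }
    }
}

/* Time one n x n multiply with the given thread count. Returns elapsed
   seconds, or -1.0 if allocation fails. If c00 is non-NULL, c[0] is stored
   there as a sanity check (a is all 1.0, b is all 2.0, so c[0] == 2n). */
static double time_matmul(int n, int threads, double *c00) {
#ifdef _OPENMP
    omp_set_num_threads(threads);
#else
    (void)threads; /* serial build: thread count is ignored */
#endif
    double *a = alloc_first_touch(n, 1.0);
    double *b = alloc_first_touch(n, 2.0);
    double *c = alloc_first_touch(n, 0.0);
    double t = -1.0;
    if (a && b && c) {
        double t0 = wall_seconds();
        matmul(n, a, b, c);
        t = wall_seconds() - t0;
        if (c00) *c00 = c[0];
    }
    free(a);
    free(b);
    free(c);
    return t;
}
```

Calling `time_matmul(2000, threads, NULL)` for threads = 2, 4, 8, 16, 32, once with this parallel initialisation and once with the fill loops made serial, would show whether placement accounts for the jump from 16 to 32 threads on this machine.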
I am using qsub to submit my batch job and was wondering whether I need to balance the threads per node, i.e. allocate threads per node or something like that.
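On the qsub question: OpenMP threads all belong to one process and share its address space, so they run on whichever single node that process lands on; they cannot be spread across nodes. What usually matters instead is that the batch request reserves one node with at least as many cores as threads (the exact resource syntax depends on the local PBS/Torque or SGE setup), and how the threads are placed across that node's sockets. The hypothetical helper below, `report_thread_placement`, prints the logical CPU each thread is on. It assumes Linux with glibc, since `sched_getcpu()` is a GNU extension, and the result is only a snapshot unless the threads are bound.

```c
#define _GNU_SOURCE /* for sched_getcpu() */
#include <sched.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Print the logical CPU each OpenMP thread is currently running on and
   return the size of the thread team (1 in a serial build). */
static int report_thread_placement(void) {
    int team_size = 1;
    #pragma omp parallel
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
        #pragma omp single
        team_size = omp_get_num_threads();
#endif
        int cpu = sched_getcpu(); /* -1 on error */
        #pragma omp critical
        printf("thread %d on cpu %d\n", tid, cpu);
    }
    return team_size;
}
```

With a GCC that supports OpenMP 4.0, running the job with, for example, `OMP_NUM_THREADS=32 OMP_PROC_BIND=spread OMP_PLACES=cores` and again with `OMP_PROC_BIND=close`, then comparing both the printed placement and the timings, may show whether thread-to-socket placement is behind the slowdown.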
Does anyone care to guess why the processing time increases with more threads?