I'd like to ask some questions about grainsize. First, I'm now trying to apply "parallel_for" in my simulation software. The problem is that I can't use a fixed grainsize, because the number of iterations in the loop (i.e., the parallel_for loop) is not fixed: it depends on the model (i.e., the antenna model used in the simulation). So I tried "auto_partitioner", but the result was not what I expected. For example, if I use a fixed grainsize for one model (the best grainsize I've found for it), I get a better result (I mean better speedup in simulation time) than with "auto_partitioner".
I've tried to read many articles about grainsize but still can't work out how to choose the best grainsize for my application. Since we can't predict the number of iterations in the loop, or the number of cores in the system, I have no idea how to get the best grainsize for all models and for all operating systems. So, can anyone help me with this problem?
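One common heuristic, rather than a fixed constant, is to derive the grainsize at run time from the iteration count and the number of hardware threads, so every thread gets several chunks (for load balancing) but no chunk is too small to pay off its scheduling overhead. Below is a minimal sketch of that idea, assuming the caller passes the thread count (e.g. from std::thread::hardware_concurrency()). The helper name choose_grainsize and its default values (8 chunks per thread, at least 1000 iterations per chunk) are hypothetical placeholders to be tuned per application, not anything defined by TBB.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of TBB): picks a grainsize for a loop of n
// iterations so that each of `threads` threads gets about `chunks_per_thread`
// chunks, while no chunk is smaller than `min_grain` iterations.
std::size_t choose_grainsize(std::size_t n,
                             unsigned threads,
                             std::size_t chunks_per_thread = 8,
                             std::size_t min_grain = 1000) {
    assert(chunks_per_thread > 0);          // avoid division by zero below
    if (threads == 0) threads = 1;          // hardware_concurrency() may report 0
    const std::size_t target_chunks =
        static_cast<std::size_t>(threads) * chunks_per_thread;
    // Even split into target_chunks pieces, but never below min_grain.
    std::size_t grain = std::max(n / target_chunks, min_grain);
    // A grain larger than the whole loop just means "run it as one chunk".
    if (n > 0) grain = std::min(grain, n);
    return std::max<std::size_t>(grain, 1);
}
```

For example, with 4 threads this returns 3125 for a 100000-iteration loop and 500 (a single chunk) for a 500-iteration loop. The result could then be passed as the grainsize argument of tbb::blocked_range together with tbb::simple_partitioner, which splits ranges down to roughly that size; with auto_partitioner the grainsize acts only as a lower bound on chunk size.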
Thanks a lot!