I found that parallel_for always (when possible) adjusts the grainsize to a smaller value so that the whole range is split into 2^N chunks. Is there a reason it has to work this way? As far as I know, a 4-core processor with Hyper-Threading, for example, theoretically makes 8 hardware threads available. But that does not mean a program can actually get all of those resources; it depends on the state of the operating system at the time. This causes no problem for me, I would just like to understand it. Presumably, for the sake of simplicity, parallel_for takes a primitive approach that can generate whatever number of threads the program asks for. Many thanks.
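To make the observation concrete, here is a minimal sketch that does not use TBB at all. It only models the behaviour I believe I am seeing, on the assumption that a range is cut at its midpoint for as long as it is larger than the grainsize. The names split_range and chunk_sizes are hypothetical helpers made up for this illustration, not TBB functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only, not TBB source code (assumption: a range keeps being
// halved at its midpoint while its size is larger than grainsize).
// Records the size of every piece that is no longer split.
void split_range(std::size_t begin, std::size_t end, std::size_t grainsize,
                 std::vector<std::size_t>& leaves) {
    assert(grainsize >= 1);
    const std::size_t size = end - begin;
    if (size <= grainsize) {
        // Small enough: this piece becomes one chunk of work.
        leaves.push_back(size);
        return;
    }
    // Otherwise cut in the middle and recurse on both halves.
    const std::size_t middle = begin + size / 2;
    split_range(begin, middle, grainsize, leaves);
    split_range(middle, end, grainsize, leaves);
}

// Hypothetical convenience wrapper: chunk sizes for the range [0, n).
std::vector<std::size_t> chunk_sizes(std::size_t n, std::size_t grainsize) {
    std::vector<std::size_t> leaves;
    split_range(0, n, grainsize, leaves);
    return leaves;
}
```

In this model, chunk_sizes(1000, 100) gives 16 chunks of 62 or 63 iterations each rather than 10 chunks of 100, because a piece of 125 is still larger than the grainsize and gets halved once more. So the effective chunk size ends up below the requested grainsize and the chunk count is 2^4 in this case. The count is not a power of two for every input in this model (a range of 5 with grainsize 2 gives 3 chunks), so this is only an illustration of the effect, not a statement of what the library guarantees.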