I am running some multithreaded benchmark programs in Mic. Some programs don't scale beyond 32 threads and some beyond 64. I am trying to find out the reason why they are not scalable beyond certain number of threads. Definitely, the poor scaling is not a result of lack of computing resources (i,e we can run 244 hw threads without the problem of context switching).
I am trying to analyze this using Vtune but am still not sure how to study this issue.