Given a small matrix (100 x 100) and a large number of threads (80), the overhead of creating the threads is quite large compared to the time a single thread needs to solve the problem. Any ideas on how we can handle this?
Is the Intel team interested in seeing how the solution scales with small data and a large number of threads, or are they only interested in large datasets with a small or large number of threads?
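On the first question, one possible approach is to cap the number of threads actually started so that each one gets a minimum amount of work, and to let the calling thread handle one chunk itself. The sketch below assumes the work can be split by rows. The names `effective_threads` and `parallel_rows` and the `min_rows_per_thread` threshold are hypothetical, and the right threshold would have to be found by measurement.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: limit the thread count so every thread gets at least
// min_rows_per_thread rows. Always returns at least 1.
std::size_t effective_threads(std::size_t rows,
                              std::size_t requested,
                              std::size_t min_rows_per_thread) {
    if (rows == 0 || requested == 0) return 1;
    const std::size_t grain = std::max<std::size_t>(min_rows_per_thread, 1);
    const std::size_t useful = std::max<std::size_t>(rows / grain, 1);
    return std::min(requested, useful);
}

// Hypothetical driver: calls work(begin, end) on disjoint row ranges that
// together cover [0, rows). The calling thread processes the first range,
// so a small matrix that maps to one thread starts no extra threads at all.
void parallel_rows(std::size_t rows,
                   std::size_t requested,
                   std::size_t min_rows_per_thread,
                   const std::function<void(std::size_t, std::size_t)>& work) {
    const std::size_t n = effective_threads(rows, requested, min_rows_per_thread);
    assert(n >= 1);
    const std::size_t chunk = (rows + n - 1) / n;  // ceiling division

    std::vector<std::thread> pool;
    for (std::size_t t = 1; t < n; ++t) {
        const std::size_t begin = std::min(t * chunk, rows);
        const std::size_t end = std::min(begin + chunk, rows);
        if (begin < end) pool.emplace_back(work, begin, end);
    }

    work(0, std::min(chunk, rows));        // first range runs on the caller
    for (auto& th : pool) th.join();       // wait for the remaining ranges
}
```

With 100 rows, 80 requested threads, and an assumed threshold of 25 rows per thread, `effective_threads(100, 80, 25)` gives 4, so only three extra threads are created. A pool of threads created once and reused would be another way to avoid paying the creation cost on every run.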