I hope you are doing well. I am using Intel MKL to develop a multi-CPU version of my linear system solver. The setup is as follows:
I have, say, 8 nodes connected via InfiniBand. Each node is fitted with two quad-core Xeons. I divide my computation (SpMVs, ddots, daxpys) into equal chunks across all these nodes. The algorithm (preconditioned CG) then runs on all the nodes, and the nodes have to communicate frequently within the iteration loop to exchange updated information and collaborate to arrive at a solution.
My question is this: I use Intel MKL to perform all the computations on each of these nodes. How can I make sure that each node (with 8 cores per node) makes use of all of its cores when running, say, SpMV, ddot, daxpy, or even a vector norm (dnrm2)?