I am a computer science student and I am interested in the
performance speed-up and scalability of Intel MKL for our algorithm. We
are using Intel's Woodcrest processor with the corresponding Bensley
platform, which gives us 4 cores.
The algorithm we are looking at is conjugate gradient (CG); it needs to
solve Ax = b for 500 different b's. An obvious way to parallelize over
multiple cores is to let each core solve one Ax = b. I've seen that MKL
provides a CG framework and that we just need to fill in the operations.
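
To make the "one Ax = b per core" idea concrete, here is a minimal sketch of what I mean, written in plain C with OpenMP and not using MKL at all. It assumes A is symmetric positive definite and stored in CSR format, and that the right-hand sides are stored one after another in B. All function names here (spmv_csr, cg_solve, solve_all) are made up for illustration; they are not MKL routines.

```c
#include <stdlib.h>

/* Hypothetical helper: y = A*x for an n x n matrix in CSR format (serial). */
static void spmv_csr(int n, const int *row_ptr, const int *col_idx,
                     const double *val, const double *x, double *y)
{
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}

/* Hypothetical helper: dot product of two length-n vectors. */
static double dot(int n, const double *a, const double *b)
{
    double s = 0.0;
    for (int i = 0; i < n; ++i)
        s += a[i] * b[i];
    return s;
}

/* Unpreconditioned serial CG for a single right-hand side.
 * x holds the initial guess on entry and the solution on exit.
 * Stops when ||r||_2 <= tol or after max_iter iterations.
 * Returns the iteration count, or -1 if allocation fails. */
static int cg_solve(int n, const int *row_ptr, const int *col_idx,
                    const double *val, const double *b, double *x,
                    int max_iter, double tol)
{
    double *r = malloc(3 * (size_t)n * sizeof *r);
    if (!r)
        return -1;
    double *p = r + n, *Ap = r + 2 * n;

    spmv_csr(n, row_ptr, col_idx, val, x, Ap);
    for (int i = 0; i < n; ++i) {
        r[i] = b[i] - Ap[i];   /* initial residual */
        p[i] = r[i];           /* initial search direction */
    }
    double rr = dot(n, r, r);
    int it = 0;
    while (it < max_iter && rr > tol * tol) {
        spmv_csr(n, row_ptr, col_idx, val, p, Ap);
        double alpha = rr / dot(n, p, Ap);
        for (int i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        double rr_new = dot(n, r, r);
        for (int i = 0; i < n; ++i)
            p[i] = r[i] + (rr_new / rr) * p[i];
        rr = rr_new;
        ++it;
    }
    free(r);
    return it;
}

/* Solve A*x_j = b_j for nrhs independent right-hand sides.
 * Column j of B and X starts at offset j*n. Each loop iteration is a
 * complete CG solve, so the threads never share any work arrays. */
void solve_all(int n, const int *row_ptr, const int *col_idx,
               const double *val, int nrhs, const double *B, double *X,
               int max_iter, double tol)
{
    #pragma omp parallel for schedule(dynamic)
    for (int j = 0; j < nrhs; ++j)
        cg_solve(n, row_ptr, col_idx, val,
                 B + (size_t)j * n, X + (size_t)j * n, max_iter, tol);
}
```

With 500 right-hand sides and 4 cores, the dynamic schedule is my guess at a reasonable way to balance solves that need different iteration counts. Without OpenMP enabled at compile time the pragma is ignored and the loop simply runs serially.
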
Probably the most important operation in CG is the sparse matrix-vector
multiplication. I've read in the Intel MKL documentation that Sparse
BLAS Level 2 also uses OpenMP for threading, and I am wondering how this
is implemented. Does this function distribute the rows of the matrix
over the different cores and then multiply row-wise? I know it is
possible to thread it the way I want with OpenMP myself, but I am
interested to know how Intel did this.
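
For reference, this is the kind of row-wise threading I have in mind, as a small sketch in plain C with OpenMP. It is only my guess at one possible scheme, not a description of what MKL actually does internally. It assumes the matrix is in CSR format, and csr_spmv_rowwise is a made-up name, not an MKL function.

```c
/* y = A*x with A in CSR format. The rows are divided among the OpenMP
 * threads; each thread writes only its own entries of y, so no locking
 * or reduction is needed. x is read-only and shared by all threads. */
void csr_spmv_rowwise(int n_rows, const int *row_ptr, const int *col_idx,
                      const double *val, const double *x, double *y)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n_rows; ++i) {
        double sum = 0.0;
        /* Nonzeros of row i are val[row_ptr[i] .. row_ptr[i+1]-1]. */
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}
```

A static schedule hands each thread a contiguous block of rows, which would only balance well if the nonzeros are spread fairly evenly over the rows. That is part of why I am curious what Intel chose.
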