Hi there, I'm a newbie in parallel computing, and met a project that requires using C/C++ to rewrite a fortran program and finally runs on a multi-core Intel CPU + GPU server node or even a cluster.

The program involves much matrix/multidimensional array multiplication. In fortran it easy to do row/column assignment, matrix addition, subtraction, multiplication etc., but not so simple for C - it needs to be enhanced to have functions like fortran or matlab.

So I read a bit about CUDA, OpenCL about GPU and OpenMP, MPI about parallel machines. There are only low level APIs, and don't have the functions above. If I make some libraries for C myself, I'm afraid they won't be reliable or efficient enough. So I wonder if there're any opensource libraries. I saw nvidia CUDA topics recommended a library called arrayfire, but is commercial. And I tried gsl, and also heard about mtl, octave, I'd like to ask which is the best option? or Intel mkl?

Thanks!