# Large Matrix inversion using parallel computing in ifort

## Large Matrix inversion using parallel computing in ifort

I want to invert a large matrix using parallel computing. I am working with ifort (fortran 90 compiler)on a cluster with multiple nodes. There are 8 processors per node. The memory for each node is shared among its processors. I have a general ideahow the program should work, i.e., how to break the program into tasks and assign tasks to each processors.

Whatspecificinstructionsdo I need to implement this inside my fortran 90 code? What libraries do i need to include? I also want to learn how to compile this file to be able to run as a parallel process. Is there any tutorial for starters to parallel computing using ifort?

Many thanks for your help with this.

4 posts / 0 new
For more complete information about compiler optimizations, see our Optimization Notice.

Quoting - suvabrata_das

I want to invert a large matrix using parallel computing. I am working with ifort (fortran 90 compiler)on a cluster with multiple nodes. There are 8 processors per node. The memory for each node is shared among its processors. I have a general ideahow the program should work, i.e., how to break the program into tasks and assign tasks to each processors.

Whatspecificinstructionsdo I need to implement this inside my fortran 90 code? What libraries do i need to include? I also want to learn how to compile this file to be able to run as a parallel process. Is there any tutorial for starters to parallel computing using ifort?

Many thanks for your help with this.

Since all your nodes share memory, one way to do this would be to "solve" the inverse one column at a time. To do this, set up to solve N systems of equations with a specific set up. Solve Ax = b where A is the (NxN) matrix you are trying to find the inverse of and b is the zero vector except for a '1' in the jth component. The solution vector , x, will be the jth column of the inverse. Since the computation of each column of the inverse is independent of all others each can be done in parallel. Of course, this scheme requires that each thread/core have access to the A array, which is why the shared memory across the cluster is an important requirement.

You can also compute the inverse through LU decomposition. If A = LU, then A' = L'U'. (Here I'm using A' to be the inverse of A.) Computing the inverse of lower and upper triangular matrices can be done easily, though I don't think the algorithm I've seen can be done in parallel (but I've not tried). Again, access to the entire L or U matrix looks to be needed here, as well.

If you've got the matrix distributed, then you've got a whole other set of possibilities. There could be something to compute an inverse in parallel in the ScaLAPACK library. Look at other numeric library packages for such a function, too. These libraries should be easily incorporated into a Fortran application. LAPACK or ScaLAPACK could even be used for the solving of the systems of equations method outlined above.

--clay

Quoting - Clay Breshears (Intel)

You can also compute the inverse through LU decomposition. If A = LU, then A' = L'U'. (Here I'm using A' to be the inverse of A.) Computing the inverse of lower and upper triangular matrices can be done easily, though I don't think the algorithm I've seen can be done in parallel (but I've not tried).

Ok, now I've tried, at least the serial algorithm. Here's the computation that yields the inverse of L into the lower-triangular B:

DO I = 1, N
DO J = 1, N
B(I,J) = 0.0
ENDDO
B(I,I) = 1.0 / L(I,I)
ENDDO

DO J = 1, N-1
DO I = J+1, N
ITMP = 0.0
DO M = J, I-1
ITMP = ITMP + L(I,M)*B(M,J)
ENDDO
B(I,J) = (-1.0/L(I,I)) * ITMP
ENDDO
ENDDO

Computation of the inverse for U is a similar algorithm. The above algorithm comes from Numerical Computation in Science and Engineeringby C. Pozrikidis (Oxford University Press, 1998); I translated to Fortran.

--clay

Matrix inversion is not an easy algorithm to implement in parallel and therefore I'd suggest you start by looking at libraries that already do this. For inverting a matrix on a single one of your nodes you could try using the Intel MKL implementations of the standard LAPACK libraries that contain routines that may be used to invert a matrix. The MKL ismultithreaded andshould give good parallel performance.If you really need to use multiple nodes to invert your matrix, then ScaLAPACK is a good place to start since it is built on MPI and may be used on your cluster. To invert a matrix with (Sca)LAPACK requires calling a few routines, but these are easily callable from FORTRAN.

Presumably you are using a cluster because you care about performance and/or memory capacity. There are a lot of different strategies for working with large matrices to improve performance and memory usage. If the matrix has a certain form (sparse, tri-diagonal, upper triangular, etc) then there may be specific libraries that will help you invert the matrix using less memory and with much higher performance.

The data dependencies in matrix inversion algorithms may make it difficult to get good performance scaling in parallel. If memory capacity isn't an issue for you and you need to invert multiple matrices, you might want to write your program to parallelize the inversion of multiple matrices at a time. This will scale in performance much better up to as many nodes as you have, as long as you want to invert more matrices than you have nodes.