The most time-consuming part of my application is inverting a fully populated complex matrix. I use the PARDISO solver from MKL, and the process works fine except when the matrix gets larger.
In a recent run, the matrix to be inverted was 60200 by 60200. The machine running the code is an HP DL980 server with 64 cores at 2.4 GHz and 1 TB of memory.
The solution process appears to have three stages. First, the code uses only one core for about 90-150 hours (depending on the frequency), using about 170 GB of memory. Then the parallel processing part kicks in, using up to 32 cores for about 8-10 hours with up to 320 GB of memory. Finally, the run uses one core again for about 10-15 hours, with 120-170 GB of memory. (All times are calendar, i.e. wall-clock, time. CPU time is about 400 hours in total.)
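To put the figures above in perspective, here is a quick back-of-envelope sketch in Python. It assumes double-precision complex entries (16 bytes each), which is my assumption and not something the solver reports; the stage durations are simply the ranges quoted above, and the variable names are only for illustration.

```python
# Back-of-envelope figures for the run described above.
N = 60200
BYTES_PER_ENTRY = 16  # assumption: double-precision complex (2 x 8 bytes)

# Storage for one dense copy of the matrix, in GB (10^9 bytes)
dense_gb = N * N * BYTES_PER_ENTRY / 1e9

# Wall-clock hours per stage as (low, high), taken from the description above
stages = {
    "stage 1 (1 core)": (90, 150),
    "stage 2 (up to 32 cores)": (8, 10),
    "stage 3 (1 core)": (10, 15),
}

total_low = sum(low for low, _ in stages.values())
total_high = sum(high for _, high in stages.values())

# Share of the total wall-clock time spent in the single-core first stage
stage1_low, stage1_high = stages["stage 1 (1 core)"]
stage1_share_low = stage1_low / total_low
stage1_share_high = stage1_high / total_high

print(f"one dense copy of the matrix: {dense_gb:.1f} GB")
print(f"total wall-clock time: {total_low}-{total_high} hours")
print(f"stage 1 share: {stage1_share_low:.0%}-{stage1_share_high:.0%}")
```

Under these assumptions the first stage accounts for well over three quarters of the wall-clock time, so any speed-up there carries over almost directly to the whole run.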
The most frustrating part of the run is obviously the first stage. 90-150 hours on a single core is about 4-6 days, during which the other 63 cores sit idle. Is there any way to speed up this stage and make use of the other cores? Even a factor of two (maybe factorizing odd and even rows concurrently?) would greatly improve performance. Is there any pre-processing we can apply to the matrix to make it run faster?
I'd appreciate any input and ideas on how to improve the code.