We have been using ScaLAPACK for a while on our Linux cluster of Pentium 4 PCs. We use pdgesvd to compute the generalized inverse of large matrices; the routine is called from C.
Up to matrix sizes of about 15 000 x 15 000, everything seems to work fine (i.e. the inverted matrix makes sense). For matrices of 20 000 x 20 000 and up, we get a wrong answer. To diagnose the problem, we tried the following:
- Compute U^T # U -> gives the identity matrix. OK.
- Compute V^T # V -> gives the identity matrix. OK.
- Some singular values are negative (!). Not OK.
- Consequently, U # sigma # V^T obviously does not reproduce the original matrix, since some singular values are negative.
The content of the matrix doesn't seem to matter: meaningful (for us) and random matrices give the same (wrong) results.
I would like to emphasize that "small" matrices are fine, which rules out the most obvious kinds of bugs (though it doesn't exclude, with 100% certainty, memory corruption or something similar).
I have tested the following:
- ScaLAPACK 1.7.4 and 1.7.0
- The gcc and Intel compilers
- Both the Intel and the Netlib ScaLAPACK libraries; both exhibit the problem
- Taking the absolute value of the singular values (the negative values are in the same range, in absolute value, as the positive ones), but that doesn't help
- Different block sizes and numbers of CPUs (64 is the value I normally use)
- Different matrix shapes: both 20 000 x 20 000 and 20 000 x 40 000 fail
So my question is: is this behaviour "normal"? Is it a bug in pdgesvd? Is it the propagation of numerical errors, for large matrices, in the particular algorithm pdgesvd uses? Some other weird "feature"?
Is there a way out of this?
Any suggestions will be appreciated!
PS: Just in case, I have a little test code that demonstrates the effect. But in principle, just do the SVD of a 20k x 20k random matrix and inspect the singular values.
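In case anyone wants to reproduce it without my test code, the random matrix can be filled with something as simple as the sketch below (rand() is just a stand-in for whatever generator you prefer; the descriptor setup and the pdgesvd call itself are omitted, and the dimensions here would be the local block sizes on each process):

```c
#include <stdlib.h>

/* Fill a local m x n block (column-major, as ScaLAPACK expects)
 * with uniform random values in [-0.5, 0.5). */
static void fill_random(double *a, int m, int n, unsigned seed)
{
    srand(seed);
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            a[i + (size_t)j * m] = (double)rand() / RAND_MAX - 0.5;
}
```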
PPS: Using Linux 2.6.12-1.1381_FC3smp (Fedora Core 3)