Scalapack BUG in PZUNGQR (Again)

Hi,

Previously, we reported a possible ScaLAPACK bug in the PZUNGQR function:

             http://software.intel.com/en-us/forums/topic/473803

That issue has still not been resolved, but it was attributed to zero-sized local matrices on some nodes. However, we have encountered a similar issue with PZUNGQR even when none of the local matrices is zero-sized. In the attached test case, the PZGEMM call that follows the PZUNGQR call either hangs or produces an Irecv error, even though both the QR matrices and the PZGEMM matrices have non-zero-sized local parts on all nodes. Interestingly, if the matrices used in the PZGEMM call have a global size smaller than the block size (so only one node holds a non-zero-sized local matrix), the call completes fine.
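For anyone who wants to check which processes end up with empty local matrices for a given global size and block size, here is a minimal sketch. It is a plain Python port of the arithmetic in ScaLAPACK's NUMROC tool routine, not code from the attached test case. The helper `local_shapes`, the 2 x 2 grid, the block size of 64 and the matrix sizes are all assumptions chosen for illustration, and the source process is taken to be row/column 0.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Rows or columns of a block-cyclically distributed dimension owned by iproc.

    Mirrors the arithmetic of ScaLAPACK's NUMROC:
    n = global extent, nb = block size, isrcproc = process owning the first block.
    """
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from the source process
    nblocks = n // nb                              # number of complete blocks
    count = (nblocks // nprocs) * nb               # full rounds every process receives
    extrablks = nblocks % nprocs                   # leftover complete blocks
    if mydist < extrablks:
        count += nb                                # this process gets one extra full block
    elif mydist == extrablks:
        count += n % nb                            # this process gets the trailing partial block
    return count


def local_shapes(m, n, mb, nb, nprow, npcol):
    """Hypothetical helper: local (rows, cols) for every process in an nprow x npcol grid."""
    return {
        (r, c): (numroc(m, mb, r, 0, nprow), numroc(n, nb, c, 0, npcol))
        for r in range(nprow)
        for c in range(npcol)
    }


if __name__ == "__main__":
    # Assumed sizes: one matrix smaller than a block, one spanning several blocks.
    for label, size in (("smaller than one block", 10), ("several blocks", 256)):
        shapes = local_shapes(size, size, 64, 64, 2, 2)
        empty = [p for p, (rows, cols) in shapes.items() if rows == 0 or cols == 0]
        print(label, shapes, "empty on:", empty)
```

With these assumed numbers, the 10 x 10 matrix lives entirely on process (0, 0), while the 256 x 256 matrix gives every process a 128 x 128 local part, which corresponds to the two situations described above.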

In the attached test case, the bug only occurs if single-node matrices call ZUNGQR and multi-node matrices call PZUNGQR. If all nodes call PZUNGQR, it does not occur. However, in our full code the bug sometimes seems to occur even if all nodes call PZUNGQR. Unfortunately, I was not able to reduce that behavior to a simple test case.

Thanks, John

Attachments:
output_0.png (image/png, 93.37 KB)
test_2.F90 (application/octet-stream, 10 KB)

It just occurred to me that mixing PZUNGQR and ZUNGQR is not the issue. The primary issue is that some nodes call PZUNGQR and some don't. If the ZGEQRF/ZUNGQR call in the attached test case is commented out, the bug still occurs, since only the matrices that are really distributed call PZUNGQR and the single-node matrices do nothing for the QR.

John,

The first issue you reported earlier is still being investigated by the MKL team. Thank you very much for the additional information. We will make sure our fix covers this new scenario as well.

 

Zhang,

We appreciate that you are working on this. Do you have an approximate time estimate? This bug in the MKL library has been holding up a deliverable to our customer. I know you can't put a firm date on it. However, if you could estimate in days, weeks or months, it would be helpful. Thank you!

 

Thanks for letting us know the impact of this issue on your deliveries. I'll send you a note with an estimate as soon as I have a solid idea on where we are now, hopefully in 1-2 business days.

Zhang,

We really appreciate that.  Thank you!
