Problems with PDPOTRI for certain sizes

Problems with PDPOTRI for certain sizes

Dear all,

I have issues when using Intel MKL to provide Blas+Lapack+ScaLapack. Specifically, i use it with OpenMPI+GCC and link against libmkl_scalapack_lp64, libmkl_blacs_openmpi_lp64, libmkl_intel_lp64, libmkl_core, libmkl_sequential. The issues is with calling pdpotri after Cholesky factorization:

pdpotri_ (&uplo,&n_columns, A_loc, &submatrix_row, &submatrix_column, descriptor,&info);

where

void pdpotri_(const char *UPLO,
                const int *N,
                double *A, const int *IA, const int *JA, const int *DESCA,
                int *INFO);

 

I don't have any issues with my code on both Ubuntu and macOS when using Netlib-Scalapack 2.0.2 + Openblas 0.2.20. For those cases, calculation of Cholesky factorization followed by pdpotri produce correct results and agree with serial Lapack. Also note that other small test programs I have (i.e. calculate L1 norm or do Cholesky factorization) run ok with Intel-MKL. This suggest that something is wrong in Intel-MKL implementation of pdpotri.

 

 

A bit more on the issue itself: this happens when run with 4 MPI cores (2x2 grid) and a small test program; the program runs ok with matrix 64x64 with 32 blocks but fails for 120x120 with 32 blocks. Specifically, i see a floating point exception from process rank 2. Not sure I can debug it further on my side.

 

p.s. That's Intel-MKL 2017.3.196.

 

Regards, Denis

7 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.

David, could you give the example to check the case on our side? or may you try if the problem still exists with the latest MKL 2018 which we released yesterday?

Gennady, the code is attached. Run with "mpirun -np 4 example 120 32". While running a stand-alone example, GCC complained that error is during MPI_Bcast. NetlibScalapack+Openblas run ok. This is with GCC 5.4.0+OpenMPI 2.1.1. By printing out debug info I see that this happens indeed in pdpotri.

Regards, Denis

#include <mpi.h>
#include <iostream>
#include <iomanip>
#include <string>
#include <fstream>
#include <iostream>
#include <stdlib.h>
#include <sstream>
#include <cmath>
#include <vector>
#include <functional>
using namespace std;

// compile with:
// mpic++ -std=c++11 example.cc -o example -lscalapack -lopenblas
// mpic++ -std=c++11 example.cc -o example -lmkl_scalapack_lp64 -lmkl_blacs_openmpi_lp64 -lmkl_intel_lp64 -lmkl_core -lmkl_sequential


extern "C" {
  /* Cblacs declarations */
  void Cblacs_pinfo(int *, int *);
  void Cblacs_get(int, int, int *);
  void Cblacs_gridinit(int *, const char *, int, int);
  void Cblacs_gridinfo(int, int *, int *, int *,int *);
  void Cblacs_pcoord(int, int, int *, int *);
  void Cblacs_gridexit(int);
  void Cblacs_barrier(int, const char *);
  void Cdgerv2d(int, int, int, double *, int, int, int);
  void Cdgesd2d(int, int, int, double *, int, int, int);

  int numroc_ (const int *n, const int *nb, const int *iproc, const int *isproc, const int *nprocs);

  void pdpotrf_(const char *UPLO,
                const int *N,
                double *A, const int *IA, const int *JA, const int *DESCA,
                int *INFO);

  void descinit_ (int *desc, const int *m, const int *n, const int *mb, const int *nb, const int *irsrc, const int *icsrc, const int *ictxt, const int *lld, int *info);

  int indxl2g_ (const int *indxloc, const int *nb, const int *iproc, const int *isrcproc, const int *nprocs);

  void pdpotri_(const char *UPLO,
                const int *N,
                double *A, const int *IA, const int *JA, const int *DESCA,
                int *INFO);

}

int main(int argc, char **argv)
{
  int mpirank,nprocs;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &mpirank);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
  const char uplo('L');
  const int first_process_row(0);
  const int first_process_column(0);

  const bool mpiroot = (mpirank == 0);

  /* Scalapack / Blacs Vars */
  int N, M, Nb, Mb;

  if (argc < 3)
    {
      if (mpiroot)
        cerr << "Provide N and Nb parameters" << std::endl;

      MPI_Finalize();
      return 1;
    }
  else
    {
      /* Read command line arguments */
      stringstream stream;
      stream << argv[1] << " " << argv[2];
      stream >> N >> Nb;
      M = N;
      Mb = Nb;
    }
  if (mpiroot)
    cout << "Running with N=" << N << " Nb=" << Nb << std::endl;

  int procrows = 0, proccols = 0;
  {
    const int n_processes_heuristic = int(std::ceil((1.*N)/Nb))*
                                      int(std::ceil((1.*M)/Mb));
    const int Np = std::min(n_processes_heuristic, nprocs);
    const double ratio = 1.0;
    const int Pc = std::sqrt(ratio * Np);
    proccols = std::min (Np, std::max(2, Pc));
    procrows = Np / proccols ;
  }

  /* Begin Cblas context */
  int ctxt = 0, myrow = 0, mycol = 0, numproc = nprocs;
  // Cblacs_pinfo(&myid, &numproc);
  Cblacs_get(0, 0, &ctxt);
  Cblacs_gridinit(&ctxt, "Row-major", procrows, proccols);
  Cblacs_gridinfo( ctxt, &procrows, &proccols, &myrow, &mycol );

  if (mpiroot)
    cout << "Process grid " << procrows << "x"<<proccols << "<=" << numproc << std::endl;

  if ((myrow > -1) && (mycol > -1))
    {
      if (mpiroot)
        cout << "setup..." << std::endl;

      const int nrows = numroc_(&N, &Nb, &myrow, &first_process_row, &procrows);
      const int ncols = numroc_(&M, &Mb, &mycol, &first_process_column, &proccols);

      const int lda = max(1,nrows);
      int info=0;
      int descA[9];
      descinit_(descA, &N, &M, &Nb, &Mb,&first_process_row,&first_process_column,&ctxt, &lda, &info);
      if (info != 0)
        cerr << "error in descinit_" << std::endl;

      std::vector<double> A(nrows*ncols);

      // set to SPD
      {
        int seed = 13;
        srand(seed);
        std::vector<double> global_A(N*M);
        auto gind = [&] (const int row, const int col)
        {
          return col*M+row;
        };

        for (unsigned int c = 0; c < N; ++c)
          for (unsigned int r = c; r < M; ++r)
            {
              const double val = ((double) rand())/ ((double)RAND_MAX);
              if (r==c)
                global_A[gind(r,c)] = val + N;
              else
                {
                  global_A[gind(r,c)] = val;
                  global_A[gind(c,r)] = val;
                }
            }

        int k = 0;
        for (int c=1; c <= ncols; ++c)
          {
            const int glob_c = indxl2g_ (&c, &Nb, &mycol, &first_process_column, &proccols) - 1;
            for (int r = 1; r <= nrows; ++r)
              {
                const int glob_r = indxl2g_ (&r, &Mb, &myrow, &first_process_row, &procrows) - 1;
                A[k] = global_A[gind(glob_r,glob_c)];
                k++;
              }
          }
      }

      const int submatrix_row = 1;
      const int submatrix_column = 1;

      double *A_loc = &A[0];
      info = 0;
      if (mpiroot)
        cout << "pdpotrf..." << std::endl;
      pdpotrf_(&uplo,&N,A_loc,&submatrix_row,&submatrix_column,descA,&info);
      if (info != 0)
        {
          cerr << "error in pdpotrf_ " << info << std::endl;
          return 1;
        }

      if (mpiroot)
        cout << "pdpotri..." << std::endl;
      pdpotri_ (&uplo,&N, A_loc, &submatrix_row, &submatrix_column, descA,&info);
      if (info != 0)
        {
          cerr << "error in pdpotri_" << info << std::endl;
          return 1;
        }

      if (mpiroot)
        std::cout << "done!" << std::endl;

      Cblacs_gridexit(ctxt);
    }
  MPI_Finalize();
}

 

Hello Denis,

Thank you for reporting the issue. Yes, you are right, MKL 2017.3 contains the issue and it occurs with OpenMPI only. The issue will be fixed in the next release. Currently, in order to avoid the issue you can use other MPI implementations (Intel MPI, MPICH, etc.).

Regards, Denis

Hi Denis, could you check the latest MKL 2018 update 1 where we fix this issue and let us know how this update will work on your side!

 

Hi Gennady, I can confirm that the issue is gone with 2018.1. Tested with Openmpi 3.0.0. Thanks for fixing! Feel free to close this thread and mark it as resolved, if there is such an option here on forums.

Regards, Denis.

thanks Denis for letting us know!

Leave a Comment

Please sign in to add a comment. Not a member? Join today