Static vs. dynamic vector input

vincent.ferri wrote:

Hi,

I am having an issue with dynamic input arguments to the following functions:

LAPACKE_dgelss();

dgetrf();dgetri();

ippmInvert_m_64f();

When the input vector is static, A[10 * 10] = {...};, the output is correct, namely the inverse of A. If the same data values are read into a dynamic vector, the output is incorrect. Why am I getting this anomaly?

Thanks

Vince

Zhang Z (Intel) wrote:

With the very limited information you provided, it's hard to reproduce the problem or guess an explanation. Would you provide a small reproducer? Or, at least show a code snippet calling these functions using static and dynamic arrays? Thanks.

vincent.ferri wrote:

Hi,

I will concentrate on one of the functions that I listed, the one used for LU factorization:

 lapack_int info;
 MKL_INT* ipiv;
 double dVandSize = 10;

 double* c = ( double * ) malloc ( dVandSize * dVandSize * sizeof ( double ) );

//If input vector c[10 * 10] is a static array with initialized values the function works, if c is dynamic and contains the same values it doesn't work; I included a file called c_vector that contains c;

 ipiv = ( MKL_INT * ) malloc ( dVandSize * sizeof ( MKL_INT ) );
 dgetrf(&dVandSize,&dVandSize,c,&dVandSize,ipiv,&info);  //Computes the LU factorization

 double* workspace = new double [dVandSize* sizeof(double)];

  dgetri(&dVandSize, c, &dVandSize, ipiv, workspace, &dVandSize, &info);

Thanks

Attachment: c-vector.txt (1.31 KB)

Zhang Z (Intel) wrote:

Well, I was not able to reproduce the problem. Both static and dynamic arrays worked fine and gave identical results. See my test code attached.

But a careful look at your code snippet revealed this problem:

double dVandSize = 10;

Why was this variable declared as double when it should be an integer? Didn't you get compiler warnings?
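To make the concern concrete: dgetrf and dgetri receive the matrix size through a pointer to an integer. If the address of a double is passed instead, the routine reads the first bytes of the double's bit pattern as if they were an integer. The sketch below imitates that read without calling MKL. It assumes a 32-bit integer type and 8-byte IEEE-754 doubles, and `first_four_bytes_as_int` is a hypothetical helper name, not part of any library.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret the first four bytes of a double as a
// 32-bit integer. This mimics what a routine expecting a pointer to a 32-bit
// integer would read if it were handed the address of a double instead.
std::int32_t first_four_bytes_as_int(double value) {
    static_assert(sizeof(double) == 8, "sketch assumes 8-byte doubles");
    std::int32_t as_int = 0;
    std::memcpy(&as_int, &value, sizeof(as_int));  // copy raw bytes, no conversion
    return as_int;
}
```

On a little-endian x86 machine the bit pattern of 10.0 is 0x4024000000000000, so the first four bytes hold zero and the routine would see a matrix size of 0 rather than 10.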

Attachment: c-vector.c (2.19 KB)

Sergey Kostrov wrote:

>>... I was not able to reproduce the problem...

There are differences in the initializations. Take a look:

[ This is how Vincent initializes ]
...
double *c = ( double * )malloc( dVandSize * dVandSize * sizeof ( double ) );
...
ipiv = ( MKL_INT * )malloc( dVandSize * sizeof ( MKL_INT ) );
...
double *workspace = new double [ dVandSize * sizeof( double ) ]; // Note: C++ operator new is used
...

[ This is how Zhang initializes ]
...
double *c = ( double * )malloc( dVandSize * dVandSize * sizeof( double ) );
...
ipiv = ( MKL_INT * )malloc( dVandSize * sizeof( MKL_INT ) );
...
double *workspace = ( double * )malloc( dVandSize * sizeof( double ) ); // Note: CRT-function malloc is used
...

Vincent, my question is: why do you need sizeof( double ) in new double [ dVandSize * sizeof( double ) ]?
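The difference between the two allocation styles is only in units: malloc takes a byte count, while new double[n] takes an element count and applies the element size itself. Multiplying by sizeof( double ) inside new[] therefore over-allocates; it wastes memory but should not by itself corrupt results. A minimal sketch of the arithmetic follows; the function names are illustrative only.

```cpp
#include <cstddef>
#include <vector>

// Bytes requested by `new double[count]`: the language multiplies by the
// element size, so `count` is a number of elements, not a number of bytes.
std::size_t bytes_for_new_array(std::size_t count) {
    return count * sizeof(double);
}

// A workspace sized in elements. std::vector owns the memory and releases it
// automatically, which avoids mixing malloc/free with new[]/delete[].
std::vector<double> make_workspace(std::size_t n) {
    return std::vector<double>(n, 0.0);
}
```

With n = 10, new double[n] requests 10 * sizeof(double) bytes, whereas new double[n * sizeof(double)] requests sizeof(double) times more than dgetri needs for a workspace of length n.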

Sergey Kostrov wrote:

The results are absolutely identical. Please take a look:

[ Output when CRT-function 'malloc' is used ]

Intel(R) Math Kernel Library Version 10.3.12 Product Build 20120831 for 32-bit applications
Major version : 10
Minor version : 3
Update version : 12
Product status : Product
Build : 20120831

2.962044 -1.945165 0.176090 0.226115 -0.074739 0.002508 0.048719 -0.008698 -0.023452 0.007123
-1.945165 1.673040 -0.089525 -0.313017 0.066992 0.008686 -0.063396 0.020223 0.028665 -0.008833
0.176090 -0.089525 -0.177685 0.130976 -0.004348 -0.005037 0.024478 -0.011186 -0.010619 0.003654
0.226115 -0.313017 0.130976 -0.007564 -0.002099 -0.000986 0.002504 0.003507 -0.000518 -0.002568
-0.074739 0.066992 -0.004348 -0.002099 0.001317 -0.002620 -0.003569 -0.002405 0.001578 0.001495
0.002508 0.008686 -0.005037 -0.000986 -0.002620 0.000504 0.002958 -0.000118 -0.001566 0.000534
0.048719 -0.063396 0.024478 0.002504 -0.003569 0.002958 -0.003076 0.001410 0.000353 -0.000477
-0.008698 0.020223 -0.011186 0.003507 -0.002405 -0.000118 0.001410 -0.001292 0.000163 0.000311
-0.023452 0.028665 -0.010619 -0.000518 0.001578 -0.001566 0.000353 0.000163 0.000035 0.000025
0.007123 -0.008833 0.003654 -0.002568 0.001495 0.000534 -0.000477 0.000311 0.000025 -0.000186

[ Output when C++ operator 'new' is used ]

Intel(R) Math Kernel Library Version 10.3.12 Product Build 20120831 for 32-bit applications
Major version : 10
Minor version : 3
Update version : 12
Product status : Product
Build : 20120831

2.962044 -1.945165 0.176090 0.226115 -0.074739 0.002508 0.048719 -0.008698 -0.023452 0.007123
-1.945165 1.673040 -0.089525 -0.313017 0.066992 0.008686 -0.063396 0.020223 0.028665 -0.008833
0.176090 -0.089525 -0.177685 0.130976 -0.004348 -0.005037 0.024478 -0.011186 -0.010619 0.003654
0.226115 -0.313017 0.130976 -0.007564 -0.002099 -0.000986 0.002504 0.003507 -0.000518 -0.002568
-0.074739 0.066992 -0.004348 -0.002099 0.001317 -0.002620 -0.003569 -0.002405 0.001578 0.001495
0.002508 0.008686 -0.005037 -0.000986 -0.002620 0.000504 0.002958 -0.000118 -0.001566 0.000534
0.048719 -0.063396 0.024478 0.002504 -0.003569 0.002958 -0.003076 0.001410 0.000353 -0.000477
-0.008698 0.020223 -0.011186 0.003507 -0.002405 -0.000118 0.001410 -0.001292 0.000163 0.000311
-0.023452 0.028665 -0.010619 -0.000518 0.001578 -0.001566 0.000353 0.000163 0.000035 0.000025
0.007123 -0.008833 0.003654 -0.002568 0.001495 0.000534 -0.000477 0.000311 0.000025 -0.000186

Sergey Kostrov wrote:

// Sub-Test 1 - Gets MKL version
{
///*
MKLVersion Ver = { 0x0 };
int iLenData = 256;
char szVerData[256] = { 0x0 };

MKL_Get_Version_String( szVerData, iLenData );
CrtPrintfA( "\n%s\n", szVerData );

MKL_Get_Version( &Ver );
printf( "Major version : %d\n", Ver.MajorVersion );
printf( "Minor version : %d\n", Ver.MinorVersion );
printf( "Update version : %d\n", Ver.UpdateVersion );
printf( "Product status : %s\n", Ver.ProductStatus );
printf( "Build : %s\n", Ver.Build );

printf( "\n" );
//*/
}

// Sub-Test 2 - Test for dgetrf and dgetri functions
{
///*
double data[] =
{
+4.00e+000, +1.50e+001 , +4.00e+001 , +8.50e+001 , +1.56e+002 , +2.59e+002, +4.00e+002 , +5.85e+002 , +8.20e+002, +1.11e+003,
+1.50e+001, +8.50e+001, +2.59e+002 , +5.85e+002 , +1.11e+003 , +1.89e+003 , +2.96e+003 , +4.37e+003 , +6.18e+003 , +8.42e+003,
+4.00e+001, +2.59e+002 , +8.20e+002 , +1.89e+003 , +3.62e+003 , +6.18e+003 , +9.72e+003 , +1.44e+004 , +2.04e+004 , +2.79e+004,
+8.50e+001, +5.85e+002 , +1.89e+003 , +4.37e+003 , +8.42e+003 , +1.44e+004 , +2.28e+004 , +3.38e+004 , +4.80e+004 , +6.56e+004,
+1.56e+002 , +1.11e+003 , +3.62e+003 , +8.42e+003 , +1.63e+004 , +2.79e+004 , +4.41e+004, +6.56e+004 , +9.32e+004, +1.28e+005,
+2.59e+002, +1.89e+003, +6.18e+003 , +1.44e+004 ,+2.79e+004 ,+4.80e+004 , +7.59e+004, +1.13e+005 , +1.60e+005 , +2.20e+005,
+4.00e+002 , +2.96e+003 , +9.72e+003 , +2.28e+004 , +4.41e+004 , +7.59e+004 , +1.20e+005 , +1.79e+005 , +2.54e+005 , +3.48e+005,
+5.85e+002 , +4.37e+003 , +1.44e+004 , +3.38e+004 , +6.56e+004 , +1.13e+005 , +1.79e+005 , +2.66e+005 , +3.79e+005 , +5.18e+005,
+8.20e+002 , +6.18e+003 , +2.04e+004 , +4.80e+004 , +9.32e+004 , +1.60e+005 , +2.54e+005 , +3.79e+005 , +5.38e+005 , +7.37e+005,
+1.11e+003, +8.42e+003 , +2.79e+004 ,+6.56e+004 ,+1.28e+005 ,+2.20e+005 , +3.48e+005 , +5.18e+005 , +7.37e+005 , +1.01e+006
};

lapack_int info = 0;
MKL_INT *ipiv = NULL;
MKL_INT dVandSize = 10;
MKL_INT i;

// double *c = data;
// double *c = ( double * )malloc( dVandSize * dVandSize * sizeof( double ) );
double *c = ( double * )new double[ dVandSize * dVandSize ];
for( i = 0; i < dVandSize * dVandSize; i++ )
{
c[i] = data[i];
}

// ipiv = ( MKL_INT * )malloc( dVandSize * sizeof( MKL_INT ) );
ipiv = ( MKL_INT * )new MKL_INT[ dVandSize ];

dgetrf( &dVandSize, &dVandSize, c, &dVandSize, ipiv, &info );

if( info != 0 )
{
printf( "DGETRF INFO: %d\n", info );
exit( 1 );
}

// double *workspace = ( double * )malloc( dVandSize * sizeof( double ) );
double *workspace = ( double * )new double[ dVandSize ];

dgetri( &dVandSize, c, &dVandSize, ipiv, workspace, &dVandSize, &info );
if( info != 0 )
{
printf( "DGETRI INFO: %d\n", info );
exit( 1 );
}

for( i = 0; i < dVandSize * dVandSize; i++ )
{
printf( "% lf ", c[i] );
if( ( (i+1) % 10 ) == 0 )
printf( "\n" );
}

// if( workspace != NULL )
// free( workspace );
// if( ipiv != NULL )
// free( ipiv );
// if( c != NULL )
// free( c );

// Memory obtained with new[] must be released with delete[]
if( workspace != NULL )
delete[] workspace;
if( ipiv != NULL )
delete[] ipiv;
if( c != NULL )
delete[] c;

printf( "\n" );
//*/
}

vincent.ferri wrote:

But all you did was take a static vector and copy it to a dynamic vector; this works for me too. But how about using

cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, dVandSize,ldB, iStrideB, beta , b, iStrideB, a, ldA, alpha, c, ldC);

as indicated in my last post? Take this c vector and put it into dgetrf() and dgetri().

Regards,

Vince

Attachment: c-vector.txt (2.28 KB)

Zhang Z (Intel) wrote:

Vince,

What do you mean by "c vector"? How is it different from a static vector and a dynamic vector? And what does cblas_dgemm have to do with this? Instead of having all of us guess what you want, it would be much easier if you posted your whole test code here, please.

By the way, have you got a chance to look at the issue pointed out by other replies on this post? Why is 'dVandSize' a double floating point variable? If you follow DGETRF and DGETRI signatures, this argument should be an integer. Have you tried to make it an integer? Does this solve the problem?

vincent.ferri wrote:

Hi

The c vector is the dynamic vector that you created. cblas_dgemm() uses the a vector and the b vector to produce the c vector, and that is what you use for LU. I have given the 'a' and 'b' vectors in the file c_vector.txt.

Regards,

Vince

Zhang Z (Intel) wrote:

I believe you certainly have taken care of this, and it's probably not related to your original question. But just in case ..., the matrix order in cblas_dgemm can be either row major or column major, but dgetrf and dgetri assume column major matrix order as they are FORTRAN routines.
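The layout remark can be illustrated with two index helpers. The same contiguous buffer read in column-major order is the transpose of what it represents in row-major order. This is a sketch with hypothetical helper names; it does not call MKL.

```cpp
#include <cstddef>
#include <vector>

// Element (row, col) of a matrix with `cols` columns stored row by row (C order).
double at_row_major(const std::vector<double>& m, std::size_t cols,
                    std::size_t row, std::size_t col) {
    return m[row * cols + col];
}

// Element (row, col) of a matrix with `rows` rows stored column by column
// (Fortran order, which dgetrf and dgetri expect).
double at_col_major(const std::vector<double>& m, std::size_t rows,
                    std::size_t row, std::size_t col) {
    return m[col * rows + row];
}
```

For a square matrix, handing a row-major buffer to a column-major routine means the routine works on the transpose. Since the inverse of a transpose is the transpose of the inverse, a symmetric matrix is unaffected, and the 10 x 10 test data posted earlier in this thread appears to be symmetric. A non-symmetric input would come back transposed.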

I'll take another look at it and let you know.

vincent.ferri wrote:

int iVandSize = 10;

double  alpha = 1.0;
 double  beta = 1.0;
 int ldA = iVandSize;
 int ldB = iVandSize;
 int ldC = iVandSize;
 int iStrideB = 4;

 

//cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, iVandSize,ldB, iStrideB, beta , b, iStrideB, a, ldA, alpha, c, ldC); 

Thanks

Sergey Kostrov wrote:

>>...Instead of having all of us guessing what you want, it would be much easier to post your whole test code here, please?..

Vincent,

We're trying to help you, so please provide as many technical details as possible: complete code ( not snippets ), MKL version / update, platform ( OS ), C/C++ compiler, command line options, IDE, etc. OK?

Since I've already created my own test case I'll do another verification with the latest version of MKL ( 11 ) on a 64-bit Windows platform.

Sergey Kostrov wrote:

Application - IccTestApp - WIN32_ICC - Debug
Tests: Start
> Test1153 Start <

Intel(R) Math Kernel Library Version 11.0.2 Product Build 20130123 for 32-bit applications
Major version : 11
Minor version : 0
Update version : 2
Product status : Product
Build : 20130123
Processor optimization: Intel(R) Advanced Vector Extensions (Intel(R) AVX) Enabled Processor

2.962044 -1.945165 0.176090 0.226115 -0.074739 0.002508 0.048719 -0.008698 -0.023452 0.007123
-1.945165 1.673040 -0.089525 -0.313017 0.066992 0.008686 -0.063396 0.020223 0.028665 -0.008833
0.176090 -0.089525 -0.177685 0.130976 -0.004348 -0.005037 0.024478 -0.011186 -0.010619 0.003654
0.226115 -0.313017 0.130976 -0.007564 -0.002099 -0.000986 0.002504 0.003507 -0.000518 -0.002568
-0.074739 0.066992 -0.004348 -0.002099 0.001317 -0.002620 -0.003569 -0.002405 0.001578 0.001495
0.002508 0.008686 -0.005037 -0.000986 -0.002620 0.000504 0.002958 -0.000118 -0.001566 0.000534
0.048719 -0.063396 0.024478 0.002504 -0.003569 0.002958 -0.003076 0.001410 0.000353 -0.000477
-0.008698 0.020223 -0.011186 0.003507 -0.002405 -0.000118 0.001410 -0.001292 0.000163 0.000311
-0.023452 0.028665 -0.010619 -0.000518 0.001578 -0.001566 0.000353 0.000163 0.000035 0.000025
0.007123 -0.008833 0.003654 -0.002568 0.001495 0.000534 -0.000477 0.000311 0.000025 -0.000186

> Test1153 End <
Tests: Completed

//

Application - IccTestApp - WIN32_ICC - Release
Tests: Start
> Test1153 Start <

Intel(R) Math Kernel Library Version 11.0.2 Product Build 20130123 for 32-bit applications
Major version : 11
Minor version : 0
Update version : 2
Product status : Product
Build : 20130123
Processor optimization: Intel(R) Advanced Vector Extensions (Intel(R) AVX) Enabled Processor

2.962044 -1.945165 0.176090 0.226115 -0.074739 0.002508 0.048719 -0.008698 -0.023452 0.007123
-1.945165 1.673040 -0.089525 -0.313017 0.066992 0.008686 -0.063396 0.020223 0.028665 -0.008833
0.176090 -0.089525 -0.177685 0.130976 -0.004348 -0.005037 0.024478 -0.011186 -0.010619 0.003654
0.226115 -0.313017 0.130976 -0.007564 -0.002099 -0.000986 0.002504 0.003507 -0.000518 -0.002568
-0.074739 0.066992 -0.004348 -0.002099 0.001317 -0.002620 -0.003569 -0.002405 0.001578 0.001495
0.002508 0.008686 -0.005037 -0.000986 -0.002620 0.000504 0.002958 -0.000118 -0.001566 0.000534
0.048719 -0.063396 0.024478 0.002504 -0.003569 0.002958 -0.003076 0.001410 0.000353 -0.000477
-0.008698 0.020223 -0.011186 0.003507 -0.002405 -0.000118 0.001410 -0.001292 0.000163 0.000311
-0.023452 0.028665 -0.010619 -0.000518 0.001578 -0.001566 0.000353 0.000163 0.000035 0.000025
0.007123 -0.008833 0.003654 -0.002568 0.001495 0.000534 -0.000477 0.000311 0.000025 -0.000186

> Test1153 End <
Tests: Completed

//

Application - IccTestApp - WIN32_ICC - Debug
Tests: Start
> Test1153 Start <

Intel(R) Math Kernel Library Version 11.0.2 Product Build 20130124 for Intel(R) 64 architecture applications
Major version : 11
Minor version : 0
Update version : 2
Product status : Product
Build : 20130124
Processor optimization: Intel(R) Advanced Vector Extensions (Intel(R) AVX) Enabled Processor

2.962044 -1.945165 0.176090 0.226115 -0.074739 0.002508 0.048719 -0.008698 -0.023452 0.007123
-1.945165 1.673040 -0.089525 -0.313017 0.066992 0.008686 -0.063396 0.020223 0.028665 -0.008833
0.176090 -0.089525 -0.177685 0.130976 -0.004348 -0.005037 0.024478 -0.011186 -0.010619 0.003654
0.226115 -0.313017 0.130976 -0.007564 -0.002099 -0.000986 0.002504 0.003507 -0.000518 -0.002568
-0.074739 0.066992 -0.004348 -0.002099 0.001317 -0.002620 -0.003569 -0.002405 0.001578 0.001495
0.002508 0.008686 -0.005037 -0.000986 -0.002620 0.000504 0.002958 -0.000118 -0.001566 0.000534
0.048719 -0.063396 0.024478 0.002504 -0.003569 0.002958 -0.003076 0.001410 0.000353 -0.000477
-0.008698 0.020223 -0.011186 0.003507 -0.002405 -0.000118 0.001410 -0.001292 0.000163 0.000311
-0.023452 0.028665 -0.010619 -0.000518 0.001578 -0.001566 0.000353 0.000163 0.000035 0.000025
0.007123 -0.008833 0.003654 -0.002568 0.001495 0.000534 -0.000477 0.000311 0.000025 -0.000186

> Test1153 End <
Tests: Completed

//

Application - IccTestApp - WIN32_ICC - Release
Tests: Start
> Test1153 Start <

Intel(R) Math Kernel Library Version 11.0.2 Product Build 20130124 for Intel(R) 64 architecture applications
Major version : 11
Minor version : 0
Update version : 2
Product status : Product
Build : 20130124
Processor optimization: Intel(R) Advanced Vector Extensions (Intel(R) AVX) Enabled Processor

2.962044 -1.945165 0.176090 0.226115 -0.074739 0.002508 0.048719 -0.008698 -0.023452 0.007123
-1.945165 1.673040 -0.089525 -0.313017 0.066992 0.008686 -0.063396 0.020223 0.028665 -0.008833
0.176090 -0.089525 -0.177685 0.130976 -0.004348 -0.005037 0.024478 -0.011186 -0.010619 0.003654
0.226115 -0.313017 0.130976 -0.007564 -0.002099 -0.000986 0.002504 0.003507 -0.000518 -0.002568
-0.074739 0.066992 -0.004348 -0.002099 0.001317 -0.002620 -0.003569 -0.002405 0.001578 0.001495
0.002508 0.008686 -0.005037 -0.000986 -0.002620 0.000504 0.002958 -0.000118 -0.001566 0.000534
0.048719 -0.063396 0.024478 0.002504 -0.003569 0.002958 -0.003076 0.001410 0.000353 -0.000477
-0.008698 0.020223 -0.011186 0.003507 -0.002405 -0.000118 0.001410 -0.001292 0.000163 0.000311
-0.023452 0.028665 -0.010619 -0.000518 0.001578 -0.001566 0.000353 0.000163 0.000035 0.000025
0.007123 -0.008833 0.003654 -0.002568 0.001495 0.000534 -0.000477 0.000311 0.000025 -0.000186

> Test1153 End <
Tests: Completed

Zhang Z (Intel) wrote:

Vince,

The matrix produced by cblas_dgemm is very different from the original static matrix you provided. After cblas_dgemm, if you compare the result against the original static matrix, the root mean square error is more than 1.5e+02. Therefore, the inputs to the dgetrf call and the subsequent dgetri call are different, and different results are expected.
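The post does not say how the root mean square error was computed. A sketch using the usual definition, the square root of the mean squared element-wise difference, could look like this (`rms_error` is an illustrative name):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Root mean square of the element-wise differences between two buffers of
// equal, nonzero length.
double rms_error(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && !x.empty());
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double d = x[i] - y[i];
        sum_sq += d * d;
    }
    return std::sqrt(sum_sq / static_cast<double>(x.size()));
}
```

Applied to the 100 elements of the cblas_dgemm output and the 100 elements of the reference matrix, a value near zero would mean the two inputs to dgetrf agree; a value above 1.5e+02 means they do not.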

vincent.ferri wrote:

Hi,

When I use dgemm with vector 'a' and 'b' provided from the file you do not get 'c' with the same data as provided. That makes no sense, because 'c' is a copy and paste from that function. The order of arguments is:

cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, iVandSize,ldB, iStrideB, beta , b, iStrideB, a, ldA, alpha, c, ldC); 

Regards

Attachment: c-vector.txt (2.28 KB)

Zhang Z (Intel) wrote:

Quote:

vincent.ferri wrote:

Hi,

When I use dgemm with vector 'a' and 'b' provided from the file you do not get 'c' with the same data as provided,

This is exactly what I was talking about. Multiplying 'a' and 'b' does not produce the same 'c'. It's not a cblas_dgemm problem. I think the call to cblas_dgemm is correct. The order of arguments is correct. The problem is 'a' and 'b'. You need to check why your 'a' and 'b' do not produce the 'c' you expect.

vincent.ferri wrote:

Hi,

Do you get a 10 X 10 matrix or a 4 X 4? It should be 10 X 10, since the product is b [10 X 4] * a [4 X 10].

Regards,

Zhang Z (Intel) wrote:

I copied and pasted exactly the cblas_dgemm call you gave in your post. The result is a 10x10 matrix. But it is different from your reference matrix (the one you gave in your earlier post).

vincent.ferri wrote:

Hi,

I apologize for going back and forth, but with the given c_vector file that I attached with vector 'a' and 'b', if you multiply them you do not get vector 'c', the one in the file. In Matlab b * a = c, and cblas_dgemm also gives me the same 'c'. Here is my snippet:

 double  alpha = 1.0;
 double  beta = 1.0;
 int iVandSize = 10;

int ldA = iVandSize;
 int ldB = iVandSize;
 int ldC = iVandSize;
 int iStrideB = 4;

 //C = alpha*A*B + beta*C
 cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, iVandSize,ldB, iStrideB, beta , b, iStrideB, a, ldA, alpha, c, ldC); 

Regards,

P.S. Is it possible to show me your output from this call?

Attachment: c-vector.txt (2.28 KB)

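For readers following the call above: in the cblas_dgemm argument list the seventh argument is alpha and the twelfth is beta, so the variables named beta and alpha land in each other's slots here. That is harmless because both are 1.0. What does matter is that beta equals 1.0, which means whatever c holds before the call is added to the product. The plain-loop sketch below shows that behaviour for the row-major, no-transpose case. It is a reference illustration, not MKL code, and `gemm_reference` is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Reference row-major GEMM without transposes: C = alpha * A * B + beta * C,
// where A is m x k, B is k x n and C is m x n, each stored contiguously.
void gemm_reference(std::size_t m, std::size_t n, std::size_t k,
                    double alpha, const std::vector<double>& A,
                    const std::vector<double>& B,
                    double beta, std::vector<double>& C) {
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            for (std::size_t p = 0; p < k; ++p) {
                acc += A[i * k + p] * B[p * n + j];
            }
            // With beta != 0 the previous contents of C are part of the result.
            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
    }
}
```

A statically initialized array starts from known values, while a freshly malloc'ed buffer holds whatever was in memory. The thread does not establish whether this explains the discrepancy, but zero-filling c (or passing beta = 0.0) before the call is a cheap thing to rule out.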
mecej4 wrote:

Quote:

I apologize for going back and forth, but with the given c_vector file that I attached with vector 'a' and 'b', if you multiply them you do not get vector 'c', the one in the file.

I have read this thread with increasing dismay. The use of misleading terms such as "c vector", illogical statements such as this quotation (in mathematics, the product of two vectors is either a scalar -- inner product-- or a matrix -- outer product) makes for much confusion.

Add to that apparent changes in topic from one post to another within the same thread, and we have a thread that should be quarantined, and a new thread opened with some attention to clarity and precision in problem statement.

vincent.ferri wrote:

Hi,

c_vector is the file name, what is wrong with you, it contains matrices, which Intel calls vectors (everything in MKL seems to use vectors). The file contains three matrices: a, b and c.

a is [4 x 10] and b is [10X4] and c should be [10 X 10] not a scalar.

What I know is that Matlab computes the c matrix and inv(c) computes the correct coefficients.

Regards,

Sergey Kostrov wrote:

>>...c_vector is the file name, what is wrong with you,..

Vincent, please watch your language.

mecej4 wrote:

Quote:

vincent.ferri wrote:

a is [4 x 10] and b is [10X4] and c should be [10 X 10] not a scalar.

What I know is that Matlab computes the c matrix and inv(c) computes the correct coefficients.

Presumably, you want c to match b X a, rather than a X b, which would be a 4 X 4 matrix. Perhaps, you obtained the values of the elements of matrix c using Matlab or some such tool, but printed out the values with insufficient precision. Since a and b contain three-digit numbers, you need at least six digits for the elements of c. In the file, however, you only display three significant digits for  these. Therefore, you should expect errors of as much as 500 in the elements of matrix c in the file.

The solution is quite simple. Set Matlab (or other utility) to print enough significant digits before you compare.
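The size of the error introduced by printing too few digits can be checked with a small rounding helper. This is a sketch under the assumption that the file values were rounded to three significant decimal digits; `round_to_significant` is an illustrative name.

```cpp
#include <cmath>

// Round a value to the given number of significant decimal digits.
double round_to_significant(double value, int digits) {
    if (value == 0.0) {
        return 0.0;
    }
    // Position of the leading digit, e.g. 5 for 123456.
    const double exponent = std::floor(std::log10(std::fabs(value)));
    // Size of one unit in the last kept digit, e.g. 1000 for 3 digits of 123456.
    const double scale = std::pow(10.0, exponent - digits + 1);
    return std::round(value / scale) * scale;
}
```

A six-digit element such as 123456 becomes 123000 when kept to three significant digits, an error of 456. The worst case for six-digit values is 500, which matches the estimate above and is far larger than the tolerance needed to compare two inverses.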
