memcpy inside openmp parallel for not working

Hello, I'm having problems using memcpy inside an OpenMP for loop. In the example below, if I change "szl[0] = aux" to "memcpy(szl, &aux, sizeof(float))", the result is different. Basically, this code should put "20.00" in the first position of every "line" (the last dimension). It works with the assignment, but not with memcpy. I'm using the latest version of icc, and numY, numX and numZ are initialized properly.

This problem only happens when I use more than one thread; with a single thread, both versions work properly.

float vz[numY * numX * numZ];

memset(vz, 0, numY*numX*numZ*sizeof(float));

#pragma omp parallel for
   for (k = 4; k < numY-4 ; k++) {
     for (j = 4; j < numX-4 ; j++) {
        float *szl;
        float aux;
        unsigned long offset;
        aux = 20.00;

        /* start of the "line" for this (k, j) pair */
        offset = k*numX*numZ + j*numZ;
        szl = vz + offset;
        szl[0] = aux;
     }
   }


You are asking this question in the wrong forum. It's a general OpenMP question, whereas this forum is for discussions that are specifically about the implementation of the Open Source Intel OpenMP runtime.

I suggest you ask either on which has an active OpenMP group, or  Threading on Intel® Parallel Architectures if you believe your question is specific to Intel processors or compilers.

OK, I will ask in the correct place. Sorry for the inconvenience.

> OK, I will ask in the correct place. Sorry for the inconvenience.

No problem!

Without looking in much detail, my guess is that you have a race condition, and that you're seeing a difference because a simple, aligned store is atomic (either all four of the bytes that make up the float are written, or none are), whereas a memcpy is not, since it is allowed to use single-byte writes. (But this may be wrong, since I haven't thought about it much, so do ask elsewhere!)

You probably have a race condition on the variable j. Try adding a private(j) clause to the parallel construct.
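For anyone trying the private(j) suggestion, here is a minimal sketch of what it could look like. It assumes k and j are declared outside the loops (the posted snippet does not show their declarations), and the function names fill_line_starts and demo are hypothetical, made up for this illustration. In C, only the loop variable of the loop associated with the construct (k here) is private automatically; a variable from a nested loop that was declared outside the construct is shared unless it is listed in private or declared inside the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the original post): stores `value` in the
   first element of every interior "line" of a numY x numX x numZ array. */
void fill_line_starts(float *vz, int numY, int numX, int numZ, float value)
{
    int k, j;  /* assumed to be declared outside the loops */

    /* k is private automatically; j is shared by default in C,
       so it is listed explicitly in private(j). */
    #pragma omp parallel for private(j)
    for (k = 4; k < numY - 4; k++) {
        for (j = 4; j < numX - 4; j++) {
            size_t offset = (size_t)k * numX * numZ + (size_t)j * numZ;
            float *szl = vz + offset;
            memcpy(szl, &value, sizeof(float));  /* same effect as szl[0] = value */
        }
    }
}

/* Hypothetical check: runs the fill on a small zeroed array and returns the
   number of interior lines whose first element is not 20.0f. */
int demo(void)
{
    const int numY = 12, numX = 12, numZ = 8;
    float *vz = calloc((size_t)numY * numX * numZ, sizeof(float));
    assert(vz != NULL);

    fill_line_starts(vz, numY, numX, numZ, 20.0f);

    int wrong = 0;
    for (int k = 4; k < numY - 4; k++)
        for (int j = 4; j < numX - 4; j++)
            if (vz[(size_t)k * numX * numZ + (size_t)j * numZ] != 20.0f)
                wrong++;

    free(vz);
    return wrong;
}
```

Declaring the loop variable inside the loop header (for (int j = 4; ...)) has the same effect as private(j), because a variable declared inside the parallel region is private to each thread. The code also compiles without OpenMP enabled, in which case the pragma is ignored and the loops run serially.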


Hello Andrey, as far as I know, since "j" is a loop control variable, the private clause is already implied. I suspect something earlier in the code is corrupting memory, because this code on its own in a "main" works. Once I identify the problem I will post it here.

Best regards, Rafael

Now it is working. I was doing some SIMD work before this code, and there was a bug in it: the memory had probably been corrupted by a drunk pointer...

Thank you very much guys.
