Allocation/Deallocation errors with OpenMP code

Dear all,

I'm currently working on parallelizing our quantum chemistry code (with ifc 7.1 on Redhat 9). For the most part it works very well.

Some questions have arisen though:

1) One routine cannot be compiled. The compiler stops with an "internal compiler error". This issue should probably be posted to Intel. Can I do this without premier support? (I have the free Linux version.)

2) The mentioned routine has lots of "private" variables assigned to each thread, so that the number of continuation lines exceeds the maximal number allowed by the Fortran standard (19). But it even exceeds the maximal number of continuation lines allowed by ifc (100+something)! I had to split the "private"-declaration into two parts, one after the OMP-"parallel" and one after the OMP-"do" directive. This solved the problem, but it's not very elegant.

3) After setting "limit stacksize" and KMP_STACKSIZE to reasonable values the program runs in parallel mode but stops after a while with an Allocation/Deallocation error. Two typical error messages are:

Allocate error 494: Allocation of 37601792 bytes failed
** Address Error **
Command terminated by signal 11

Deallocate error 493: Variable was not created by ALLOCATE
** Address Error **
Command terminated by signal 11

This always happens in the same routine, which, however, was called and executed successfully a number of times before the program stops.
Could this be another compiler bug or is it more likely an error in the code?

I'm still an ifc and OpenMP newbie.

Thanks in advance for comments and help.

Best regards.

1) you can (and should) still sign up for Premier Support and submit issues, even if you have the non-commercial compiler. Of course, you may not get as timely a response as if you had a compiler with support, but we still want to get your feedback.

2) The maximum number of continuation lines will be increased in the next version of the compiler.
Other ways to cope with the limit might be to use free format source, or to make PRIVATE the default for your parallel section, and only specify those variables that are shared.

3) One can't tell from the information available. It's possible that you could have a simple memory leak.
Note that limit stacksize and KMP_STACKSIZE refer to the stack, whereas ALLOCATE normally allocates memory on the heap. You would normally put data that you want to be thread private on the stack.
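To illustrate the DEFAULT(PRIVATE) suggestion in point 2, here is a minimal sketch in free-form source. The module name default_private_demo, the routine scale_rows, and all its arguments are hypothetical and are not taken from the original code; the point is only that the scratch variables no longer have to be listed, so the directive stays short.

```fortran
! Sketch only: scale_rows and its arguments are hypothetical names.
module default_private_demo
  implicit none
contains
  subroutine scale_rows(a, factor, n, m)
    integer, intent(in) :: n, m
    double precision, intent(inout) :: a(n, m)
    double precision, intent(in) :: factor
    integer :: i, k
    double precision :: tmp
    ! Everything is private unless it appears in SHARED, so the
    ! scratch variables i and tmp need no PRIVATE clause at all.
    !$omp parallel do default(private) shared(a, factor, n, m)
    do k = 1, m
      do i = 1, n
        tmp = a(i, k) * factor
        a(i, k) = tmp
      end do
    end do
    !$omp end parallel do
  end subroutine scale_rows
end module default_private_demo
```

With this pattern the list that grows with the routine is the SHARED list, which in many routines is much shorter than the list of thread-local scratch variables.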

Martyn

Hello Martyn,

thank you for your comments.

1) I found out that the internal compiler error only occurs with the combination "-O3 -openmp". "-O2 -openmp" works fine. Also, "-O3" alone works. ("-openmp" alone is the same as "-O2 -openmp".)

2) Good to know.

3) I found a bug in the routine. Now it seems to work OK, but previously the allocation error only occurred after the hundredth or so call to the routine, so it is too early to be sure. Let's hope for the best...

Best regards

The allocation error still occurs, sometimes in other routines. For example, it occurs in the following piece of code:

REAL*8 WORKP(3000000) (typically)
...
C$OMP PARALLEL DEFAULT(SHARED) PRIVATE(IJ,I,ADDFAC,J)
C$OMP DO REDUCTION(+:WORKP)
DO K=1,NLA
IJ=0
DO I=1,NCNTRT
ADDFAC=CALPHA(I,K)*DOCA(K)*2
DO J=1,I
IJ=IJ+1
WORKP(IJ)=WORKP(IJ)+ADDFAC*CALPHA(J,K)
ENDDO
ENDDO
ENDDO
C$OMP END DO
C$OMP END PARALLEL
J=0
DO I=1,NCNTRT
J=J+I
WORKP(J)=WORKP(J)/2
ENDDO

This is very easy. There should not be a flaw in it (I hope). The programme seems to crash in those routines which make use of large threadprivate arrays.

In one of the Intel-Compiler-FAQs on the web it says that reduction of arrays is "incomplete". Perhaps, this is the problem. Can anybody shed light on this? I did not find anything about that in the User's Guide.

Best regards
C.

I don't see any sum reduction in the code you present. Would it make more sense simply to declare the outer DO as a PARALLEL DO? I don't read your message as asking for better diagnostics from the openmp compiler, but it seems that declaring a REDUCTION where there isn't one is not working well.
Also, it's not clear whether you meant K to be SHARED.

Thanks for answering.

The code looked a bit confusing because after posting the message all leading spaces were cut. I'll try again:

C$OMP PARALLEL DEFAULT(SHARED) PRIVATE(IJ,I,ADDFAC,J)
C$OMP DO REDUCTION(+:WORKP)
...........DO K=1,NLA
.............IJ=0
...............DO I=1,NCNTRT
...............ADDFAC=CALPHA(I,K)*DOCA(K)*2
.................DO J=1,I
.................IJ=IJ+1
.................WORKP(IJ)=WORKP(IJ)+ADDFAC*CALPHA(J,K)
...............ENDDO
.............ENDDO
...........ENDDO
C$OMP END DO
C$OMP END PARALLEL

It should be clearer now. WORKP is updated by all threads in the line WORKP(IJ)=... .

PARALLEL DO, as I understand it, is just an abbreviation of PARALLEL and DO on two lines.

K is the DO-variable of the outer loop, i.e. of the one that is parallelized. It is by definition private and need not (but may) be declared explicitly.
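For comparison, the same loop can be sketched with the combined PARALLEL DO form in free-form source. The variable names follow the fragment above; the module and routine names (packed_reduction_demo, accumulate_packed) and the dimensioning of WORKP as a packed lower triangle of length NCNTRT*(NCNTRT+1)/2 are assumptions made only so that the example is self-contained.

```fortran
! Sketch only: the wrapper routine and the array bounds are assumptions.
module packed_reduction_demo
  implicit none
contains
  subroutine accumulate_packed(workp, calpha, doca, ncntrt, nla)
    integer, intent(in) :: ncntrt, nla
    double precision, intent(in) :: calpha(ncntrt, nla), doca(nla)
    double precision, intent(inout) :: workp(ncntrt*(ncntrt+1)/2)
    integer :: i, j, k, ij
    double precision :: addfac
    ! K is the parallel loop variable and is private automatically.
    ! REDUCTION gives each thread its own zero-initialised copy of
    ! WORKP; the copies are added into the original after the loop.
    !$omp parallel do default(shared) private(ij, i, j, addfac) reduction(+:workp)
    do k = 1, nla
      ij = 0
      do i = 1, ncntrt
        addfac = calpha(i, k) * doca(k) * 2
        do j = 1, i
          ij = ij + 1
          workp(ij) = workp(ij) + addfac * calpha(j, k)
        end do
      end do
    end do
    !$omp end parallel do
  end subroutine accumulate_packed
end module packed_reduction_demo
```

Because every thread works on its own copy of WORKP, memory use grows with the number of threads: an array of 3000000 REAL*8 elements is 24000000 bytes per copy. Where those copies are placed (stack or heap) depends on the compiler and its options.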

The Intel FAQ says that sum reduction of arrays is "incomplete". My question is what that means.

Best regards

Array reductions are in the OpenMP 2.0 standard, but not in 1.1. The Intel 7.1 compiler supports OpenMP 2.0 with the exception of WORKSHARE. It has full support for array reductions. The caveat you quote about array reductions referred to an earlier compiler, and is no longer applicable. The documentation will be updated, thanks for pointing that out.
The 7.1 compiler also contains a new option, -stack_temps, that causes temporaries to be allocated on the stack instead of on the heap, which is more appropriate for OpenMP applications. You might like to try this option, if you haven't already, to see if it makes any difference.
If you still have a problem that doesn't seem to be due to your code, please submit an issue to Intel Premier Support.

Martyn

Thanks for the reply.

With "-stack_temps" there is no message about failed allocation or deallocation any more. The address error, however, still occurs in parallel regions making use of large private arrays.

Could it be that Redhat 9 is the source of the problem?

I posted the issue to Intel Premier Support. I understand that they are checking at the moment whether I am allowed to post at all.

Best wishes
Christoph

You can always report issues to Intel Premier Support. If you don't have a support license, then you use the "Limited Support" product. Your question will be read and acknowledged, and perhaps passed on as needed, but you may not get a solution or a fix.

Steve

Steve - Intel Developer Support
