-check uninit,pointer changes runtime behaviour

Hi,

I inherited a 40k-line Fortran code, and when I enable OpenMP its behaviour changes. Now I am somewhat lost as to what to do. I enabled -check all, which causes my code to crash. Enabling the checks one at a time, I get the following behaviour, which seems peculiar to me:

#### no change:
# F90FLAGS+= -check arg_temp_created,bounds,format
# F90FLAGS+= -check arg_temp_created,bounds,format,output_conversion
#### problematic (changes the results, for the "better"?):
# F90FLAGS+= -check arg_temp_created,bounds,format,output_conversion,uninit
#### problematic (changes the results for the worse):
# F90FLAGS+= -check arg_temp_created,bounds,format,output_conversion,pointer
#### crash!:
# F90FLAGS+= -check arg_temp_created,bounds,format,output_conversion,uninit,pointer

So only adding uninit or pointer changes anything. Putting both together produces NaNs.

I assume that my variables are being overwritten somewhere; however, -check bounds does not report anything. Are there cases where -check bounds does not pick up such an error?

All the subroutines, etc. are in modules.

regards,
Mike

Addendum: when I do not link SVML, it also works like a charm. However, my understanding is that the SVML library is thread-safe, so does that mean there are still bugs in the code?

Using the -save switch makes the code crash under -check bounds: the loop variables overflow.
For example,
real*8 :: X(0:3)
integer :: i
do i=1,3
   print*,X(i)
enddo
crashes with an out-of-bounds exception, with i=4??
regards,
Mike

Hi Mike,
I suggest that you have a few options to explore. Fortunately this is only a small code, so it shouldn't take long to figure out what is going on.

Below is a brief shorthand description of how I tackle these problems. I am sure it is not the only way, or even the best way. Debugging is definitely an art, and everyone tackles it in their own way.

IMHO, I have never had much success using debuggers on OpenMP or MPI codes, and I hold the print* debugger in high esteem! For large amounts of data, I usually add some nice string labels to each print*, redirect the output to a file and then use grep to wade through the files. I have used this successfully on quite large output files (several GB).
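A minimal sketch of this labelled print* approach inside an OpenMP loop, assuming free-form source. The program name and the "DBG loop1" tag are hypothetical. The !$ lines are OpenMP conditional-compilation sentinels, compiled only when OpenMP is enabled, so the same file also builds and runs serially:

```fortran
! Hypothetical example: labelled print* debugging inside an OpenMP loop.
! Lines starting with the "!$" sentinel are compiled only with OpenMP.
program labelled_prints
!$ use omp_lib
  implicit none
  integer, parameter :: n = 8
  integer :: i, tid
  real(8) :: x(n)

  tid = 0                      ! value used when built without OpenMP
!$omp parallel do private(tid)
  do i = 1, n
!$   tid = omp_get_thread_num()
     x(i) = sqrt(real(i, 8))
     ! A fixed tag ("DBG loop1") makes these lines easy to grep later
     print '(a,i3,a,i4,a,es14.6)', 'DBG loop1 tid=', tid, ' i=', i, ' x=', x(i)
  end do
!$omp end parallel do
end program labelled_prints
```

Running it with the output redirected to a file and then grepping for "DBG loop1" pulls out just these lines; including the thread number makes it easier to see which thread produced an unexpected value.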

Different answers with OpenMP can be caused by not correctly declaring the private and shared variables in the OpenMP parallel regions.
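A minimal sketch of the kind of mistake meant here, using a hypothetical program: a scratch variable (tmp) that is assigned and then used inside the loop must be private, and the running total needs a reduction. If tmp were left out of the private clause it would be shared (the default for variables declared outside the region), and threads could overwrite each other's value between the two statements, so the answer could vary from run to run:

```fortran
! Hypothetical example: a scratch variable that must be PRIVATE.
program private_vs_shared
  implicit none
  integer, parameter :: n = 1000
  integer :: i
  real(8) :: tmp, total, serial_total

  ! Serial reference result
  serial_total = 0.0d0
  do i = 1, n
     tmp = real(i, 8)**2
     serial_total = serial_total + tmp
  end do

  ! Correct parallel version: tmp is private, total is a reduction.
  ! (The loop index of a parallel do is private automatically.)
  total = 0.0d0
!$omp parallel do private(tmp) reduction(+:total)
  do i = 1, n
     tmp = real(i, 8)**2
     total = total + tmp
  end do
!$omp end parallel do

  print *, 'serial =', serial_total, ' parallel =', total
end program private_vs_shared
```

Because every term here is a whole number well below 2**53, the serial and parallel sums should agree exactly regardless of summation order; with real data a small rounding difference from the reduction order is normal, whereas a race typically gives much larger, run-to-run changes.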

If you are using common blocks, be particularly careful to check which ones need to be threadprivate and, of these, which ones must be copied in. I think the Intel Fortran documentation is particularly good at explaining all of this.
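A minimal sketch of threadprivate plus copyin for a common block, with hypothetical names (/params/, factor). THREADPRIVATE gives each thread its own copy of the block, and COPYIN initialises those copies from the master thread's values on entry to the parallel region; without copyin, the other threads' copies are not guaranteed to hold the value set in the master thread:

```fortran
! Hypothetical example: a common block that each thread needs its own copy of.
program threadprivate_copyin
  implicit none
  integer :: nbad
  real(8) :: factor
  common /params/ factor
!$omp threadprivate(/params/)

  factor = 2.5d0          ! set in the master thread only
  nbad = 0
  ! copyin copies the master's /params/ into every thread's private copy
!$omp parallel copyin(/params/) reduction(+:nbad)
  if (factor /= 2.5d0) nbad = nbad + 1
!$omp end parallel
  print *, 'threads with a wrong copy of factor:', nbad
end program threadprivate_copyin
```

Since your routines are all in modules, note that module variables can be declared threadprivate in the same way.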

Program failure with certain compiler options can be a good thing: it usually makes the problem much easier to find.

Use -g -traceback to get a stack trace showing where it fails, and use that as a starting point for print statements that dump some values to see what is changing. Note that -g makes -O0 the default, whereas without -g the default optimisation level is -O2.

The great thing about debugging OpenMP programs is that you can progressively remove parallel regions.
Provided there is no orphaning (an OpenMP parallel region in one source file calling routines in other files that contain OpenMP worksharing constructs), do a binary split of the source files, compiling half with -openmp and the other half without.

Run the program, decide whether the problem has gone, and then do another binary split.
Continue until you find a single file that changes the results depending on whether -openmp is used or not.
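To make the orphaning caveat concrete, here is a minimal sketch with hypothetical names (work_mod, add_one). The !$omp do sits in a module routine and binds to the parallel region opened by the caller. If the module were in a separate file compiled without -openmp while the caller was compiled with it, the directive would be ignored and every thread would run the whole loop on the shared array, which is why such files cannot simply be split apart:

```fortran
! Hypothetical example of an orphaned worksharing construct.
module work_mod
  implicit none
contains
  subroutine add_one(a)
    real(8), intent(inout) :: a(:)
    integer :: i
    ! Orphaned: no parallel region here; it binds to the caller's team.
    ! Compiled WITHOUT OpenMP, each calling thread would run the full loop,
    ! so elements could be incremented more than once (and racily).
!$omp do
    do i = 1, size(a)
       a(i) = a(i) + 1.0d0
    end do
!$omp end do
  end subroutine add_one
end module work_mod

program orphaned
  use work_mod, only: add_one
  implicit none
  real(8) :: a(100)

  a = 0.0d0
!$omp parallel shared(a)
  call add_one(a)     ! the orphaned "!$omp do" splits the loop across the team
!$omp end parallel
  print *, 'min/max =', minval(a), maxval(a)   ! both should be 1.0
end program orphaned
```

With both units compiled the same way (both with or both without OpenMP), every element ends up as 1.0.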

This assumes that only one source file is causing the problem. If so, progressively double-comment the directives around each OpenMP parallel region (turning !$omp into !!$omp) until only one region causes the problem.
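A minimal sketch of double commenting, using a hypothetical program: adding a second ! in front of the sentinel turns !$omp into an ordinary comment, so that loop runs serially even when compiled with -openmp, while the directive text stays in place for easy re-enabling. Region 1 is left parallel and region 2 is disabled:

```fortran
! Hypothetical example of "double commenting" an OpenMP directive.
program double_comment
  implicit none
  integer, parameter :: n = 10
  integer :: i
  real(8) :: a(n), b(n)

  ! Region 1: still parallel
!$omp parallel do
  do i = 1, n
     a(i) = real(i, 8)
  end do
!$omp end parallel do

  ! Region 2: disabled by double commenting both directives
!!$omp parallel do
  do i = 1, n
     b(i) = 2.0d0 * a(i)
  end do
!!$omp end parallel do

  print *, 'sum(b) =', sum(b)
end program double_comment
```

Re-enabling a region is then just a matter of deleting the extra ! on its directives.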

If you suspect SVML, then I suggest finding out which calls are being made from your program and trying to mimic the behaviour in a small test program.
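A minimal sketch of such a test program, assuming exp is one of the functions involved (substitute whatever your code actually calls). The program name and the 1.0d-13 tolerance are arbitrary choices. Whether SVML is actually used for either loop depends on how the compiler vectorises them, so treat this as a starting point rather than a definitive test. Vectorised and scalar versions may legitimately differ in the last bits, hence the relative tolerance instead of exact equality:

```fortran
! Hypothetical stand-alone test: evaluate the same intrinsic serially and
! inside an OpenMP loop, then compare the results.
program svml_check
  implicit none
  integer, parameter :: n = 100000
  integer :: i
  real(8) :: x(n), y_serial(n), y_par(n), max_rel

  ! Inputs spread evenly over [-10, 10]
  do i = 1, n
     x(i) = -10.0d0 + 20.0d0 * real(i - 1, 8) / real(n - 1, 8)
  end do

  do i = 1, n
     y_serial(i) = exp(x(i))
  end do

!$omp parallel do
  do i = 1, n
     y_par(i) = exp(x(i))
  end do
!$omp end parallel do

  ! exp(x) > 0 here, so dividing by y_serial is safe
  max_rel = maxval(abs(y_par - y_serial) / abs(y_serial))
  print *, 'max relative difference =', max_rel
  if (max_rel > 1.0d-13) print *, 'WARNING: larger than expected'
end program svml_check
```

Building it with and without -openmp, and with and without linking SVML, and comparing the printed differences would show whether the library alone changes the results.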

Happy hunting

regards
Mike Rezny

Adding to Mike's excellent advice:

In addition to adding -g -traceback on the compile and link lines, could we see your entire F90FLAGS? You only showed the additions. What is the complete set of options, both compile and link, that you use?

And why did you manually add -lsvml? Did you inherit these options from some other developer? It could be that the options you inherited are out of date with the latest compiler.

ron
