ifc 6.0 -static -openmp problem

Some experiments I've performed suggest
that the "incompatible" -openmp -static options
for ifc 6.0 under RH 7.2 are mainly an ifc problem
(because w/help of ifc 5.0 I can generate
a working executable).

I've prepared some executable modules for
the standard linpack (n=1000) test. The source calls
some lapack routines, which I take from the ATLAS 3.4.1
libraries or the MKL libraries.
I work with academic versions of ifc 5.0 and 6.0.
The options used are -O3 -tpp6 -openmp.

There are 2 computers in particular that I use:
1) With RH 6.2, ifc 5.0 and MKL
2) With RH 7.2, ifc 6.0
and tuned ATLAS 3.4.1 generated for ifc 6.0.

If I understood the Release Notes for ifc 6.0
correctly, it's impossible to use the -openmp -static
combination of options (at least under RH 7.2,
where I tried to do this); the final executable
fails with a segmentation fault.

But under ifc 5.0 w/RH 6.2 all works fine if
I use MKL libraries; by default it creates a statically
linked executable module. The corresponding
executable runs on my RH 7.2 w/o problems.

OK, now I build a static executable (using ld -static)
under RH 7.2 from two pieces: a partially linked executable
module created by compiling and linking w/ifc 5.0
under RH 6.2, and the tuned
ATLAS 3.4.1 libraries I built w/ifc 6.0
under RH 7.2. The final module works OK.

Summary: it looks like this is an ifc 6.0 problem, and
not an RH 7.2 Linux error, because ifc 5.0 and ld
can produce a working statically linked executable module.

Mikhail Kuzminsky
Zelinsky Inst. of Organic Chemistry
Moscow kus@free.net


I think you may be seeing the effect of different defaults in the 5.0 and 6.0 compilers. In the 6.0 compiler, -openmp implies -auto, i.e. local arrays are automatic and allocated on the stack by default. This allows private copies to be created on the stack for each thread. (It may not be known at compile time whether the routine will be called from within a parallel region). The default array storage in 6.0 without -openmp is static. In 5.0, the default is static with or without -openmp; it is the user's responsibility in 5.0 to specify -auto along with -openmp if private array copies might be needed. Since this can lead to race conditions and runtime errors if -auto is not specified when needed, it was decided to make -auto the default for -openmp in 6.0.
However, allocating arrays on the stack requires a sufficiently large stack allocation for each thread. For large arrays, the stack allocation must be increased using ulimit -s (or limit stacksize in the C shell) as described in the 6.0 compiler release notes. Whilst this works for the master thread, the pthreads library in many Linux distributions has a hard-coded limit of a few MB for the daughter thread stack allocation, which cannot be changed using the commands above. Only pthreads libraries built with the FLOATING_STACKS option allow the daughter thread stack allocation to be changed by limit and ulimit. In the particular case of Red Hat 7.2 for IA-32, the shared pthreads library appears to be built with FLOATING_STACKS, but the static library does not appear to be. This can lead to seg faults when running OpenMP applications with large arrays built using -static. The same application would not get a seg fault with 5.0, because the arrays are not allocated on the stack by default. And provided that private array copies are not actually needed, and no race conditions etc. develop, the app will run fine.
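
To make the stack arithmetic concrete, here is a rough bash sketch. It assumes an n x n array of 8-byte reals with n=1000 (as in the linpack test) and that such an array ends up on a thread's stack; whether that is true of any particular array in the benchmark is not established here. The 2048 KB daughter-thread figure is an assumed stand-in for the "few MB" hard-coded limit, and the helper names array_kb and fits are made up for illustration.

```shell
#!/bin/bash
# KB needed for an N x N array of 8-byte reals, rounded up.
array_kb() {
  local n=$1
  echo $(( (n * n * 8 + 1023) / 1024 ))
}

# Succeeds if NEED_KB is below LIMIT_KB, or if the limit is "unlimited".
# This ignores everything else that lives on the stack.
fits() {
  local need_kb=$1 limit_kb=$2
  [ "$limit_kb" = "unlimited" ] || [ "$need_kb" -lt "$limit_kb" ]
}

need=$(array_kb 1000)      # n=1000, as in the linpack test
main_limit=$(ulimit -s)    # soft stack limit of this shell, in KB
thread_limit=2048          # ASSUMED fixed daughter-thread stack, in KB

echo "array needs ${need} KB"
for pair in "master:${main_limit}" "daughter:${thread_limit}"; do
  name=${pair%%:*}
  limit=${pair##*:}
  if fits "$need" "$limit"; then
    echo "${name} thread (${limit} KB): array fits"
  else
    echo "${name} thread (${limit} KB): array does not fit"
  fi
done
```

Raising the limit with ulimit -s before running the program changes only the master-thread line; the daughter-thread figure stays fixed unless the pthreads library was built with FLOATING_STACKS.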

You can test this hypothesis by building with -auto -openmp using 5.0; by building with -autoscalar or -save with 6.0; and by building with 6.0 using -i_dynamic instead of -static, to link with the shared pthreads library instead of the static one.
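
The builds suggested above could be laid out as follows. This is only a dry-run sketch that prints command lines and compiles nothing. The flags are the ones named in this thread; the source file name linpack.f, the output names, the LIBS placeholder for the ATLAS or MKL link options, and the use of -o to name the executable are assumptions. Combining -autoscalar and -save with -openmp -static is my reading of the suggestion, not something the thread spells out.

```shell
#!/bin/bash
# Dry run: print candidate build lines; nothing is compiled here.
SRC=${SRC:-linpack.f}   # hypothetical source file name
LIBS=${LIBS:-}          # placeholder for your ATLAS or MKL link options
BASE="-O3 -tpp6 -openmp"

print_variants() {
  # On the ifc 5.0 machine: force automatic arrays, as 6.0 does by default.
  echo "ifc $BASE -auto $SRC $LIBS -o linpack_50_auto"
  # On the ifc 6.0 machine: keep -static but move arrays off the stack.
  echo "ifc $BASE -static -autoscalar $SRC $LIBS -o linpack_60_autoscalar"
  echo "ifc $BASE -static -save $SRC $LIBS -o linpack_60_save"
  # On the ifc 6.0 machine: link the shared pthreads library instead.
  echo "ifc $BASE -i_dynamic $SRC $LIBS -o linpack_60_dynamic"
}

print_variants
```

If the stack explanation is right, the 5.0 build with -auto would presumably start to seg fault, while the three 6.0 builds would presumably run.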

Linking to a pthreads library built with FLOATING_STACKS, e.g. the shared version in Red Hat 7.2, is undoubtedly the right thing to do going forward. If you work without -auto, as in the 5.0 default, you leave yourself open to possible runtime threading conflicts at a later date.

Martyn Corden
Software products Division
Intel Corporation
