serial MKL with openmp

Hi, I have the following problem: I want to use MKL, mainly for the Pardiso direct solver, in a relatively large flow simulator. For a part of the code unrelated to MKL/Pardiso, I want to use OpenMP to parallelize certain CPU-intensive loops. When I compile the code with OpenMP but without MKL, my parallelized loop shows nearly optimal linear scaling. However, when I link to MKL, every other part of the code becomes significantly slower in terms of total CPU time. The wall-clock time of a simulation may be slightly reduced, but MKL seems to make some inefficient attempts at parallelization that result in a much higher CPU cost. I therefore want to disable all MKL parallelization, if only to test the scaling of the parallelization that I implement explicitly myself.

This MKL behavior is strange, because I'm using the sequential MKL from the link advisor, I set both OMP_NUM_THREADS=1 and OMP_MAX_THREADS=1 in my .profile file, and I use

!$OMP PARALLEL NUM_THREADS(4)

only for the loop I want to parallelize.

How can I use the -openmp flag in a makefile for individual modules/subroutines without it somehow applying to all MKL routines? I'm pasting the full makefile below for completeness, as well as the environment settings.

--J

.SUFFIXES: $(SUFFIXES) .f90

FC = /usr/bin/ifort
FAST = -O3 -m64 -AVX
FASTT = -O3 -m64 -AVX -openmp

OBJDIR = Obj/
MOD = Mod/
GEO = Geo/
DIFF = Diff/
COMM = Comm/
DGM = Dgm/
SOLVER = Solver/
FLUID = Fluid/
FLASH = Flash/

MKLROOT = /opt/intel/composer_xe_2011_sp1.9.289/mkl/

# MKL =-I$(MKLROOT)/include/intel64/lp64 -I$(MKLROOT)/include -L$(MKLROOT)/lib $(MKLROOT)/lib/libmkl_blas95_lp64.a $(MKLROOT)/lib/libmkl_lapack95_lp64.a -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread -lm
# MKL = -I$(MKLROOT)/include -L$(MKLROOT)/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm
MKL = -I$(MKLROOT)/include -L$(MKLROOT)/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lm
# MKL = -I$(MKLROOT)/include -L$(MKLROOT)/lib -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -openmp -lpthread -lm
# MKL = -I$(MKLROOT)/include -L$(MKLROOT)/lib -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lpthread -lm

OBJS = $(OBJDIR)mod_mesh.o $(OBJDIR)mod_initializations.o \
    $(OBJDIR)gas.o $(OBJDIR)mod_fluid.o \
    $(OBJDIR)mod_init_data.o $(OBJDIR)mod_variables.o \
    $(OBJDIR)mod_peng_rob_eos.o $(OBJDIR)mod_viscosity.o \
    $(OBJDIR)mod_linear_solver.o $(OBJDIR)mod_inv_bk.o \
    $(OBJDIR)mod_comp_matrix.o $(OBJDIR)mod_flash.o \
    $(OBJDIR)mod_comp_fluxes.o $(OBJDIR)mod_comp_flow.o \
    $(OBJDIR)mod_slope_limiter.o \
    $(OBJDIR)mod_time.o $(OBJDIR)mod_diffusion.o

TARGET = CHOMPFRS3D.e

all : $(TARGET)

CHOMPFRS3D.e : $(OBJS)
	$(FC) $(FASTT) -module $(MOD) $(OBJS) $(MKL) -o $@

#1------------------------------------------------------

$(OBJDIR)gas.o : $(OBJDIR)mod_mesh.o $(OBJDIR)mod_initializations.o \
    $(OBJDIR)mod_fluid.o $(OBJDIR)mod_init_data.o \
    $(OBJDIR)mod_variables.o $(OBJDIR)mod_peng_rob_eos.o \
    $(OBJDIR)mod_viscosity.o $(OBJDIR)mod_inv_bk.o \
    $(OBJDIR)mod_linear_solver.o $(OBJDIR)mod_comp_matrix.o \
    $(OBJDIR)mod_comp_fluxes.o $(OBJDIR)mod_comp_flow.o \
    $(OBJDIR)mod_diffusion.o $(OBJDIR)mod_time.o gas.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

#=======================================================

$(OBJDIR)mod_initializations.o : $(OBJDIR)mod_mesh.o \
    $(OBJDIR)mod_fluid.o \
    $(OBJDIR)mod_init_data.o \
    $(OBJDIR)mod_variables.o \
    $(OBJDIR)mod_peng_rob_eos.o \
    $(OBJDIR)mod_viscosity.o \
    $(OBJDIR)mod_linear_solver.o \
    $(OBJDIR)mod_inv_bk.o \
    $(OBJDIR)mod_flash.o \
    $(OBJDIR)mod_comp_fluxes.o \
    $(COMM)mod_initializations.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

#-------------------------------------------------------

$(OBJDIR)mod_mesh.o : $(GEO)mod_mesh.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

$(GEO)mod_mesh.f90 : $(GEO)read_mesh.f90 \
    $(GEO)comp_dist_vol.f90

#-------------------------------------------------------

$(OBJDIR)mod_fluid.o : $(OBJDIR)mod_mesh.o \
    $(FLUID)mod_fluid.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

#-------------------------------------------------------

$(OBJDIR)mod_peng_rob_eos.o : $(OBJDIR)mod_mesh.o $(OBJDIR)mod_fluid.o \
    $(OBJDIR)mod_variables.o \
    $(FLUID)mod_peng_rob_eos.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

#-------------------------------------------------------

$(OBJDIR)mod_init_data.o : $(OBJDIR)mod_mesh.o \
    $(OBJDIR)mod_peng_rob_eos.o \
    $(OBJDIR)mod_variables.o \
    $(FLUID)mod_init_data.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

#-------------------------------------------------------

$(OBJDIR)mod_variables.o : $(OBJDIR)mod_mesh.o \
    $(DGM)mod_variables.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

#-------------------------------------------------------

$(OBJDIR)mod_viscosity.o : $(OBJDIR)mod_mesh.o \
    $(OBJDIR)mod_fluid.o \
    $(OBJDIR)mod_variables.o \
    $(OBJDIR)mod_init_data.o \
    $(FLUID)mod_viscosity.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

#-------------------------------------------------------

$(OBJDIR)mod_flash.o : $(OBJDIR)mod_mesh.o \
    $(OBJDIR)mod_variables.o \
    $(OBJDIR)mod_init_data.o \
    $(OBJDIR)mod_fluid.o \
    $(OBJDIR)mod_viscosity.o \
    $(OBJDIR)mod_diffusion.o \
    $(FLASH)mod_flash.f90
	$(FC) $(FASTT) $? -c -module $(MOD) -o $@

$(FLASH)mod_flash.f90 : $(FLASH)stability.f90
$(FLASH)mod_flash.f90 : $(FLASH)flash2f.f90
$(FLASH)mod_flash.f90 : $(FLASH)flash3f.f90
$(FLASH)mod_flash.f90 : $(FLASH)flash2f_PR.f90
$(FLASH)mod_flash.f90 : $(FLASH)PressPMV.f90
$(FLASH)mod_flash.f90 : $(FLASH)eos.f90
$(FLASH)mod_flash.f90 : $(FLASH)PressPMV_PR.f90
$(FLASH)mod_flash.f90 : $(FLASH)flash_nodes.f90
$(FLASH)mod_flash.f90 : $(FLASH)flash_stability.f90
$(FLASH)mod_flash.f90 : $(FLASH)flash_stability_PR.f90

#-------------------------------------------------------

$(OBJDIR)mod_linear_solver.o : $(OBJDIR)mod_mesh.o \
    $(SOLVER)mod_linear_solver.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

#-------------------------------------------------------

$(OBJDIR)mod_slope_limiter.o : $(OBJDIR)mod_mesh.o \
    $(DGM)mod_slope_limiter.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

#-------------------------------------------------------

$(OBJDIR)mod_inv_bk.o : $(OBJDIR)mod_mesh.o \
    $(OBJDIR)mod_variables.o \
    $(OBJDIR)mod_init_data.o \
    $(DGM)mod_inv_bk.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

#-------------------------------------------------------

$(OBJDIR)mod_comp_flow.o : $(OBJDIR)mod_mesh.o \
    $(OBJDIR)mod_variables.o \
    $(OBJDIR)mod_slope_limiter.o \
    $(OBJDIR)mod_init_data.o \
    $(DGM)mod_comp_flow.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

#-------------------------------------------------------

$(OBJDIR)mod_time.o : $(OBJDIR)mod_mesh.o \
    $(OBJDIR)mod_fluid.o \
    $(OBJDIR)mod_variables.o \
    $(OBJDIR)mod_init_data.o \
    $(COMM)mod_time.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

#-------------------------------------------------------

$(OBJDIR)mod_comp_fluxes.o : $(OBJDIR)mod_mesh.o \
    $(OBJDIR)mod_variables.o \
    $(OBJDIR)mod_init_data.o \
    $(OBJDIR)mod_inv_bk.o \
    $(OBJDIR)mod_diffusion.o \
    $(DGM)mod_comp_fluxes.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

#-------------------------------------------------------

$(OBJDIR)mod_diffusion.o : $(OBJDIR)mod_mesh.o \
    $(OBJDIR)mod_variables.o \
    $(OBJDIR)mod_linear_solver.o \
    $(OBJDIR)mod_fluid.o \
    $(OBJDIR)mod_init_data.o \
    $(DIFF)mod_diffusion.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

#-------------------------------------------------------

$(OBJDIR)mod_comp_matrix.o : $(OBJDIR)mod_mesh.o \
    $(OBJDIR)mod_variables.o \
    $(OBJDIR)mod_init_data.o \
    $(OBJDIR)mod_peng_rob_eos.o \
    $(OBJDIR)mod_inv_bk.o \
    $(DGM)mod_comp_matrix.f90
	$(FC) $(FAST) $? -c -module $(MOD) -o $@

clean :
	@-rm Obj/*.o
	@-rm CHOMPFRS3D.e
	@-rm Mod/*.mod
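The five MKL link lines in the makefile (one active, four commented out) differ mainly in the MKL threading layer they select: -lmkl_sequential links the non-threaded layer, while -lmkl_intel_thread links the OpenMP-threaded layer, which also needs the Intel OpenMP runtime (libiomp5, pulled in by -liomp5 or -openmp). As a minimal sketch, the hypothetical shell function mkl_layer below classifies a link line by pattern-matching its -l flags; it says nothing about what the linker actually resolved.

```sh
#!/bin/sh
# Hypothetical helper: classify an MKL link line by its threading layer.
# It only pattern-matches -l flags; it does not inspect any built binary.
mkl_layer() {
    line=" $1 "          # pad so the first and last flags also match
    seq=no
    thr=no
    case "$line" in *" -lmkl_sequential "*)   seq=yes ;; esac
    case "$line" in *" -lmkl_intel_thread "*) thr=yes ;; esac
    if [ "$seq" = yes ] && [ "$thr" = yes ]; then
        echo "conflict"      # two threading layers should not be mixed
    elif [ "$seq" = yes ]; then
        echo "sequential"
    elif [ "$thr" = yes ]; then
        echo "intel_thread"
    else
        echo "none"
    fi
}

# The active line from the makefile; $(MKLROOT) is left unexpanded on purpose.
mkl_layer '-I$(MKLROOT)/include -L$(MKLROOT)/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lm'
# → sequential
```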

------- .profile content below:

export OMP_NUM_THREADS=1
export OMP_MAX_THREADS=1
export TEC_RS_2009=/usr/tecRS_2009_R2
export FORT_FMT_RECL=2000
export DYLD_LIBRARY_PATH=/opt/intel/composer_xe_2011_sp1.9.289/compiler/lib:/opt/intel/composer_xe_2011_sp1.9.289/mkl/lib:/opt/intel/Compiler/11.1/088/Frameworks/mkl/lib/em64t:/opt/intel/Compiler/11.1/08$
export LD_LIBRARY_PATH=/opt/intel/composer_xe_2011_sp1.9.289/mkl/lib
export LIBRARY_PATH=/opt/intel/composer_xe_2011_sp1.9.289/compiler/lib:/opt/intel/composer_xe_2011_sp1.9.289/mkl/lib
export NLSPATH=/opt/intel/composer_xe_2011_sp1.9.289/mkl/lib/locale/%l_%t/%N
export MANPATH=/opt/intel/composer_xe_2011_sp1.9.289/man/en_US:/opt/local/share/man:/opt/local/man:
export INCLUDE=/opt/intel/composer_xe_2011_sp1.9.289/mkl/include
export FPATH=/opt/intel/composer_xe_2011_sp1.9.289/mkl/include
export CPATH=/opt/intel/composer_xe_2011_sp1.9.289/mkl/include
export KMP_AFFINITY=compact,1

Hi,

I am moving this issue to the MKL forum since your question is about MKL.

Regards,
Annalee
Intel Developer Support

Hi moortgatgmail.com,

Could you please provide us with a runnable test case (including the code and hardware information)?

From the line MKL = -I$(MKLROOT)/include -L$(MKLROOT)/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lm, you are using the sequential MKL library. In theory, the serial MKL library should neither influence nor be influenced by the other OpenMP code.
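One way to check what the linker actually resolved is to list the executable's shared-library dependencies, with otool -L on macOS (the DYLD_LIBRARY_PATH in the .profile suggests a Mac) or ldd on Linux. The hypothetical function mkl_libs_in below is a minimal sketch that filters such a listing, read on standard input, down to the MKL and Intel OpenMP library names. It assumes dynamic linking; a statically linked MKL will not appear. Seeing libiomp5 is expected when the program's own code is built with -openmp, whereas seeing libmkl_intel_thread would mean the threaded MKL layer was linked after all.

```sh
#!/bin/sh
# Hypothetical helper: read an `otool -L` or `ldd` listing on stdin and print
# each distinct MKL or Intel OpenMP runtime library name it mentions.
# Assumes dynamic linking; statically linked libraries do not show up.
mkl_libs_in() {
    grep -o -E 'lib(mkl_[a-z0-9_]+|iomp5)' | sort -u
}

# Usage on macOS:  otool -L CHOMPFRS3D.e | mkl_libs_in
# Usage on Linux:  ldd CHOMPFRS3D.e | mkl_libs_in
```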

If the test case is confidential, you can reply with "Private".

Thanks
Ying

Hi,

Regarding your comment:

"This MKL behavior is strange, because I'm using the sequential MKL from
the link advisor, and I set both OMP_NUM_THREADS=1 and OMP_MAX_THREADS=1
in my .profile file, and use

!$OMP PARALLEL NUM_THREADS(4) only for the loop I want to parallelize."

It seems you prevent the parallelization yourself by setting OMP_NUM_THREADS to 1.
Could you please try setting it to 4 (keeping the link line the same)?
Also I don't think you need that OMP_MAX_THREADS at all.
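Following this suggestion, here is a sketch of the threading-related lines of the .profile, assuming the four threads used in the question. OMP_NUM_THREADS sets the default team size for parallel regions that have no NUM_THREADS clause. MKL_NUM_THREADS is MKL's own thread-count variable; with the sequential layer it should have no effect, so it appears here only as a guard in case the threaded layer is linked by mistake. OMP_MAX_THREADS is not an environment variable defined by the OpenMP specification, so it is simply removed.

```sh
#!/bin/sh
# Sketch of threading-related .profile lines (assumes 4 OpenMP threads).
export OMP_NUM_THREADS=4   # default team size for OpenMP regions without NUM_THREADS
export MKL_NUM_THREADS=1   # no effect with mkl_sequential; guards against the threaded layer
unset OMP_MAX_THREADS      # not an OpenMP variable, so drop it

# Show what a program started from this shell will inherit.
env | grep -E '^(OMP|MKL|KMP)_' | sort
```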

Best regards,
Vladimir
