WRF and DAPL with Intel Cluster Studio?

Hi there,

I am using Intel Cluster Studio for Linux (v2011.0.013) to build and run WRF (v3.1.1) on an InfiniBand cluster. The switch is an HP BLc QLogic 4X QDR IB switch (PN: 505958-B21), and the HCAs are HP BLc QLogic 4X QDR IB Mezz HCAs (PN: 583210-B21). We are using the QLogic InfiniBand software, which includes OFED.

I have built WRF with the Intel compiler options and DM_PARALLEL. I have successfully run this WRF with shm:tcp and shm:tmi, but every one of the dozens of ways I have tried shm:dapl has failed with a message like this:

INPUT LandUse = "USGS"
WRF NUMBER OF TILES = 1
WRF NUMBER OF TILES = 1
[14] rtc_invalidate error 1114112
Assertion failed in file ../../i_rtc_hook.c at line 190: 0
internal ABORT - process 14

Digging around the internet suggests this might be a problem with MKL spawning threads, so I have tried the -mt_mpi compiler switch for thread safety. I have also tried passing "-genv OMP_NUM_THREADS 1 -genv I_MPI_PIN_DOMAIN omp" to mpiexec, and I have tried compiling WRF without linking to the MKL libraries. Everything produces the same result. Maybe the DM_PARALLEL WRF is multi-threading?

The relevant line in /etc/dat.conf is:

ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" ""

Using

OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib0 0" ""

instead gives the same result.

So, my questions are:
1. Is it possible to run WRF with DAPL on such a system, and what do I need to do to make it work?
2. If I could make it work, should I expect much better performance than with shm:tmi?

Thanks for reading!

Cheers,
Cory.
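For context, the fabric selections above were made with Intel MPI's I_MPI_FABRICS setting, roughly along these lines (a sketch only; the executable name wrf.exe and the rank count -n 32 are placeholders, not values from my actual job script):

```shell
# Works: shared memory within a node, TCP between nodes
mpiexec -genv I_MPI_FABRICS shm:tcp -n 32 ./wrf.exe

# Works: shared memory plus TMI, the native path for QLogic HCAs
mpiexec -genv I_MPI_FABRICS shm:tmi -n 32 ./wrf.exe

# Fails with the rtc_invalidate / i_rtc_hook.c abort shown above,
# even with threading pinned down as suggested elsewhere
mpiexec -genv I_MPI_FABRICS shm:dapl \
        -genv OMP_NUM_THREADS 1 -genv I_MPI_PIN_DOMAIN omp \
        -n 32 ./wrf.exe
```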


In the absence of an expert reply, I'll mention that DAPL is used with certain applications in an effort to scale to hundreds of ranks. With WRF, however, I've heard that the hybrid MPI/OpenMP FUNNELED mode is what gives a modest increase in scaling (beyond 256 cores?). I can't speak from my own experience, as WRF runs best on my current hardware at 32 ranks.

Best Reply

Hi Cory,

For QLogic HCAs, shm:tmi is the best variant.
The DAPL implementation for QLogic cards is not good enough yet: it is unstable, and you will get worse performance.

So, just use shm:tmi for now with Intel MPI library.

If you are going to use the MKL library, you need to add the '-mt_mpi' compiler option and should not set the OMP_NUM_THREADS environment variable. The MKL and MPI libraries understand each other, and MKL will not create more threads than the number of cores available on a node. (You can also try different -ppn values, by the way.)
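Putting that together, a launch along these lines is what I mean (a sketch; wrf.exe, -n 64, and -ppn 8 are placeholder values you would replace with your own executable and node layout):

```shell
# TMI fabric; no OMP_NUM_THREADS set, so MKL sizes its own thread pool.
# -ppn controls processes per node; vary it to find the sweet spot.
mpiexec -genv I_MPI_FABRICS shm:tmi -ppn 8 -n 64 ./wrf.exe
```

The '-mt_mpi' option goes on the compile/link line when building WRF, not on the mpiexec line.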


Thanks very much for the help!
