December 2, 2009 8:00 PM PST
This application note was created to help users of WRF versions 2.2 and 2.2.1 make use of the Intel Fortran compiler, versions 10 and 11.
For WRF version 3, compare with the other references in http://software.intel.com/en-us/articles/building-the-wrf-with-intel-compilers-on-linux-and-improving-performance-on-intel-architecture/ .
The Weather Research and Forecasting (WRF) Model is a next-generation mesoscale numerical weather prediction system designed to serve both operational forecasting and atmospheric research needs. Parallel implementations of WRF include support for OpenMP* and for MPI*. For further information, see http://wrf-model.org/index.php†
WRF is in the public domain and source code may be obtained from the WRF project at the URL above. The version with the ARW solver, discussed here, and test data may be downloaded from http://www.mmm.ucar.edu/wrf/users/downloads.html†.
The latest versions of the Intel® Fortran and C++ compilers may be purchased, or evaluation copies requested, from http://www.intel.com/cd/software/products/asmo-na/eng/compilers/284132.htm.
Existing customers with current support can download the latest compilers directly from https://registrationcenter.intel.com/.
- The Intel® Fortran Compiler for Linux;
- Either the Intel C++ Compiler for Linux or the GNU C++ compiler (gcc), version 3.2 or later.
- The network Common Data Form (netCDF)* library, obtainable from the Unidata site at http://www.unidata.ucar.edu/software/netcdf/† See the KB article Building netCDF with the Intel Compilers for instructions on building netCDF.
- An MPI library, such as MPICH* or Intel MPI, if intending to build a distributed memory version.
Hardware:
These instructions have been tested on Intel® Core®2 Duo processors and Intel Itanium® 2 processors running Linux and on Intel® Core®2 Duo processors running Mac OS* X version 10.5.
Software:
These instructions apply to versions 10.1 and 11.0 of the Intel compilers. gcc, if used, must be version 3.2 or later.
1) Set up the Intel® compiler environment, e.g., by the bash shell command "source ifortvars.sh" from the compiler bin directory. Also "source iccvars.sh" if using the Intel C++ compiler. For the 11.0 compiler only, these scripts require an argument "intel64" or "ia32".
2) If using Intel® MPI, set up the environment with "source mpivars.sh" from the MPI bin directory (bin64 directory for Intel 64).
3) Configure (./configure) and build (make check) netCDF with default options.
(If necessary, see the build instructions at the Unidata netCDF web site.)
4) Set the environment variable NETCDF to point to the top level netCDF directory.
5) Untar the WRF download and configure and build WRF according to the README file:
6) Run ./configure for WRF and select an option that includes ifort.
7) Modify the configure.wrf file as necessary. We recommend:
a) Replace " -mp" by " -fp-model precise"
b) For Intel Itanium-based processors running Linux, set
FCOPTIM = -O3 -fno-alias -ip
c) For Intel® 64 or IA-32 processors running Linux or Mac OS X, set:
FCOPTIM = -O3 -xT -fno-alias -ip for Intel® Core® 2 Duo processors
FCOPTIM = -O3 -xP -fno-alias -ip for any Intel® processor with at least SSE3 support
For the 11.0 compiler, -xssse3 is equivalent to -xT and -xsse3 is equivalent to -xP.
d) For Intel® 64 or IA-32 processors running Linux, set:
FCOPTIM = -O3 -xW -fno-alias -ip for any processor with at least SSE2 support. For the 11.0 compiler, -msse2 is equivalent to -xW and is the default setting for Intel® 64 or IA-32 processors running Linux.
e) If ARCHFLAGS contains the definition -DIFORT_KLUDGE, remove it.
f) Ensure that the base options include -convert big_endian and -align all
g) Verify NETCDFPATH
h) Verify path for MPI if used.
i) Make any additional changes indicated in the "known issues" section.
With these, it should be possible to build all source files with full optimization. However, if desired, certain files, such as module_dm, may be built with FCBASEOPTS and OMP but without FCOPTIM, in order to reduce compilation time and memory requirement.
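As a minimal sketch, the edits in items a), c) and e) above could be scripted rather than made by hand. The function name `patch_configure_wrf` and its arguments are hypothetical and not part of WRF. The `sed` patterns assume that `FCOPTIM` is assigned on a single line starting in the first column, and the default `-xT` should be replaced by the flag appropriate to your processor. Review the resulting file before building.

```shell
# Hypothetical helper (not part of WRF): apply the recommended edits to configure.wrf.
# Usage: patch_configure_wrf <path to configure.wrf> [processor flag, default -xT]
patch_configure_wrf() {
    cfg=$1
    xflag=${2:--xT}
    # Keep a backup so the original file can be restored.
    cp "$cfg" "$cfg.orig" || return 1
    # a) replace " -mp" by " -fp-model precise" (mid-line and end-of-line cases)
    # e) remove the -DIFORT_KLUDGE definition
    # c) set FCOPTIM (assumes a single-line assignment)
    sed -e 's/ -mp / -fp-model precise /g' \
        -e 's/ -mp$/ -fp-model precise/' \
        -e 's/ *-DIFORT_KLUDGE//g' \
        -e "s/^FCOPTIM[[:space:]]*=.*/FCOPTIM = -O3 $xflag -fno-alias -ip/" \
        "$cfg.orig" > "$cfg"
}
```

For example, `patch_configure_wrf ./configure.wrf -xP` would select the SSE3 setting; comparing `configure.wrf` with `configure.wrf.orig` afterwards shows exactly what changed.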
The -O3 option is available for both Intel® and non-Intel microprocessors but it may result in more optimizations for Intel microprocessors than for non-Intel microprocessors. For more information on processor-specific optimizations, see Intel® compiler options for SSE generation and processor-specific optimizations.
Should not be necessary
Choose one of the WRF test examples, downloading any required data, and build the test, preserving the output, e.g., ./compile em_real > build.log 2>&1
Go to the directory for the chosen test:
cd test/em_real
The "real" data test requires data downloaded from the WRF web site; the "ideal" tests do not.
Untar the data files:
tar -xzvf jan00_wps.tar.gz
Run the initialization code to generate WRF input files:
./real.exe ( or ./ideal.exe )
Increase the shell stack limit:
ulimit -s unlimited (limit stacksize unlimited for C shell)
Run the main simulation:
./wrf.exe
or, to run under MPICH:
mpirun -n <number of procs> ./wrf.exe
or, to run under Intel® MPI:
mpdboot --file=<hostfile>
mpiexec -n <number of procs> ./wrf.exe
See the README_TEST_CASES file for the ideal test cases.
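The stack limit and run steps above can be wrapped in a small script. This is a sketch only: the function name `wrf_run_command` is hypothetical, it assumes an `mpiexec` launcher (with `mpdboot` already run where Intel® MPI requires it), and wrapping the executable in `/bin/sh -c 'ulimit -s unlimited; ...'` is an assumption about how to carry the stack limit to every MPI process, modeled on the C shell variant reported in the reader comments.

```shell
# Hypothetical helper: print the command line for a serial or MPI run of wrf.exe.
# Usage: wrf_run_command [number of MPI processes, default 1]
wrf_run_command() {
    nprocs=${1:-1}
    if [ "$nprocs" -le 1 ]; then
        echo "./wrf.exe"
    else
        # Each MPI process starts its own shell, so the stack limit is raised there too.
        echo "mpiexec -n $nprocs /bin/sh -c 'ulimit -s unlimited; ./wrf.exe'"
    fi
}

# Raise the stack limit for this shell; warn instead of failing if the hard limit forbids it.
ulimit -s unlimited 2>/dev/null || echo "warning: could not raise the stack limit" >&2

# Print the command for a 4-process run so it can be checked before use.
wrf_run_command 4
```

Once the printed command looks right for your system, it can be executed from the test directory with `eval "$(wrf_run_command 4)"`.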
The utility <install dir>/external/io_netcdf/diffwrf may be used to compare an output file, such as
<install dir>/test/em_real/wrfout_d01_2000-01-24_12:00:00,
to a reference version. See the WRF website for further details.
Performance
Threading with OpenMP
On IA-64: "fortcom: Warning: Optimization suppressed due to excessive resource requirements"
For certain WRF configurations, typically involving RSL, the compiler may scale back optimizations to limit the memory requirement and compile time. If you are building WRF on a system with plenty of memory, say 8 GB, you may use the option -override-limits to ask the compiler to continue the compilation without reducing the optimization level. When building with OpenMP, the file solve_em.f90 should be compiled with -override-limits, whatever the optimization level.
On Intel 64: "Fatal compilation error: Out of memory asking for ……."
The compiler for Intel 64 is a 32 bit executable and can access a maximum of 4 GB of memory. For certain WRF configurations, typically involving RSL, the compiler may exhaust the available memory for one or two files when compiled with maximum optimization. This may occur for additional files if less than 4GB total (physical + virtual) memory is available, or on IA-32. In version 10.1 of the compiler only, the internal switch
-switch fe_use_rtl_copy_arg_inout may be used to reduce the memory requirement. If warning messages such as "An internal threshold was exceeded" are seen, the additional option -mP2OPT_vec_xform_level=103 may be used to preserve optimization levels.
The version 11.0 compiler for Intel 64 is a native 64-bit executable and is not subject to the 4 GB limit on address space. The compiler may still exceed internal limits that are intended to limit memory usage and reduce compile time; this may or may not be accompanied by a warning message. These limits may be avoided by the switch -override-limits in the 11.0 and later compilers. It is strongly recommended to compile the file solve_em.f90 using the switch -override-limits. On a system with plenty of memory, -override-limits may be included in FCOPTIM.
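On a system with plenty of memory, adding the switch to FCOPTIM could look like the following sketch. The function name `add_override_limits` is hypothetical, and the `sed` script assumes that `FCOPTIM` is assigned on a single line starting in the first column of configure.wrf.

```shell
# Hypothetical helper: append -override-limits to the FCOPTIM line of configure.wrf,
# unless the option is already present on that line.
# Usage: add_override_limits <path to configure.wrf>
add_override_limits() {
    cfg=$1
    # Keep a backup so the change can be undone.
    cp "$cfg" "$cfg.bak" || return 1
    sed -e '/^FCOPTIM[[:space:]]*=/{' \
        -e '/-override-limits/!s/[[:space:]]*$/ -override-limits/' \
        -e '}' "$cfg.bak" > "$cfg"
}
```

Running the function twice leaves the line unchanged the second time, so it is safe to call from a build script.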
Large memory use or very long compile times for module_configure.f90 may indicate that the definition
-DIFORT_KLUDGE has not been removed from the configuration file as described above.
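If the build output was saved to a file, as in the ./compile em_real > build.log 2>&1 example, the messages quoted in this section can be searched for afterwards. The sketch below assumes the messages appear in the log verbatim; the function name `scan_build_log` is hypothetical.

```shell
# Hypothetical helper: report the known-issue messages quoted above in a build log.
# Prints matching lines with their line numbers.
# Returns 0 if none of the messages is found, 1 otherwise.
scan_build_log() {
    log=$1
    found=0
    for msg in \
        "Optimization suppressed due to excessive resource requirements" \
        "Out of memory asking for" \
        "An internal threshold was exceeded"
    do
        # -F treats the message as a fixed string; -n prefixes each match with its line number.
        if grep -n -F "$msg" "$log"; then
            found=1
        fi
    done
    return $found
}
```

A typical use would be `scan_build_log build.log || echo "see the known issues section"`.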
Please report any problems building WRF that are not described here to Intel Premier Support at https://premier.intel.com.
† This link will take you off of the Intel Web site. Intel does not control the content of the destination Web Site.
This article applies to: Intel® C++ Compiler for Linux* Knowledge Base, Intel® C++ Compiler for Mac OS X* Knowledge Base, Intel® Fortran Compiler for Linux* Knowledge Base, Intel® Fortran Compiler for Mac OS X* Knowledge Base
For more complete information about compiler optimizations, see our Optimization Notice.
Comments (11) 
| May 12, 2009 7:56 AM PDT
Nick Witcraft |
I was having trouble running larger domains across the nodes. I figured out that I needed to pass the unlimit command to the nodes... time mpiexec -l -n 16 /bin/csh -c "limit stacksize unlimited ; ./wrf.exe" </dev/null |
| November 26, 2009 5:55 AM PST
Ken | I followed these instructions to set up the parameters for compiling wrf.exe. All compilations went fine and wrf.exe was built successfully, but when running it with Open MPI, wrf.exe crashes after a short run (say, 2 days in WRF's namelist) with an error message about a segmentation fault, out of bounds, or something similar. However, this is not the case when wrf.exe is compiled with PGI Fortran + gcc. Any solution or clue to fix this problem is appreciated. Thanks in advance |
| December 3, 2009 10:14 AM PST
Jennifer Jiang (Intel)
|
About the crash, we need to narrow it down further. Try with only one thread to see if it is related to the OpenMP libraries: in a shell window, run "export OMP_NUM_THREADS=1", then run wrf.exe. By the way, for such issues it is better to post to the Forums: 1. http://software.intel.com/en-us/forums/intel-c-compiler --- for C++ issues 2. http://software.intel.com/en-us/forums/intel-fortran-compile..... -mac-os-x/ --- for Fortran Linux/Mac OS issues. |
| August 11, 2010 3:28 AM PDT
Ana |
Hi, We tried compiling WRF 3.2 on several machines (AMD quad core, Intel dual core, all 64 bits, IA64) with Linux Intel 11.0 and 11.1, and the results with OpenMP are wrong. The model compiles and runs, but the numbers are not correct (unstable, "pixelized"). We compared results with Gfortran smpar, pgf90 smpar and Intel in serial and dmpar, which give equal results. We also compiled smpar without the flag -fp-model precise, and the results are "less" wrong, but still significantly different from the correct ones. What can be done? Thanks. |
| December 1, 2010 10:39 AM PST
george |
Hi, I had everything compiled with ifort and icc (even mpich). The real.exe was executed successfully with "mpirun -np", but wrf.exe stops after the first wrfout with a segmentation fault mentioning "-p4pg" and "-p4wd". Any ideas? Thanks. |
| February 28, 2011 8:12 AM PST
Kotroni Vassiliki |
We have been running WRF on AMD (Opteron) processors with the PGroup compiler for a long period. Now we are trying to move to Intel. We are using the latest version of the Intel compiler, we also compiled mpich2 with ifort, and WRF compiled successfully, but the model crashes just after writing the initial wrfout and before completing any timestep. Is there a known problem with running WRF built with Intel compilers on AMD (Opteron or Phenom)? For instance, we were able to run MM5 (parallel) compiled with Intel on the same processors without any problem. Any ideas? Thank you in advance. V. Kotroni, National Observatory of Athens, Greece |
| May 29, 2011 3:51 PM PDT
Mikhail Shiryaev
|
To compile on Ubuntu (10.04_x64), you need to add "-heap-arrays" to CFLAGS_LOCAL and FCOPTIM to avoid the segmentation fault. Sorry for my poor English. |
| June 28, 2011 12:03 AM PDT
salmontres
|
Hi Mikhail, Do you know if you have to use the -heap-arrays option on CentOS as well? I'm getting seg faults after successfully compiling WRF 3.3 for smpar and dmpar, and I have no idea what could be the cause. |
| August 1, 2011 7:37 AM PDT
Mikhail Shiryaev
|
Sorry, but I don't use other Linux distributions. I found this solution (-heap-arrays) at http://forum.wrfforum.com/viewtopic.php?f=6&t=1625&start=0 |