Intel's WRF on Phi instructions and validating the output

Hi,

I've been following the instructions for testing WRF on Phi here:

http://software.intel.com/en-us/articles/how-to-get-wrf-running-on-the-intelr-xeon-phitm-coprocessor

I've left a comment on the article itself about problems with the instructions; it might be worthwhile for someone to update them so others don't encounter the same problems.

On validating the output on Phi, the instructions above state:

"The ‘DIGITS’ column should contain a high value (>3). If yes, the WRF run is considered valid."

Please see my attached diffout_tag file, which compares my output to the reference output. Most of the values in the 'DIGITS' column are 0, 1 or 2, with some 3s; nothing is >3 as the instructions require. There are also many errors about variables not being found. For someone who is not deeply familiar with WRF but is using it for benchmarking on Xeon Phi, can you elaborate on this output and whether it shows a decisively erroneous run of WRF on Xeon Phi? If it does, can you offer any advice on how to investigate what is causing the problem, or how to fix it?

Regards, 

Paul.

Attachment: diffout-tag.txt (20.15 KB)

Hello Paul,

I looked at the attachment. It says it did not find NETCDF. Did the workload run, and what result did it print?

--Indraneil

Hi Indraneil,

Thank you for taking a look. I don't understand, though: where in the attachment does it say it did not find NETCDF? Can you point it out?

There are multiple messages from NETCDF in the file (which I presume means it has found NETCDF), like the following:

NetCDF error: NetCDF: Variable not found
NetCDF error in wrf_io.F90, line 3302
Big difference: HGT_SHAD not found in
wrfout_d01_2001-10-25_03_00_00

These messages seem to say that diffwrf cannot match certain WRF variables between the two files passed to it. Also, for some of the WRF variables the attachment shows an analysis like:

TSK 82199 2 0.2864086159E+03 0.2877086356E+03 2 0.2284E+01 0.3965E-01
RAINC 15004 2 0.6952785933E+01 0.7028621160E+01 1 0.3832E+00 0.6628E-01
RAINNC 72728 2 0.4955310608E+01 0.4967326933E+01 2 0.3037E+00 0.2354E+00

I presumed these are variables it was able to match between the two files passed to diffwrf (and, if so, doesn't that also indicate there is no problem related to NETCDF?).
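
To make the DIGITS check concrete, here is a minimal Python sketch that pulls the per-variable rows out of a saved diffwrf output and lists those at or below the threshold. It assumes each comparison row has eight whitespace-separated fields with DIGITS in the sixth position, as in the sample rows above; the function names and the default threshold of 3 are illustrative and are not part of diffwrf.

```python
def parse_diffwrf_rows(text):
    """Return the rows that look like per-variable comparisons.

    Assumption: a comparison row has exactly eight whitespace-separated
    fields (name, count, dims, two RMS values, DIGITS, two error values).
    Anything else, such as the NetCDF error lines, is skipped.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 8:
            continue
        try:
            rows.append({
                "field": parts[0],
                "ndifs": int(parts[1]),
                "dims": int(parts[2]),
                "rms1": float(parts[3]),
                "rms2": float(parts[4]),
                "digits": int(parts[5]),  # assumed position of the DIGITS column
            })
        except ValueError:
            # Not a numeric comparison row after all; ignore it.
            continue
    return rows


def fields_at_or_below(rows, threshold=3):
    """Names of variables whose DIGITS value is not above the threshold."""
    return [row["field"] for row in rows if row["digits"] <= threshold]


sample = """\
NetCDF error: NetCDF: Variable not found
NetCDF error in wrf_io.F90, line 3302
TSK 82199 2 0.2864086159E+03 0.2877086356E+03 2 0.2284E+01 0.3965E-01
RAINC 15004 2 0.6952785933E+01 0.7028621160E+01 1 0.3832E+00 0.6628E-01
RAINNC 72728 2 0.4955310608E+01 0.4967326933E+01 2 0.3037E+00 0.2354E+00
"""

rows = parse_diffwrf_rows(sample)
low_fields = fields_at_or_below(rows)
```

Running the same two functions over the host and Phi diffwrf outputs would give a quick side-by-side list of which variables fall short in each run.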

To answer your question: yes, the workload did run, with the SUCCESS message mentioned in the instructions. Please see the attached files for further info. The output of the timing script given in the instructions was as follows:

---
items: 149
max: 2.280940
min: 0.676290
sum: 111.587560
mean: 0.748910
mean/max: 0.328334
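
The summary fields are related in a simple way: mean is sum divided by items, and mean/max is that mean divided by the slowest step. A minimal Python sketch, assuming the per-step times are already available as a list of seconds (the `summarize` helper is hypothetical and is not the timing script from the instructions):

```python
def summarize(step_times):
    """Compute the same summary fields as shown in the output above."""
    if not step_times:
        raise ValueError("need at least one step time")
    total = sum(step_times)
    slowest = max(step_times)
    mean = total / len(step_times)
    return {
        "items": len(step_times),
        "max": slowest,
        "min": min(step_times),
        "sum": total,
        "mean": mean,
        "mean/max": mean / slowest,
    }


# Illustrative values only, not taken from the actual run.
example = summarize([1.0, 2.0, 3.0])
```

A mean/max ratio well below 1 indicates that at least one step took much longer than the average step.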

I would appreciate it if you could help me make sense of the diffwrf output in light of the comment in the instructions about the DIGITS column being >3.

Thanks!


Hi Indraneil,

I have now also run WRF on the host CPU with the same benchmark, and diffwrf gives the same kind of output as I already reported for the Phi run. I have attached the diffwrf output for both the host and Phi runs for you to compare. The rms numbers are very similar, and in particular the DIGITS columns are identical (i.e. not >3, which your instructions indicate means the output is not valid).

Have you tried following your instructions for this benchmark on a CPU and/or on a Phi? Did you see something similar in the diff?

Thanks,

Paul.

