Hesitant to renew IVF support service

AlGreynolds's picture

I'm hesitant to renew my IVF support service for several reasons:

1. I'm still using version 11 for my day-to-day work because of some weird optimization problems in version 12 that cause erroneous code to be generated when vectorization is involved. Unfortunately I've never been able to isolate the problems in small test cases, and the full code is large and proprietary.
2. My main application is essentially standard Fortran-95 and OpenMP so I don't need all the new Fortran-200X stuff.
3. I'm evaluating the PGI compiler because of its support for GPU programming via the tentative OpenACC standard. It doesn't appear that Intel is going to support this in the near future.

I would renew if I'm wrong about 3 and OpenACC GPU programming is coming within the next year.

Al Greynolds
www.ruda.com

Tim Prince's picture

OpenACC is a partial interim standard to cover only specific types of GPU. Intel is working on the committee to merge those facilities into future OpenMP. Needless to say, "within a year" is too short a period to expect that follow-on standard implementation.

AlGreynolds's picture

So far my evaluation of the PGI compiler has been a bust. Its performance on my multi-threaded (OpenMP) engineering application is very poor. That leads to another reason I might not be renewing my Intel support:

4. Running on both my SSE 4.1 and SSE 4.2 capable machines, my application went from being 25% slower using Gfortran 4.5 than IVF 11.1 to 10% faster using Gfortran 4.7 than IVF 11.1 or 12.1 (in all cases I played with optimization and code generation settings to get maximum performance).

Al Greynolds
www.ruda.com
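
As a rough illustration of what the figures in point 4 imply, the sketch below converts the two claims into run times relative to IVF and divides them. It assumes "25% slower" means 1.25 times the IVF run time and "10% faster" means 0.90 times the IVF run time; the post does not say which definition was used, and the helper name `relative_runtime` is hypothetical.

```python
def relative_runtime(percent, direction):
    """Run time relative to the baseline compiler (baseline = 1.0).

    Assumed reading (not stated in the post): "X% slower" means X% more
    run time, "X% faster" means X% less run time.
    """
    if direction == "slower":
        return 1.0 + percent / 100.0
    if direction == "faster":
        return 1.0 - percent / 100.0
    raise ValueError("direction must be 'slower' or 'faster'")


# Figures quoted in point 4, both measured against IVF.
gfortran_45 = relative_runtime(25, "slower")  # Gfortran 4.5 vs IVF 11.1
gfortran_47 = relative_runtime(10, "faster")  # Gfortran 4.7 vs IVF 11.1/12.1

# Implied speedup of Gfortran 4.7 over Gfortran 4.5, assuming IVF itself
# did not change speed between the two measurements.
implied_speedup = gfortran_45 / gfortran_47
print(f"implied Gfortran 4.5 -> 4.7 speedup: {implied_speedup:.2f}x")
```

Under those assumptions the implied gain is about 1.39x. Reading "10% faster" as a 1.1x throughput ratio instead would give a slightly smaller number.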

Steve Lionel (Intel)'s picture

Al, can you provide us the program you used to compare? While gfortran continues to improve, our own testing as well as independent tests show it to be, in general, considerably slower than Intel Fortran.

Steve
bmchenry's picture

A good place to start for comparisons of relative compiler speeds is:
http://www.polyhedron.com/compare0html
From that link, the related pages indicate GFortran was faster than INTEL on only two tests (AC & TFFT2), and only on the AMD processor.
Differences were 9.17 v 9.73 (AC) and 133.76 v 135.2 (TFFT2)
Look at the test results for everything else and INTEL has the green GO light!
I find my own tests are consistent.
So with that in mind I'd love to see what you are doing to produce your specified results?
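
To put the two quoted differences in perspective, here is a minimal sketch of the margins as percentages. It assumes the figures are run times in seconds and that the smaller number in each pair is GFortran's, since the post says GFortran was the faster one on those tests. The function name `percent_faster` is hypothetical.

```python
def percent_faster(fast_seconds, slow_seconds):
    """How much less run time the faster result took, as a percentage."""
    return (slow_seconds - fast_seconds) / slow_seconds * 100.0


# (assumed GFortran time, assumed Intel time) as quoted in the post above.
timings = {
    "AC": (9.17, 9.73),
    "TFFT2": (133.76, 135.2),
}

margins = {name: percent_faster(gf, intel) for name, (gf, intel) in timings.items()}
for name, margin in margins.items():
    print(f"{name}: GFortran ahead by {margin:.1f}%")
```

On that reading the margins are about 5.8% (AC) and 1.1% (TFFT2), so both wins are small.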

AlGreynolds's picture

My real-world tests are specifically for multi-threaded OpenMP performance, which the Polyhedron benchmarks do not cover. Those benchmarks are also using "old" versions of Gfortran. BTW, I too am surprised by my most recent runs that show Gfortran passing IVF.

Al

Steve Lionel (Intel)'s picture

You may be seeing some other effect. Would you be willing to provide us with your tests? We'd like to investigate this.

Steve
AlGreynolds's picture

The code is very sensitive/proprietary so the best I can do is the following:

Attached is a published paper I presented at a technical conference in August 2011. Also attached are the results on the same machine for the latest compilers. These are actually the Mac OSX versions, so they can be directly compared with Figure 6 in the paper (the Windows 7 results are negligibly different). Notice that the speed of the Intel versions has not changed much in the last year but there is a significant jump in gfortran performance. One change in gfortran 4.7 was to put, by default, local arrays on the stack instead of the heap (I think this has always been the default with Intel Fortran).

Al

Steve Lionel (Intel)'s picture

Thanks - we'll see what we can do with this. Would you please tell me which options you used for each compiler?

Steve
Tim Prince's picture

The ifort option /Qauto (implied by several others, such as /Qopenmp) puts local arrays on the stack (unless /heap-arrays is set).
In my comparisons between ifort and gfortran I always had OpenMP enabled. Compared with current gfortran, ifort frequently depends on the vectorization or prefetch directives to give better performance.

AlGreynolds's picture

Here are the options I used (besides the particular OpenMP option):

gfortran: -O3 -funroll-loops -march=native -ffast-math -mfpmath=sse -msse4.2
ifort: -fast

I experimented with additional ifort options (e.g. -unroll-aggressive) but no combination produced faster code. Any suggestions?

Al

Tim Prince's picture

You might try something closer to equivalent to your gfortran options: ifort -O2 -assume protect_parens -xSSE4.1 -complex-limited-range. Evidently, if you have complex arithmetic, it's not a fair comparison when you set limited-range in gfortran (it's included in -ffast-math) but not ifort. If you don't have complex, it's not so obvious why gfortran would be faster.
I saw that you set -march=native and made doubly sure with -msse4.2 (I suppose you don't have AVX), but I don't think gfortran loses any Westmere optimizations by enabling sse4.2. (Just in case you're testing a Westmere-like CPU).

AlGreynolds's picture

For these particular results, the algorithm does not use complex arithmetic. Also, the processor is a Westmere so it doesn't have the new AVX.

Al