Which flavor (IA-32, Itanium, EM64T) of FORTRAN compiler (F90) needs to be used on AMD Opteron for 64-bit application development?
Thanks in advance.
EM64T. This is discussed in the Installation Guide and Release Notes.
Good Day to everybody
I am a CFD engineer and a newbie programmer, and I use many allocatable arrays in Intel Visual Fortran 11.1.051. My code solves a CFD problem in a reasonable time when compiled with Fortran Build Environment for App .... IA-32 like this:
ifort /O2 m_text.f90 mgeom.f90 bcond.f90 flux.f90 coeff.f90 momentum.f90 solver.f90 correction.f90 writed.f90 readg.f90 MFS.f90 /exe:Main.exe
Recently I did a test with the Fortran Build Environment for App .... Intel(R) 64, using the same code, same compilation command, and applied to the same CFD problem, and it runs painfully slow (when compared to the 32-bit compilation).
I did this test on my computer, which runs Windows 7 64-bit on an AMD Turion II M500 Dual-Core. I know that some optimizations might not work with AMD processors, but why does my code work fine with a 32-bit compilation and get really slow when compiled as 64-bit?
Am I asking the right question?
Thanks in advance for any answer you could give me.
Processor-specific optimizations are not turned on by default. How much of a performance degradation are you seeing? The longer addresses in 64-bit applications can cause a slight slowdown, but in most cases it should not be dramatic.
I would suggest trying your program on other machines and with a newer version of the compiler. The current version is 13.1. 11.1 is no longer supported.
>>...why my code works fine with a 32 bit compilation and why it gets really slow when compiled with 64 bit?
>>Am I asking the right question?
Your question is right but you need to provide more details.
I see that you're compiling a set of source files. What you need to do is profiling and comparing execution times of different parts ( functions / procedures / etc ) of your software on 32-bit and 64-bit platforms. Your statement "...really slow..." is too generic and you need to identify a root cause ( or several causes ) of your performance problem.
>>...I would suggest trying your program on other machines and with a newer version of the compiler. The current version is 13.1. 11.1
>>is no longer supported...
Many software companies and teams are using no longer supported compilers, IDEs, software and hardware and it is not so easy to change / upgrade that obsolete stuff. For example, due to strict ISO 8001 requirements in X-Ray imaging industry.
I don't recommend upgrading until you find the root cause of your problem(s), because there is no clear picture of what is going on. It could be an implementation problem ( for example, poorly implemented code... ) and it would show up again with a newer version of the compiler. In that case you would simply waste your time. A verification with a newer compiler could be done at some point but, as I already mentioned, the root cause is not known yet and you need to understand it first.
PS: Imagine that you have a broken car and you know (!) that it could be fixed (!) for $100, but somebody suggests that you buy a new car for $18,000... Sorry for the off-topic example.
You could try some other compiler options and see if they work as expected. /O2 is the default optimisation and turns on vector instructions. If you use /O1 this might turn off some options. See if this changes the slowdown.
/QxHost is another option, if it recognises your processor.
Does turning vector instructions on or off result in different run times?
Do you use binary files? If so, what is the record size unit: 1 or 4 bytes? This could be an explanation.
Others might be able to suggest some basic sets of compiler options better suited to your processor.
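On the record size unit: as far as I know, ifort measures RECL for unformatted files in 4-byte units by default and in bytes only with /assume:byterecl, so a hard-coded RECL can end up four times larger or smaller than intended. A minimal sketch that avoids hard-coding it by asking the compiler for the length; the array row, the file name scratch.bin and unit 10 are made up for illustration and are not from the original code:

```fortran
program recl_units
  implicit none
  integer, parameter :: dp = selected_real_kind(15)
  real(dp) :: row(100)          ! hypothetical record: 100 double precision values
  integer  :: rl

  ! Ask the compiler how long one record is, in whatever unit it uses for RECL.
  inquire(iolength=rl) row
  print *, 'RECL needed for 100 doubles =', rl

  ! Use that value instead of a hard-coded number of bytes or words.
  row = 1.0_dp
  open(unit=10, file='scratch.bin', access='direct', form='unformatted', &
       recl=rl, status='replace')
  write(10, rec=1) row
  close(10, status='delete')
end program recl_units
```

If the printed value is 200 the unit is 4 bytes; if it is 800 the unit is 1 byte. Comparing that against the RECL values in the real code would show whether records are being padded.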
My experience of switching between 32-bit and 64-bit has shown only small run time differences, but I have only used Intel processors. The main advantage of going from 32-bit to 64-bit is to move from an out-of-memory solution to an in-memory solution, reducing disk I/O. If you don't need 64-bit or don't utilise the benefits of more memory, no significant change in run time should be expected (either better or worse). If your test is between ifort Ver 11.1 IA-32 and EM64T, your reported outcome is not expected, but not impossible.
Sergey's recommendation is a good one; find the problem source before changing the compiler. Profile the run time, by either timing stages of the solution or using a profiler. Using well placed SYSTEM_CLOCK calls might be an easy first start, that requires little knowledge of new tools.
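A minimal sketch of the SYSTEM_CLOCK approach, assuming nothing about the actual solver: the loop below is only a stand-in for one stage (e.g. the call to the momentum or solver routine), and all variable names are made up for illustration. Default integers are used so it should compile with any F90 compiler; the counter can wrap, which the sketch handles for a single wrap only.

```fortran
program time_stages
  implicit none
  integer :: t0, t1, rate, cmax, ticks, i
  real    :: s

  ! rate = ticks per second, cmax = value at which the counter wraps to zero
  call system_clock(count_rate=rate, count_max=cmax)
  if (rate == 0) stop 'no system clock available'

  call system_clock(count=t0)
  s = 0.0
  do i = 1, 10000000            ! stand-in for one stage of the solution
     s = s + sqrt(real(i))
  end do
  call system_clock(count=t1)

  ticks = t1 - t0
  if (ticks < 0) ticks = (ticks + cmax) + 1   ! counter wrapped once
  print '(a,f10.4,a)', 'stage 1: ', real(ticks) / real(rate), ' s'
  print *, 'checksum', s        ! uses the result so the loop is not removed
end program time_stages
```

Wrapping each major stage like this in both the IA-32 and the Intel 64 build, and comparing the printed times stage by stage, should show where the extra time goes.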
Is your program performing a large amount of video I/O? If so, then this is likely not a code performance issue within the domain of IVF, rather it could be an issue between 64-bit app and 32-bit driver/dll on 64-bit O/S.
Run the profiler to get a better picture of what is happening (as others have suggested).
I don't think /QxHost could be recommended for an AMD CPU. If it has a full SSE3 instruction set, you could try /arch:SSE3 if that's relevant to your application, depending on how it was spelled in that compiler.
Did you look into whether the change to 64-bit mode is exceeding a cache or memory size threshold? According to Wikipedia, this CPU has "only 1MB L2 cache" which could indicate that you got lucky with 32-bit mode.
I doubt Turion got a full compatibility testing either from the AMD or Intel side; on such an old CPU you may have to stick with a solution which happens to work. It doesn't seem an obvious choice for CFD.
Try using AMD's CodeAnalyst to profile the application (using event-based profiling). One major difference between 32-bit and 64-bit applications that is often overlooked is that the instruction cache demands of code compiled for 64-bit may exceed the instruction cache demands of code compiled for 32-bit. I say may because while instruction byte length (prefix, op, scale, index, base, imm) can be longer for 64-bit, more registers on 64-bit tend to reduce code size (i.e. you win some, and you lose some). Comparing CodeAnalyst event-based profiles of the 32-bit and 64-bit builds may indicate the cause. Once it is identified, often a minor rework can regain the performance.