Thanks for reading. I'm building my Fortran application for several platforms (OS X/Windows/Linux) using ifort. On a MacBook Pro 8,2 with a 2.4 GHz quad-core Core i7 (i7-2760QM), I compile and benchmark the application:
1/ in the host OS (latest OS X, ifort 13.0.1) with "-O3"
2/ in a VMware VM running Windows 7 with ifort 12.0.5, using "Maximum speed plus higher level optimizations" (i.e. O3 again) and, as far as I can tell, no settings that would penalize speed.
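For reference, the two builds correspond to roughly these command lines (the source file name is a placeholder, and the Windows flags are my reading of what the IDE's "Maximum speed" setting maps to):

```shell
# OS X host, ifort 13.0.1
ifort -O3 -o myapp myapp.f90

# Windows 7 guest, ifort 12.0.5, from a command prompt;
# the IDE optimization setting corresponds to /O3
ifort /O3 /exe:myapp.exe myapp.f90
```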
Problem: the application builds fine in the Windows VM and produces correct results, but runs about 2.5x slower.
It uses only one thread, and in both setups I've verified that CPU usage does indeed show one core at 100% during the run. Memory usage is low and disk I/O is almost nonexistent. Playing with compiler settings (e.g. explicitly setting the instruction set to SSE3 or SSE4; AVX builds won't execute) made no difference. I found a small speedup (5-10%) by configuring the VM's CPU settings in VMware to "enable hypervisor applications by providing support for Intel VT-x/EPT inside this VM" - whatever that means - but clearly it's nowhere near the 2.5x performance gap.
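One thing I tried in order to check whether the guest actually sees the host's instruction sets is comparing the CPU feature flags reported on each side (Coreinfo is a free Sysinternals utility; a rough sketch):

```shell
# On the OS X host: list the CPU features the OS reports
sysctl -a | grep machdep.cpu.features

# Inside the Windows 7 guest: Sysinternals Coreinfo dumps the
# feature flags the VM exposes (look for the SSE4 / AVX rows)
coreinfo
```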
I duplicated the entire setup on an iMac with a similar processor and got about the same results.
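To make sure I'm timing like with like on both sides, I wall-clock a fixed workload with the standard system_clock intrinsic; a minimal sketch (the loop body is just a stand-in for my real kernel):

```fortran
program bench
  implicit none
  integer :: t0, t1, rate, i
  double precision :: s

  call system_clock(count_rate=rate)
  call system_clock(t0)

  ! stand-in workload; replace with the real computation
  s = 0d0
  do i = 1, 100000000
     s = s + sin(dble(i))
  end do

  call system_clock(t1)
  print *, 'result  =', s
  print *, 'seconds =', dble(t1 - t0) / dble(rate)
end program bench
```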
I don't think the ifort version gap (12 vs. 13) matters much, but I can test that if requested.
So ... what is going on? Am I making some elementary beginner's mistake in working with VMs? Isn't the usual claim that virtualization costs only a few percent? Is ifort unable to "see" the CPU through the VM and apply its optimizations? How do I set up the VM to exploit the hardware properly? And would the 2.5x penalty I see now carry over when my users run the Windows executables on their own (native) Windows systems?
Thanks for your help ...