'forrtl: severe (41): insufficient virtual memory' with allocated derived data types and /Qparallel

I'm using VS2010 and Intel Fortran 2011.9.300 on Win7 x64.

I have some derived data types declared in a module, and I allocate arrays of them later on.
For example:

        integer, parameter :: i0maxlay = 200   ! named constant, so it can size the components

        type :: kanten
          integer                         :: layers
          integer   , dimension(i0maxlay) :: lnummer
          integer   , dimension(i0maxlay) :: lmat
          real(i0rk), dimension(i0maxlay) :: lowinkel
          real(i0rk), dimension(i0maxlay) :: ldicke
          real(i0rk)                      :: gesdicke
        end type kanten

        type(kanten), allocatable :: Kanten_info(:,:)
I then allocate it as follows (in my program, int(r3grid_calc(0,0,1)) is around 140):

        i0maxk = 200
        allocate(Kanten_info(int(r3grid_calc(0,0,1)), i0maxk))

The code runs fine when compiled in ia32 debug mode and also in ia32 release mode (peak memory usage of the whole program is around 500 MB; the program has many more lines of code than I can post here). But when I add the /Qparallel option, the error 'forrtl: severe (41): insufficient virtual memory' occurs. The program crashes at the allocation of Kanten_info.
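To see how much this single allocation requests, and to get a readable message instead of the runtime abort, the allocation can be written with stat= and errmsg=. This is a minimal sketch, not the poster's code: it assumes i0rk is an 8-byte real, and n1 = 140 is only a placeholder for int(r3grid_calc(0,0,1)). STORAGE_SIZE is a Fortran 2008 intrinsic, so an older compiler may not support it.

```fortran
! Sketch: estimate the size of Kanten_info and allocate it with stat=/errmsg=,
! so a failed allocation is reported instead of aborting with error 41.
! Assumptions: i0rk = double precision; n1 = 140 stands in for
! int(r3grid_calc(0,0,1)) from the original program.
module kanten_mod
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer, parameter :: i0rk     = kind(1.0d0)   ! assumed real kind
  integer, parameter :: i0maxlay = 200
  type :: kanten
    integer                         :: layers
    integer   , dimension(i0maxlay) :: lnummer
    integer   , dimension(i0maxlay) :: lmat
    real(i0rk), dimension(i0maxlay) :: lowinkel
    real(i0rk), dimension(i0maxlay) :: ldicke
    real(i0rk)                      :: gesdicke
  end type kanten
contains
  ! Bytes needed for an n1 x n2 array of kanten; STORAGE_SIZE reports the
  ! size of one array element in bits, including any compiler padding.
  integer(int64) function kanten_bytes(n1, n2)
    integer, intent(in) :: n1, n2
    type(kanten) :: probe
    kanten_bytes = int(n1, int64) * int(n2, int64) &
                 * (storage_size(probe, kind=int64) / 8_int64)
  end function kanten_bytes
end module kanten_mod

program alloc_check
  use kanten_mod
  implicit none
  type(kanten), allocatable :: Kanten_info(:,:)
  integer, parameter :: n1 = 140, i0maxk = 200   ! n1 is a placeholder value
  integer :: ierr
  character(len=256) :: msg

  print '(a,f8.1,a)', 'Kanten_info needs about ', &
        real(kanten_bytes(n1, i0maxk)) / 2.0**20, ' MiB'

  ! With stat= present, a failure returns ierr /= 0 instead of terminating.
  allocate(Kanten_info(n1, i0maxk), stat=ierr, errmsg=msg)
  if (ierr /= 0) then
    print *, 'allocation failed: ', trim(msg)
    stop 1
  end if
  print *, 'allocation succeeded'
  deallocate(Kanten_info)
end program alloc_check
```

Checking the stat= value also makes it easy to log the requested size next to the failure when experimenting with different compiler options.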

Does the parallelized code need more memory? I think I'm far away from the 32-bit limit in Windows. Is there something I coded wrong?

I hope this is enough information for an answer.

Kind regards,


In the meantime I tested the same compiler options with x64 release and /Qparallel. There, no error occurs, so maybe it is a 32-bit memory limit problem.

According to http://msdn.microsoft.com/en-us/library/aa366778.aspx, the user-mode virtual address space of a 32-bit application seems to be limited to 2 GB by default (IMAGE_FILE_LARGE_ADDRESS_AWARE cleared) and to 4 GB in a WOW64 application with IMAGE_FILE_LARGE_ADDRESS_AWARE set.
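Whether a given executable is subject to these limits depends on how it was built (Win32 vs. x64), which is easy to lose track of when switching configurations in Visual Studio. A small sketch, not from the thread, that reports the address width via the address-sized C integer kind c_intptr_t from ISO_C_BINDING:

```fortran
! Sketch: report whether this executable is a 32-bit or a 64-bit build,
! judged by the bit width of an address-sized integer (c_intptr_t).
program bitness
  use, intrinsic :: iso_c_binding, only: c_intptr_t
  implicit none
  integer :: bits

  bits = bit_size(0_c_intptr_t)
  if (bits == 32) then
    print *, '32-bit build: 2 GB user address space by default,'
    print *, 'up to 4 GB under WOW64 when linked with /LARGEADDRESSAWARE'
  else
    print '(a,i0,a)', ' ', bits, '-bit build'
  end if
end program bitness
```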

Nevertheless, why does /Qparallel increase the memory use so much that the program crashes? As you can see, I'm not very familiar with what happens in parallelized code.

For now, I will continue compiling as x64; luckily I'm not restricted to 32 bits.

Still, I would be glad to get some hints.

Kind regards,

P.S. I'm working on a Xeon E5620 with 12 GB RAM, so physical memory should not be the problem.

If you have HT enabled, the default number of threads would be 24. Each thread would get a private copy of those arrays where auto-parallel requires it. Did you try setting the number of threads you want (OMP_NUM_THREADS) together with a corresponding value for KMP_AFFINITY? For example, still assuming HT is enabled, to use one thread per core:

        set KMP_AFFINITY=compact,1,1
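The default thread count can be checked from inside the program. A minimal sketch, assuming the build also uses /Qopenmp so that the omp_lib module is available; without OpenMP the !$ lines are treated as ordinary comments and the program simply reports 1. As far as I know, Intel's auto-parallelizer runs on the same OpenMP runtime, so OMP_NUM_THREADS set before launching the program should also cap the threads used by /Qparallel code.

```fortran
! Sketch: show how many threads the OpenMP runtime will start by default.
! The !$ sentinel lines are compiled only when OpenMP is enabled
! (assumption: /Qopenmp is added to the build).
program show_threads
!$ use omp_lib
  implicit none
  integer :: nthreads, nprocs

  nthreads = 1
  nprocs   = 1
!$ nthreads = omp_get_max_threads()   ! honours OMP_NUM_THREADS if set
!$ nprocs   = omp_get_num_procs()     ! logical processors, HT included

  print '(a,i0)', 'logical processors   : ', nprocs
  print '(a,i0)', 'default thread count : ', nthreads
end program show_threads
```

Running it once with and once without OMP_NUM_THREADS set shows directly whether the environment setting is picked up.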

If you don't want to deal with the complications of Hyper-Threading, you can turn it off in the BIOS setup.

In any case, you will need x64 mode to take advantage of your RAM. A 32-bit application can get somewhat more address space when running under WOW64 than on a 32-bit OS, but it is still limited to 4 GB, and none of the extra physical memory helps with your private data space issues.

Hi TimP,

Thanks for your answer.

If I understand you correctly, auto-parallel makes as many private copies of an array as there are threads. So, to limit memory usage, I can limit the number of threads with OMP_NUM_THREADS and KMP_AFFINITY, or disable HT (it is enabled in the BIOS on my computer).
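A rough worked estimate shows why per-thread copies matter at this scale. It rests on assumptions not stated in the thread: i0rk is an 8-byte real, default integers are 4 bytes, the compiler adds no padding, and an array the size of Kanten_info is what gets copied, once per thread, with Tim's figure of 24 threads. Whether auto-parallel actually privatizes this particular array is not known from the thread.

```latex
\begin{aligned}
s_{\text{elem}} &= \underbrace{4}_{\texttt{layers}}
  + \underbrace{2 \cdot 200 \cdot 4}_{\texttt{lnummer},\ \texttt{lmat}}
  + \underbrace{2 \cdot 200 \cdot 8}_{\texttt{lowinkel},\ \texttt{ldicke}}
  + \underbrace{8}_{\texttt{gesdicke}} = 4812\ \text{bytes} \\
S &= 140 \cdot 200 \cdot s_{\text{elem}} = 134\,736\,000\ \text{bytes} \approx 128.5\ \text{MiB} \\
M(p) &\approx 500\ \text{MB} + p \cdot S \quad (p = \text{number of private copies}) \\
M(24) &\approx 500\ \text{MB} + 24 \cdot 128.5\ \text{MiB} \approx 3.5\ \text{GiB}
\end{aligned}
```

Under these assumptions the total would exceed the default 2 GB limit while staying below 4 GB; with other array sizes or thread counts the numbers shift accordingly.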

Another solution could be to avoid copies of large arrays. In my case it might help to work with small arrays that I later copy into a larger array. But for that I would need to know where auto-parallel splits the code into threads. I think I have to spend more time learning about parallel computing, but, as in most situations, I don't have the time right now.

By the way, I tested /LARGEADDRESSAWARE with Win32 and /Qparallel, and the program runs without crashing on the same inputs. Conversely, /LARGEADDRESSAWARE:no with x64 results in the 'forrtl: severe (41): insufficient virtual memory' error. So in my case it is definitely a lack of virtual address space.

Kind regards,

Standard operating procedure with OpenMP and large arrays is not to make copies of the arrays. Instead, have thread teams work on different regions of the same array (when possible).
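A minimal sketch of working on different regions of one shared array: the array is allocated once, and an OpenMP parallel loop gives each thread a different set of columns, so no per-thread copy exists. The element type is a simplified, hypothetical version of kanten, and the fill values are placeholders for real work.

```fortran
! Sketch: one shared Kanten_info-like array; each thread fills different
! columns, so every element (i,j) is written by exactly one thread.
! Assumptions: simplified element type and hypothetical fill values.
program shared_regions
  implicit none
  integer, parameter :: dp = kind(1.0d0), nlay = 200
  type :: kanten
    integer  :: layers
    real(dp) :: ldicke(nlay)
    real(dp) :: gesdicke
  end type kanten
  type(kanten), allocatable :: k_info(:,:)
  integer :: i, j, n1, n2

  n1 = 140
  n2 = 200
  allocate(k_info(n1, n2))          ! allocated once, shared by all threads

  ! Iterations of j are divided among threads; the loop index j is
  ! private automatically, i is declared private explicitly.
  !$omp parallel do default(none) shared(k_info, n1, n2) private(i)
  do j = 1, n2
    do i = 1, n1                    ! inner loop runs over contiguous memory
      k_info(i,j)%layers      = 3   ! hypothetical layer count
      k_info(i,j)%ldicke      = 0.0_dp
      k_info(i,j)%ldicke(1:3) = [1.0_dp, 2.0_dp, 0.5_dp]
      k_info(i,j)%gesdicke    = sum(k_info(i,j)%ldicke(1:k_info(i,j)%layers))
    end do
  end do
  !$omp end parallel do

  print *, 'total thickness of first element: ', k_info(1,1)%gesdicke
end program shared_regions
```

Splitting by columns rather than rows keeps each thread's writes in one contiguous block, because Fortran arrays are stored column by column.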

Try to use heap arrays rather than stack arrays (at least for the large temporary arrays). This way you do not require every thread to have enough stack space for them (each thread's stack consumes application virtual address space). To reduce memory requirements further, for those sections of code that need separate copies of the large temporary arrays, you can limit the number of threads on the !$OMP directive, and use additional threads elsewhere.
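A sketch of both suggestions together, under hypothetical sizes: the large per-thread temporary is an explicit ALLOCATABLE array, which lives on the heap (Intel's /heap-arrays option does the same for automatic arrays and compiler temporaries), and the region that needs a private copy is capped with a NUM_THREADS clause. The limit of 2 threads is an arbitrary example.

```fortran
! Sketch: a large private work array allocated on the heap inside the
! parallel region, with NUM_THREADS capping how many copies exist at once.
! Assumptions: hypothetical sizes; 2 threads chosen only as an example.
program heap_private
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nwork  = 5000000   ! about 40 MB per copy
  integer, parameter :: ntasks = 8
  real(dp), allocatable :: work(:)
  real(dp) :: results(ntasks)
  integer  :: t

  ! At most 2 threads, hence at most 2 copies of work at the same time.
  !$omp parallel num_threads(2) private(work)
  allocate(work(nwork))                  ! heap allocation, one per thread
  !$omp do
  do t = 1, ntasks
    work = real(t, dp)                   ! stand-in for the real computation
    results(t) = sum(work) / nwork
  end do
  !$omp end do
  deallocate(work)
  !$omp end parallel

  print *, results
end program heap_private
```

Outside this region the program can use the full thread count again, which matches the idea of spending the saved memory on more threads elsewhere.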

Jim Dempsey

The compiler option /Qpar-report tells you which loops are parallelized. The details of how it's done are hidden intentionally. If you want to control those details, that's the function of OpenMP. It's OK to use OpenMP for important parallel regions and use /Qparallel to check for additional opportunities.

Thanks Jim and TimP,

Heap arrays seem to be an interesting option, but in my particular case they alone did not have the desired effect.
My first task is to make my code robust; I will speed it up later, keeping memory handling in mind. So I will try the interesting combination of /Qparallel with direct OpenMP control at a later stage of development. But before that I will have to learn more about OpenMP...

Kind regards,
