Memory usage varies when running an OpenMP program repeatedly

I have found that one of my programs sometimes reports different memory usage numbers when compiled with OpenMP.

After a careful inspection of the code (with debugging) I am pretty sure my code does not have memory leaks (forgetting to deallocate something, which can happen easily when using pointer arrays).

Finally, I set up a really minimalistic demo program which shows the same behavior:

  program T1
      implicit none
      integer :: i
      call mamems(__file__, __line__)
  !$OMP PARALLEL DO SCHEDULE(GUIDED) DEFAULT(NONE) PRIVATE(I)
      do i = 1, 100
        call sub(i)
      enddo
  !$OMP END PARALLEL DO
      call mamems(__file__, __line__)
  end program

  subroutine sub(i)
      implicit none
      integer, intent(in) :: i
      integer :: k
      real :: s
      s = 0
      do k = 1, 100
        s = s + i * k
      enddo
  end subroutine

(Routine mamems, which reports the memory usage, is not shown here but is included in the attachment. It takes twice as many lines as the example code itself.)
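For readers without the attachment: a minimal Linux stand-in for such a reporting routine can read the same counters from /proc/self/status. This is only a sketch of how the numbers can be obtained, not the attached mamems code:

```shell
# Hypothetical stand-in for the unshown mamems routine: on Linux, the current
# resident set size (VmRSS) and its peak (VmHWM, "high water mark") of the
# running process are exposed in /proc/self/status, in kB.
grep -E 'Vm(RSS|HWM)' /proc/self/status
```

The same two counters (current and peak) are what the output columns below correspond to.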

On a quad-core CPU I get (the output shows the current memory usage and the peak value, both in bytes and in megabytes):

~> T1
T1.f90    3   *** Memsize=    38715392    38715392    36.9    36.9 ***
T1.f90   11   *** Memsize=    53608448    53608448    51.1    51.1 ***
~> T1
T1.f90    3   *** Memsize=    38715392    38715392    36.9    36.9 ***
T1.f90   11   *** Memsize=    53608448    53608448    51.1    51.1 ***
~> T1
T1.f90    3   *** Memsize=    38715392    38715392    36.9    36.9 ***
T1.f90   11   *** Memsize=    53608448    53608448    51.1    51.1 ***
~> T1
T1.f90    3   *** Memsize=    38715392    38715392    36.9    36.9 ***
T1.f90   11   *** Memsize=    53608448    53608448    51.1    51.1 ***
~> T1
T1.f90    3   *** Memsize=    38715392    38715392    36.9    36.9 ***
T1.f90   11   *** Memsize=   120717312   183627776   115.1   175.1 ***
~> T1
T1.f90    3   *** Memsize=    38715392    38715392    36.9    36.9 ***
T1.f90   11   *** Memsize=   120717312   187826176   115.1   179.1 ***
~> T1
T1.f90    3   *** Memsize=    38715392    38715392    36.9    36.9 ***
T1.f90   11   *** Memsize=    53608448    53608448    51.1    51.1 ***
~> T1
T1.f90    3   *** Memsize=    38715392    38715392    36.9    36.9 ***
T1.f90   11   *** Memsize=   120717312   187826176   115.1   179.1 ***

On a dual quad-core CPU with HT (16 threads):

~ >var/T1
T1.f90    3   *** Memsize=    38699008    38699008    36.9    36.9 ***
T1.f90   11   *** Memsize=   103972864   103972864    99.2    99.2 ***
~ >var/T1
T1.f90    3   *** Memsize=    38699008    38699008    36.9    36.9 ***
T1.f90   11   *** Memsize=   103972864   103972864    99.2    99.2 ***
~ >var/T1
T1.f90    3   *** Memsize=    38699008    38699008    36.9    36.9 ***
T1.f90   11   *** Memsize=   171081728   187809792   163.2   179.1 ***
~ >var/T1
T1.f90    3   *** Memsize=    38699008    38699008    36.9    36.9 ***
T1.f90   11   *** Memsize=   103972864   103972864    99.2    99.2 ***
~ >var/T1
T1.f90    3   *** Memsize=    38699008    38699008    36.9    36.9 ***
T1.f90   11   *** Memsize=   171081728   204603392   163.2   195.1 ***
~ >var/T1
T1.f90    3   *** Memsize=    38699008    38699008    36.9    36.9 ***
T1.f90   11   *** Memsize=   171081728   217198592   163.2   207.1 ***
~ >var/T1
T1.f90    3   *** Memsize=    38699008    38699008    36.9    36.9 ***
T1.f90   11   *** Memsize=   171081728   221396992   163.2   211.1 ***
~ >var/T1
T1.f90    3   *** Memsize=    38699008    38699008    36.9    36.9 ***
T1.f90   11   *** Memsize=   171081728   204603392   163.2   195.1 ***
~ >var/T1
T1.f90    3   *** Memsize=    38699008    38699008    36.9    36.9 ***
T1.f90   11   *** Memsize=   171081728   208801792   163.2   199.1 ***

The amount of memory OpenMP consumes seems to be related to the number of threads, which is not much of a surprise.

But why is it varying over repeated program runs?

(With my real program I have found that the variation is not related to the size of the data set. Though not constant, it is usually around 60 MB. With a small data set of the same magnitude this means a variation ratio of 1:2; with a huge data set of 2 GB the variation is insignificant.)

Markus

Attachment: t1.f90 (1.52 KB)

I tried your reproducer on a Q6600 running Mandriva Linux 2010 and failed to observe the problem. Maybe you could try playing with /proc/sys/vm/overcommit_memory, setting it to 2. I'm afraid this is an operating-system feature, not specific to the Intel compiler.
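For anyone who wants to try this suggestion, the overcommit policy can be inspected and changed through procfs (changing it requires root):

```shell
# Show the current kernel overcommit policy:
#   0 = heuristic overcommit (default), 1 = always overcommit, 2 = strict accounting
cat /proc/sys/vm/overcommit_memory
# To switch to strict accounting (as root):
#   echo 2 > /proc/sys/vm/overcommit_memory
```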

Do you have a specific size set for the stack?
If you are using "unlimited", then you may experience variances due to what else is running, or has recently run, on the system.
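One way to pin the per-thread stack size instead of inheriting a default is the OMP_STACKSIZE environment variable (standard OpenMP; KMP_STACKSIZE is the Intel-runtime-specific equivalent). A sketch:

```shell
# Fix the stack size for each OpenMP worker thread before launching the
# program, so repeated runs start from the same per-thread reservation.
export OMP_STACKSIZE=4M          # standard OpenMP env var: per-thread stack size
echo "OMP_STACKSIZE=$OMP_STACKSIZE"
# export KMP_STACKSIZE=4M        # Intel-runtime-specific equivalent
# ./T1                           # then launch the program (T1 from the post above)
```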

Jim Dempsey
