Memory usage varies when running an openMP program repeatedly

I have found that one of my programs sometimes reports different memory usage numbers across runs when compiled with OpenMP.

After a careful inspection of the code (with debugging) I am pretty sure it has no memory leaks (forgetting to deallocate something, which can happen easily when using pointer arrays).

Finally I set up a minimal demo program that shows the same behavior:

  program T1  
      call mamems(__FILE__, __LINE__)
  !$OMP PARALLEL DO SCHEDULE(GUIDED) DEFAULT(NONE) PRIVATE(I) 
      do i = 1, 100
        call sub(i)
      enddo
  !$OMP END PARALLEL DO
      call mamems(__FILE__, __LINE__)
  end program
  subroutine sub(i)
      s = 0
      do k = 1, 100
        s = s + i * k
      enddo
  end subroutine

(Routine mamems, which reports memory usage, is not shown here but is included in the attachment. It takes twice as many lines as the demo code itself.)
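For reference, numbers of the kind mamems reports (current and peak usage) can also be read directly from Linux procfs; a minimal sketch, assuming the usual `/proc/self/status` field names (`VmRSS` for current resident set, `VmPeak` for peak virtual size, both in kB):

```shell
# Print the current and peak memory figures of the reading process itself.
# Linux-only: /proc/self resolves to the grep process here.
grep -E 'VmPeak|VmRSS' /proc/self/status
```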

On a quad-core CPU I get (the output shows the current memory usage and the peak value, both in bytes and in megabytes):

~> T1
T1.f90    3   *** Memsize=    38715392    38715392    36.9    36.9 ***
T1.f90   11   *** Memsize=    53608448    53608448    51.1    51.1 ***
~> T1
T1.f90    3   *** Memsize=    38715392    38715392    36.9    36.9 ***
T1.f90   11   *** Memsize=    53608448    53608448    51.1    51.1 ***
~> T1
T1.f90    3   *** Memsize=    38715392    38715392    36.9    36.9 ***
T1.f90   11   *** Memsize=    53608448    53608448    51.1    51.1 ***
~> T1
T1.f90    3   *** Memsize=    38715392    38715392    36.9    36.9 ***
T1.f90   11   *** Memsize=    53608448    53608448    51.1    51.1 ***
~> T1
T1.f90    3   *** Memsize=    38715392    38715392    36.9    36.9 ***
T1.f90   11   *** Memsize=   120717312   183627776   115.1   175.1 ***
~> T1
T1.f90    3   *** Memsize=    38715392    38715392    36.9    36.9 ***
T1.f90   11   *** Memsize=   120717312   187826176   115.1   179.1 ***
~> T1
T1.f90    3   *** Memsize=    38715392    38715392    36.9    36.9 ***
T1.f90   11   *** Memsize=    53608448    53608448    51.1    51.1 ***
~> T1
T1.f90    3   *** Memsize=    38715392    38715392    36.9    36.9 ***
T1.f90   11   *** Memsize=   120717312   187826176   115.1   179.1 ***

On a dual quad-core CPU with HT (16 threads):

~ >var/T1
T1.f90    3   *** Memsize=    38699008    38699008    36.9    36.9 ***
T1.f90   11   *** Memsize=   103972864   103972864    99.2    99.2 ***
~ >var/T1
T1.f90    3   *** Memsize=    38699008    38699008    36.9    36.9 ***
T1.f90   11   *** Memsize=   103972864   103972864    99.2    99.2 ***
~ >var/T1
T1.f90    3   *** Memsize=    38699008    38699008    36.9    36.9 ***
T1.f90   11   *** Memsize=   171081728   187809792   163.2   179.1 ***
~ >var/T1
T1.f90    3   *** Memsize=    38699008    38699008    36.9    36.9 ***
T1.f90   11   *** Memsize=   103972864   103972864    99.2    99.2 ***
~ >var/T1
T1.f90    3   *** Memsize=    38699008    38699008    36.9    36.9 ***
T1.f90   11   *** Memsize=   171081728   204603392   163.2   195.1 ***
~ >var/T1
T1.f90    3   *** Memsize=    38699008    38699008    36.9    36.9 ***
T1.f90   11   *** Memsize=   171081728   217198592   163.2   207.1 ***
~ >var/T1
T1.f90    3   *** Memsize=    38699008    38699008    36.9    36.9 ***
T1.f90   11   *** Memsize=   171081728   221396992   163.2   211.1 ***
~ >var/T1
T1.f90    3   *** Memsize=    38699008    38699008    36.9    36.9 ***
T1.f90   11   *** Memsize=   171081728   204603392   163.2   195.1 ***
~ >var/T1
T1.f90    3   *** Memsize=    38699008    38699008    36.9    36.9 ***
T1.f90   11   *** Memsize=   171081728   208801792   163.2   199.1 ***

The amount of memory OpenMP consumes seems to be related to the number of threads, which is not much of a surprise.

But why is it varying over repeated program runs?

(With my real program I have found that the variation is not related to the size of the data set. Though not constant, it is usually about 60 MB. With a small data set of the same magnitude this means a variation ratio of 1:2; with a huge data set of 2 GB the variation is insignificant.)

Markus

Attachment: t1.f90 (1.52 KB)

Tried your reproducer on a Q6600 running Mandriva Linux 2010 and failed to observe the same problem. Maybe you could try playing with /proc/sys/vm/overcommit_memory, setting it to 2. I'm afraid this is an operating-system feature, not something specific to the Intel compiler.
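For anyone trying this suggestion, the knob can be inspected and changed like so (a read-only sketch; the write requires root and is shown only as a comment):

```shell
# Read the current Linux overcommit policy:
#   0 = heuristic overcommit (kernel default)
#   1 = always overcommit
#   2 = strict accounting against the commit limit
cat /proc/sys/vm/overcommit_memory
# To try strict accounting (root required):
#   sysctl -w vm.overcommit_memory=2
```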

Do you have a specifically set size for the stack?
If you are using "unlimited" then you may experience variances due to what else is running, or has recently run, on the system.
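One way to test this is to pin the stack sizes explicitly before re-running the reproducer; a sketch, where the 4M value is a hypothetical choice to tune for your workload:

```shell
# OMP_STACKSIZE sets the stack size of each OpenMP worker thread;
# ulimit -s shows/sets the limit for the initial thread.
export OMP_STACKSIZE=4M
ulimit -s
# ./T1   # then re-run the reproducer and compare the reported footprint
```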

Jim Dempsey

www.quickthreadprogramming.com
