High RAM usage in IVF

Hi,

I am running a large project with 50 subroutines in it. The problem is three-dimensional, with a lot of array allocation and deallocation. When I run the project, the iteration takes 2 to 3 days to finish. During the iteration, RAM usage stays pinned at the maximum of my 6 GB of RAM (98% = 5,375,500 K). How can I reduce the RAM usage effectively? There are many suggestions on the forum, but I don't want to mix them and end up making the run take even longer. Any suggestions?

Thanks!


That depends entirely on what your code is trying to achieve. I could not make a suitable generalized recommendation, nor could anyone I think.

For detailed information about RAM usage, you can use the RAMMap utility (RAMMap.exe).

Quote:

Andrew Smith wrote:

That depends entirely on what your code is trying to achieve. I could not make a suitable generalized recommendation, nor could anyone I think.

Hi Andrew,

My code is an external-flow CFD solver for 3D compressible flow. It iterates on the residual of the density change until the residual converges to 10E-7, and at the end it outputs contour plots of the pressure coefficient and Mach number over the computational domain.

Do you have multiple threads with large stack allocations?

Quote:

iliyapolak wrote:

Do you have multiple threads with large stack allocations?

I am using object-oriented programming in Fortran 95, so there are some recursive subroutines whose members are allocated and deallocated. Could the cause be members that were allocated but never deallocated? I was careful, but there may still be some that are not deallocated.

Unfortunately, I cannot help you with Fortran object orientation. My advice is to run memory profiling tools such as VMMap and RAMMap and measure the overall memory consumption of your solver.
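One pattern worth checking while profiling is a pointer component that gets reallocated without being freed first, since pointer targets are not released automatically. The sketch below is only an illustration of that pattern and its cleanup; the names `field_mod`, `field_t`, `field_create` and `field_destroy` are hypothetical and are not taken from the solver discussed in this thread.

```fortran
module field_mod
    implicit none

    ! Hypothetical derived type with a pointer component.
    type :: field_t
        real, pointer :: values(:) => null()
    end type field_t

contains

    subroutine field_create(f, n)
        type(field_t), intent(inout) :: f
        integer, intent(in) :: n
        ! Free any previous target first; otherwise the old memory is leaked.
        if (associated(f%values)) deallocate(f%values)
        allocate(f%values(n))
        f%values = 0.0
    end subroutine field_create

    subroutine field_destroy(f)
        type(field_t), intent(inout) :: f
        ! Deallocating a pointer also leaves it disassociated.
        if (associated(f%values)) deallocate(f%values)
    end subroutine field_destroy

end module field_mod

program leak_demo
    use field_mod
    implicit none
    type(field_t) :: f
    integer :: iter

    do iter = 1, 1000
        call field_create(f, 100000)
        ! ... work on f%values would go here ...
        call field_destroy(f)
    end do

    print *, 'still associated after cleanup? ', associated(f%values)
end program leak_demo
```

If memory grows steadily from one iteration to the next, a missing call of the `field_destroy` kind in a recursive routine is one place to look.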

You could also run the program under Intel Inspector XE's memory analysis and track memory growth and look for memory leaks.

Steve

The easiest and quickest solution is to buy more memory.
You indicate you are using 98% memory, so potentially your program memory usage exceeds the memory installed.
I would run Task Manager and look at the memory usage measures available, including (as I understand them):
Peak Working Set: the memory required
Commit Size: what you are using
PF Delta: the change in the number of page faults since the last update
Alternatively, you could install an SSD, which would reduce the run time lost to virtual memory paging, or rewrite your program to use less memory.
Try to get an estimate of how much memory your program might be using. This exercise may help you identify where the problem could best be addressed, especially if the software rewrite option is easily identifiable.
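A rough estimate can be made by counting the full-size 3D arrays and multiplying out their storage. The sketch below assumes double precision (8 bytes per value); the grid dimensions and the array count are made-up placeholders, so substitute the values from your own solver.

```fortran
program memory_estimate
    implicit none
    integer, parameter :: i8 = selected_int_kind(18)   ! 64-bit integer kind
    integer, parameter :: dp = kind(1.0d0)             ! double precision kind

    ! Hypothetical values: replace with your own grid size and array count.
    integer(i8), parameter :: nx = 400_i8, ny = 300_i8, nz = 200_i8
    integer(i8), parameter :: n_arrays = 25_i8          ! full-size 3D arrays alive at once
    integer(i8), parameter :: bytes_per_value = 8_i8    ! assumes double precision data

    integer(i8) :: total_bytes

    ! 64-bit arithmetic avoids overflow for large grids.
    total_bytes = nx * ny * nz * n_arrays * bytes_per_value

    print '(a, f8.2, a)', 'Estimated array storage: ', &
        real(total_bytes, dp) / 1024.0_dp**3, ' GiB'
end program memory_estimate
```

Comparing this figure with the installed RAM gives a first indication of whether the arrays alone can push the program into paging.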

If you are running into virtual memory paging, changing DO loop or array subscript order to localise memory usage can have a huge impact on program run time (clock time). That review can be a good place to start. Just because we have gone from a 32-bit to a 64-bit address space doesn't mean you can be lazy about memory addressing.

For your employer, the cheapest option will probably be to install more memory.

John

If you would like to investigate memory usage at a much more detailed level, you can use the Xperf tool. It can track memory usage at per-thread and per-function granularity.

Quote:

Steve Lionel (Intel) wrote:

You could also run the program under Intel Inspector XE's memory analysis and track memory growth and look for memory leaks.

Quote:

iliyapolak wrote:

If you would like to investigate memory usage at a much more detailed level, you can use the Xperf tool. It can track memory usage at per-thread and per-function granularity.

Thanks for the suggestions. I will investigate the tools you suggested.

Quote:

John Campbell wrote:

The easiest and quickest solution is to buy more memory.
You indicate you are using 98% memory, so potentially your program memory usage exceeds the memory installed.

I have 6 GB of RAM and will increase it to 12 GB. I hope that will shorten the run time. Thanks.

I have a question: would you suggest adding a "!$omp parallel do" line to my do loops? I have a lot of do loops in my code.

Your program likely has nested do loops. The general rule, which may not be specific to your application, is to parallelize the outer loop and vectorize the inner loop.

*** You cannot simply plop a "!$omp parallel do" line on a loop, without regard to what is inside the loop, and expect it to run. Well, you could expect it to run, but an experienced programmer would have to inspect the code first and make multi-thread safe coding changes where necessary.

Each thread in a parallel region must run independently of the other threads. Consider what will happen when a parallel loop contains a call to a subroutine with a temporary array in COMMON. Multiple threads will concurrently rely on the temporary array being theirs alone, and your code will break, possibly without you being aware of it.
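A minimal sketch of the scratch-array issue follows. It uses a local array named `work` in place of a COMMON block, which is a simplification for illustration; the array sizes and names are hypothetical. The PRIVATE clause gives each thread its own copy of the scratch array. For a real COMMON block, the OpenMP THREADPRIVATE directive is one way to get a per-thread copy.

```fortran
program private_scratch
    implicit none
    integer, parameter :: n = 1000, m = 8
    real :: result(n)
    real :: work(m)      ! scratch array: shared by all threads unless made private
    integer :: i, k

    ! Without private(work), threads would overwrite each other's scratch data.
    !$omp parallel do private(work, k)
    do i = 1, n
        do k = 1, m
            work(k) = real(i) * real(k)
        end do
        result(i) = sum(work)
    end do
    !$omp end parallel do

    print *, 'result(n) = ', result(n)
end program private_scratch
```

With Intel Fortran the OpenMP directives are enabled by /Qopenmp on Windows (-qopenmp on Linux); without that option the directives are treated as comments and the loop runs serially.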

A second problem, particular to a CFD simulation, is when you partition the problem space (each thread working on a different partition), you have (may have) issues at the partition boundary, where each thread is manipulating one cell outside of its partition (e.g. adding to a force vector). The boundary conditions (may) have to be treated differently, possibly with a critical section, an atomic statement, or multiple accumulators (e.g. two to six force accumulators depending on partitioning). An alternate means is to calculate the shell of the volume separately such that you have no possibility of multiple threads updating the same cell at the same time.
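As a small illustration of the atomic option, the sketch below assumes a hypothetical 1D row of cells in which each face adds a contribution to the two cells it separates, so neighbouring iterations can update the same cell. The array `force` and the constant contribution are placeholders, not part of any real solver.

```fortran
program boundary_accumulate
    implicit none
    integer, parameter :: ncell = 1000
    real :: force(ncell)
    integer :: i

    force = 0.0

    ! Face i lies between cells i and i+1 and contributes to both,
    ! so two threads may try to update the same cell at the same time.
    !$omp parallel do
    do i = 1, ncell - 1
        !$omp atomic
        force(i) = force(i) + 1.0
        !$omp atomic
        force(i + 1) = force(i + 1) + 1.0
    end do
    !$omp end parallel do

    print *, 'force on an interior cell = ', force(2)
end program boundary_accumulate
```

Atomic updates are simple but can be slow when contention is high. Per-thread accumulators, for example through a reduction(+:force) clause, or a separate pass over the partition shell, are the alternatives described above.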

RE: 98% capacity

If you read John Campbell's response on this, he mentions the PF Delta (Page Faults Delta). This situation refers to your application being permitted (by the O/S) to roam (or graze) over the page file, whereby the actual currently allocated and/or peak allocation never reaches a fraction of the 98% of the page file. Think of tire tracks in a snow covered parking lot. The parking lot may never reach 50% capacity, yet all the parking stalls have tire tracks.

*** In the PF delta situation (application roaming over page file), adding more RAM will not help. You will still reach 98% capacity of page file.

Jim Dempsey

www.quickthreadprogramming.com

During your next 2 or 3 day runtime, I suggest you follow John Campbell's advice:

If you are running into virtual memory paging, changing DO loop or array subscript order to localise memory usage can have a huge impact on program run time (clock time). That review can be a good place to start.

Then review the code for these optimization opportunities: inner-to-outer, left-to-right.

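do z=1,NZ                  ! outer loop: rightmost subscript
    do y=1,NY              ! middle loop: second subscript
        do x=1,NX          ! inner loop: leftmost subscript (contiguous in memory)
            array(x,y,z) = ...
        end do
    end do
end do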

The inner loop indexes the leftmost subscript, the next higher loop indexes the second subscript, and the outer loop indexes the rightmost subscript. A poorly ordered loop may run 10x longer (or more) than an efficiently ordered loop.
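To see the effect on your own machine, a small timing test along the following lines may help. It is only a sketch: the array size is arbitrary, and an optimizing compiler may interchange the loops itself, so the measured difference can shrink or disappear at higher optimization levels.

```fortran
program loop_order
    implicit none
    integer, parameter :: nx = 200, ny = 200, nz = 200
    real, allocatable :: a(:,:,:)
    integer :: x, y, z
    integer :: t0, t1, t2, rate

    allocate(a(nx, ny, nz))

    call system_clock(t0, rate)

    ! Efficient order: the inner loop runs over the leftmost subscript,
    ! so consecutive iterations touch adjacent memory locations.
    do z = 1, nz
        do y = 1, ny
            do x = 1, nx
                a(x, y, z) = real(x + y + z)
            end do
        end do
    end do

    call system_clock(t1)

    ! Poor order: the inner loop runs over the rightmost subscript,
    ! so consecutive iterations are nx*ny elements apart in memory.
    do x = 1, nx
        do y = 1, ny
            do z = 1, nz
                a(x, y, z) = a(x, y, z) + 1.0
            end do
        end do
    end do

    call system_clock(t2)

    ! Print one element so the loops cannot be optimized away entirely.
    print *, 'a(nx,ny,nz)          = ', a(nx, ny, nz)
    print *, 'contiguous order (s) = ', real(t1 - t0) / real(rate)
    print *, 'strided order (s)    = ', real(t2 - t1) / real(rate)

    deallocate(a)
end program loop_order
```

The gap between the two orders tends to grow once the array no longer fits in cache, and grows much further if the array is being paged to disk.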

Jim Dempsey

www.quickthreadprogramming.com
