OMP: System error #1455: The paging file is too small for this operation to complete

The title of the thread describes the problem with MKL and I'll provide additional technical details.

1. On an Ivy Bridge system with 32GB of RAM running 64-bit Windows 7, set the following values for the Windows paging file:

Initial size (MB): 98304
Maximum size (MB): 131072

2. A test project is attached.

3. Recommended Matrix size is 16,384x16,384

4. Set OMP_NUM_THREADS and KMP_NUM_THREADS to 4 ( for both )

5. Try to increase the values of these environment variables by two. The error should be reproduced when OMP_NUM_THREADS and KMP_NUM_THREADS are both set to 16 or higher.

6. Screenshots will be posted.

7. MKL version is 11.0
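
Steps 4 and 5 could be scripted along the following lines. This is only a sketch: it assumes that "by two" means doubling the thread count at each step and that the attached project builds to an executable. The helper names and the timeout are illustrative and not part of the attachment.

```python
import os
import subprocess


def thread_sweep(start=4, stop=64):
    """Yield environment overrides, doubling the thread count each step."""
    n = start
    while n <= stop:
        # Both variables are set to the same value, as in step 4.
        yield {"OMP_NUM_THREADS": str(n), "KMP_NUM_THREADS": str(n)}
        n *= 2


def run_sweep(exe, start=4, stop=64, timeout_s=3600):
    """Run `exe` once per thread count and report how each run ended."""
    for overrides in thread_sweep(start, stop):
        env = dict(os.environ, **overrides)
        try:
            code = subprocess.run([exe], env=env, timeout=timeout_s).returncode
            outcome = f"exit code {code}"
        except subprocess.TimeoutExpired:
            # A run that never returns is reported instead of blocking the sweep.
            outcome = "timed out"
        print(f"threads={overrides['OMP_NUM_THREADS']}: {outcome}")
```

Calling `run_sweep("forttestapp.exe")` would run the sweep; that executable name is a guess based on the attachment name and has to be adjusted to the actual build output.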

Let me know if you have questions and thanks in advance.

Attachments: forttestapp.zip (4.9 KB)

[ Screenshot 1 ]

Attachments: forttestapp1.jpg (124.21 KB)

[ Screenshot 2 ]

Attachments: forttestapp2.jpg (116.61 KB)

[ Screenshot 3 ]

Note: I wonder if the problem could be possibly related to default values of OMP_STACKSIZE or KMP_STACKSIZE?

Attachments: pagingfiletoosmall.jpg (189.11 KB)

Hello Sergey,

From the attached picture, it looks like the virtual memory used by the applications is close to the limit. The maximum size is about 130G for the system, and the applications use almost all of it.

Also, from a performance point of view, the virtual memory in use is now far beyond the physical memory (32G). This may cause a lot of memory swapping and poor performance.

Regards,
Chao

Hi Chao, I've provided an example and lots of technical details in order to avoid any discussion that I would rate as "speculative". Please try to reproduce the problem. On my side, I'll try to increase the VM Max size. Don't worry about the performance of the calculations, since that is another issue, Not related to the problem. Thanks in advance.

Sergey, 

I downloaded the code and checked it. I did not find any MKL function call there. Did I miss something?

Regards
Chao 

What about the Fortran MATMUL call? Doesn't it use MKL ( parallel version ) indirectly?

Hi Chao, Please take a look at new test results. Thanks.

For all tests:

No of rows N = 16384
No of columns N = 16384

Note: Tests 1 to 8 use the following Windows paging file settings:

Initial size (MB): 65536
Maximum size (MB): 98304

OMP_STACKSIZE - Default value
KMP_STACKSIZE - Default value

[ Test 1 ]

OMP_NUM_THREADS=4
KMP_NUM_THREADS=4
...
Calculated ( in seconds ): 364.6980
...
Number of CPUs used: 4
Number of Threads used: 4

[ Test 2 ]

OMP_NUM_THREADS=8
KMP_NUM_THREADS=8
...
Calculated ( in seconds ): 331.7810
...
Number of CPUs used: 8
Number of Threads used: 8

[ Test 3 ]

OMP_NUM_THREADS=12
KMP_NUM_THREADS=12
...
Calculated ( in seconds ): 334.0590
...
Number of CPUs used: 8
Number of Threads used: 12

[ Test 4 ]

OMP_NUM_THREADS=16
KMP_NUM_THREADS=16
...
Calculated ( in seconds ): 332.1400
...
Number of CPUs used: 8
Number of Threads used: 16

[ Test 5 ]

OMP_NUM_THREADS=24
KMP_NUM_THREADS=24
...
Calculated ( in seconds ): 331.1880
...
Number of CPUs used: 8
Number of Threads used: 24

[ Test 6 ]

OMP_NUM_THREADS=32
KMP_NUM_THREADS=32
...
Calculated ( in seconds ): 331.8120
...
Number of CPUs used: 8
Number of Threads used: 32

[ Test 7 ]

OMP_NUM_THREADS=40
KMP_NUM_THREADS=40
...
Calculated ( in seconds ): 330.6590
...
Number of CPUs used: 8
Number of Threads used: 40

[ Test 8 ]

OMP_NUM_THREADS=48
KMP_NUM_THREADS=48
...
OMP: Error #136: Cannot create thread.
OMP: System error #1455: The paging file is too small for this operation to complete.
OMP: Error #178: Function GetExitCodeThread() failed:
OMP: System error #6: The handle is invalid.
...
Number of CPUs used: N/A
Number of Threads used: N/A

[ Test 9 ]

OMP_NUM_THREADS=48
KMP_NUM_THREADS=48

All tests with the following environment variables ( different values ) failed:

OMP_STACKSIZE=512K
KMP_STACKSIZE=512K Note: Test Failed

OMP_STACKSIZE=256K
KMP_STACKSIZE=256K Note: Test Failed

OMP_STACKSIZE=128K
KMP_STACKSIZE=128K Note: Test Failed

OMP_STACKSIZE=64K
KMP_STACKSIZE=64K Note: Test Failed

OMP_STACKSIZE=32K
KMP_STACKSIZE=32K Note: Test Failed

OMP_STACKSIZE=16K
KMP_STACKSIZE=16K Note: Test Failed / 32K used instead

[ Test 10 ]

Note: New Windows paging file settings:

Initial size (MB): 98304
Maximum size (MB): 131072

OMP_NUM_THREADS=48
KMP_NUM_THREADS=48

OMP_STACKSIZE=512K
KMP_STACKSIZE=512K
...
Calculated ( in seconds ): 329.5980
...
Number of CPUs used: 8
Number of Threads used: 48

[ Test 11 ]

OMP_NUM_THREADS=56
KMP_NUM_THREADS=56
...
Calculated ( in seconds ): 331.2350
...
Number of CPUs used: 8
Number of Threads used: 56

[ Test 12 ]

OMP_NUM_THREADS=64
KMP_NUM_THREADS=64
...
OMP: Error #136: Cannot create thread.
OMP: System error #1455: The paging file is too small for this operation to complete.
OMP: Error #178: Function GetExitCodeThread() failed:
OMP: System error #6: The handle is invalid.
...
Number of CPUs used: N/A
Number of Threads used: N/A

[ Test 13 ]

Note: New Windows paging file settings:

Initial size (MB): 163840
Maximum size (MB): 196608

OMP_NUM_THREADS=64
KMP_NUM_THREADS=64
...
Calculated ( in seconds ): 332.8110
...
Number of CPUs used: 8
Number of Threads used: 64

[ Test 14 ]

OMP_NUM_THREADS=80
KMP_NUM_THREADS=80
...
Calculated ( in seconds ): 328.1630
...
Number of CPUs used: 8
Number of Threads used: 80

[ Test 15 ]

OMP_NUM_THREADS=96
KMP_NUM_THREADS=96
...
OMP: Error #136: Cannot create thread.
OMP: System error #1455: The paging file is too small for this operation to complete.
OMP: Error #178: Function GetExitCodeThread() failed:
OMP: System error #6: The handle is invalid.
...
Number of CPUs used: N/A
Number of Threads used: N/A

Consolidated results of processing are as follows:

...
Calculated ( in seconds ): 364.6980
Calculated ( in seconds ): 331.7810
Calculated ( in seconds ): 334.0590
Calculated ( in seconds ): 332.1400
Calculated ( in seconds ): 331.1880
Calculated ( in seconds ): 331.8120
Calculated ( in seconds ): 330.6590
Calculated ( in seconds ): 329.5980
Calculated ( in seconds ): 331.2350
Calculated ( in seconds ): 332.8110
Calculated ( in seconds ): 328.1630
...

and, as you can see, there is no performance impact from high values for Virtual Memory ( VM ). Actually, only 9GB of memory is needed to calculate the product of 16384x16384 matrices with MATMUL.
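
For comparison, the size of the arrays alone can be estimated with a short calculation. This is a sketch under assumptions: the thread does not state the element type, so both 4-byte and 8-byte elements are shown, and three dense matrices (two operands and one result) are assumed. It covers only the arrays and does not try to reproduce the measured 9GB figure, which may also include temporaries and program overhead.

```python
GIB = 1024 ** 3


def matmul_array_bytes(n, element_bytes, arrays=3):
    """Bytes needed to hold `arrays` dense n x n matrices."""
    return arrays * n * n * element_bytes


# Element sizes are assumptions: 4 bytes (single) and 8 bytes (double precision).
for element_bytes in (4, 8):
    total = matmul_array_bytes(16384, element_bytes)
    print(f"{element_bytes}-byte elements: {total / GIB:.0f} GiB")
```

Under these assumptions the arrays take 3 GiB with 4-byte elements and 6 GiB with 8-byte elements.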

My conclusions are as follows:

- Issue is resolved when size of VM is increased
- Smaller values for OMP_STACKSIZE and KMP_STACKSIZE do not resolve the issue
- Possibly there is a problem with libiomp5md.dll and this is Not related to MKL
- Possibly there is a problem with scalable_malloc or mkl_malloc functions

Sergey,

I did a quick run here and saw a similar error report. MATMUL is a Fortran intrinsic function, not a call to an MKL function, so the problem is not likely to be related to the scalable_malloc/mkl_malloc functions. It is more likely related to the MATMUL function itself, which is auto-parallelized when compiling with the /Qparallel switch.

Regards,
Chao

>>...Possibly there is a problem with libiomp5md.dll and this is Not related to MKL...

Chao, Could you inform software engineers responsible for libiomp5md.dll about a possible problem with memory allocation? I've provided lots of technical details.

Hi Sergey,

I will move this thread to the Fortran forum, where more experts can help. A quick summary of the problem:

When the Fortran intrinsic function MATMUL is compiled with /Qparallel, it reports the error below, even with a small matrix:
  OMP: System error #1455: The paging file is too small for this operation to complete

The test code in the first post shows the problem.

Sergey,

Look at your screenshot 3. You circled the commit size of your Fortran app and the page file size to show that the app used less page file than the total size. You forgot to include the commit sizes for devenv, WLTray, explorer, ... This gets you to ~130.583GB; then you must add the pageable portions of the Services (not shown on the tab). Also not shown in your commit number is the size of the failing allocation.

At least the system is taking a graceful exit in reporting to the application that it does not have the resources to create the thread. Some OS's will choke down and crawl at this point. (no error back to the app, no progress for anything else on the system)

Your experiment is pushing the system to break your app. Well, it did. Go out and buy a 256GB SSD and reserve all of that for your page file.

Jim Dempsey

www.quickthreadprogramming.com

>>...At least the system is taking a graceful exit in reporting to the application...

This is Not true: the application hangs and I needed to use Windows Task Manager to end it. The test project is attached, and if you have a couple of free minutes you could try to reproduce the problem.

>>...I will move this thread to the Fortran forum, so more expert there could help...

Thanks. However, I consider that the problem is related to libiomp5md.dll and memory allocation. MATMUL function works well and I didn't have any issues or problems.

I also would like to stress that only 9GB (!) of memory is needed to calculate the product of 16384x16384 matrices with MATMUL. It is Not clear why some function in libiomp5md.dll reserves an excessive amount of memory ( see screenshots ).

Sergey,

If you look at your 8 (9) thread screen shot at 27,320,672
And your 16 (17) before crash at 44,125,712
Subtract the 9 from 17 gives 16,805,040 for 8 additional threads
Divide by 8 gives 2,100,630 required per thread.
Your error message said omp could not add a thread when at 130,282,476 (+2,100,630 = 132,383,106) this exceeds your page file size.

Now, if we take 9 x 2,100,630 = 18,905,670 off the total for 9 threads we get 8,415,002KB for heap and program. This is in line with your 9GB estimate.
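
The arithmetic above can be checked with a few lines. This sketch only restates the figures quoted in this post (in KB, as read from the screenshots); the variable names are illustrative.

```python
# Commit sizes quoted above, in KB.
commit_9_threads_kb = 27_320_672    # 8 OpenMP threads + 1 extra thread
commit_17_threads_kb = 44_125_712   # 16 OpenMP threads + 1 extra thread
commit_at_failure_kb = 130_282_476  # commit size when thread creation failed

# Growth in commit per additional thread.
per_thread_kb = (commit_17_threads_kb - commit_9_threads_kb) // (17 - 9)

# What remains for heap and program once the per-thread share is removed.
base_kb = commit_9_threads_kb - 9 * per_thread_kb

# Commit that one more thread would have required at the point of failure.
needed_for_next_thread_kb = commit_at_failure_kb + per_thread_kb

print(per_thread_kb, base_kb, needed_for_next_thread_kb)
```

This reproduces 2,100,630 KB per thread, 8,415,002 KB for heap and program, and 132,383,106 KB for the failed step.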

What is this extra thread, is it one you are spawning or a watchdog of OpenMP?

Jim Dempsey

www.quickthreadprogramming.com

>>Your error message said omp could not add a thread when at 130,282,476 (+2,100,630 = 132,383,106) this exceeds
>>your page file size.

This is correct and the question still remains why does it need so much Committed memory?

>>What is this extra thread, is it one you are spawning or a watchdog of OpenMP?..

This is the thread for the main application.

Sergey

Can you run the VMMap tool and/or Xperf and investigate which library/function(s) is allocating a large amount of memory?

>>

[ Test 14 ]

OMP_NUM_THREADS=80
KMP_NUM_THREADS=80
...
Calculated ( in seconds ): 328.1630
...
Number of CPUs used: 8 <***************
Number of Threads used: 80 <***********

[ Test 15 ]

OMP_NUM_THREADS=96
KMP_NUM_THREADS=96
...
OMP: Error #136: Cannot create thread. <******* ~2GB/thread
OMP: System error #1455: The paging file is too small for this operation to complete.
OMP: Error #178: Function GetExitCodeThread() failed:
OMP: System error #6: The handle is invalid.
...
Number of CPUs used: N/A
Number of Threads used: N/A
<<

Why are you oversubscribing the threads?
Oversubscription is generally counter-productive

Jim Dempsey

www.quickthreadprogramming.com

>>... Can you run VMmap tool and/or Xperf and investigate which library/function(s) is allocating a large amount of memory?
>>...Why are you oversubscribing the threads?

Jim, Iliya,

If you have free time you could spend as much as you can with the test project I've attached. I hope that Intel software engineers will investigate why an excessive amount of memory is committed. There is nothing else I can do in that case.

>>>If you have free time you could spend as much as you can with the test project I've attached>>>

I wish I could. I am still waiting for a new computer :)

>>>Why are you oversubscribing the threads?
Oversubscription is generally counter-productive>>>

By the way, here is a link to an interesting article about thread oversubscription: http://blogs.msdn.com/b/visualizeparallel/archive/2009/12/01/oversubscri...

Oversubscription of OpenMP threads was not a concern in that case.

I suppose that the OpenMP library is pre-allocating or committing memory as a function of the total number of threads, hence the out-of-memory (pagefile.sys) situation. This error could be due to improper task partitioning in terms of allocated memory per thread.
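
One way to test this supposition against the numbers already posted is a simple linear model: committed memory = base + threads x per-thread share. The sketch below uses the per-thread and base figures Jim Dempsey derived earlier in the thread. It assumes, as a simplification, that the commit limit equals the paging file maximum size alone, ignoring physical RAM and other processes, and that the thread count includes the main thread.

```python
PER_THREAD_KB = 2_100_630  # per-thread commit derived earlier in the thread
BASE_KB = 8_415_002        # heap and program estimate from the same post


def max_threads(page_file_max_mb):
    """Largest thread count whose modelled commit fits under the limit."""
    limit_kb = page_file_max_mb * 1024
    return (limit_kb - BASE_KB) // PER_THREAD_KB


# Paging file maximum sizes used in Tests 1-9, 10-12 and 13-15.
for size_mb in (98304, 131072, 196608):
    print(size_mb, max_threads(size_mb))
```

Under these assumptions the model gives ceilings of 43, 59 and 91 threads. Each falls between the last passing and first failing thread counts reported for that paging file size (40/48, 56/64 and 80/96), which is consistent with a roughly fixed commit per thread but does not show which library makes the allocation.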

>>...I suppose that OpenMP library is pre-allocating or commiting memory...

I have known from the beginning that something is wrong with:

scalable_malloc or scalable_aligned_malloc from tbbmalloc.dll

or

_MALLOC_POOL_INCR or KMP_MALLOC or kmp_malloc or kmpc_malloc from libiomp5md.dll.

Should we continue that discussion? I don't think so because Intel software engineers will take a look at the problem and if they find a problem it will be fixed in the source codes.

Agree with you.

This is a short follow up because there was a typing error in my previous post and I did a correction:

From libiompprof5md.dll To libiomp5md.dll

Sorry about this.

Sergey,

Are you using TBB's scalable malloc within a non-TBB app using OpenMP?

Nothing inherently wrong with this, assuming you know how a scalable malloc works. From your earlier posts, one can derive that each thread requires 2GB of RAM. Since this is occurring at thread startup, it is either requested thread stack, or Thread Local Storage. Once this thread is running, its first scalable malloc allocation is going to acquire a slab of RAM to be used in a manner as a private heap that the thread can allocate from _without_ using a critical section. Actually it is like n x private heaps for varying chunk sizes. Scalable malloc requires more memory than standard heap. Do what you can to reduce the 2GB. As your runtime stats show no improvements when you exceed the number of hardware threads, there is no driving reason for you to be requesting that many threads.

Jim Dempsey

www.quickthreadprogramming.com

>>... there is no driving reason for you to be requesting that many threads...

Jim, Do you understand a term Stress Testing of Software?

Sergey,

Yes, you ran your stress test to the point of running out of available Virtual Memory (due to limitations on the page file). What do you expect to happen under these circumstances? The error message you reported early in this thread is self-explanatory:

OMP: Error #136: Cannot create thread.
OMP: System error #1455: The paging file is too small for this operation to complete.

You specified how many threads to create, and due to your 2GB/thread setup, you ran out of page file. If you need to stress more threads - increase your page file (or tweak your program to use less than 2GB/thread).

Jim Dempsey

www.quickthreadprogramming.com

Jim, your comments Do Not change anything and it is not clear to me what you are trying to prove. I also tried smaller stack sizes for the OpenMP threads and it doesn't change matters either.

>>...
>>...Smaller values for OMP_STACKSIZE and KMP_STACKSIZE do not resolve the issue
>>...

>>>Smaller values for OMP_STACKSIZE and KMP_STACKSIZE do not resolve the issue>>>

It still needs to be confirmed which library is making those allocations. When my new PC arrives I will check your program with Xperf.

I'd like to reiterate that I personally Do Not have any problems with that functionality ( a workaround works ) and the thread needs to be considered as the Report with lots of Technical Details that some issue with Intel Runtime library(ies), libiomp5md.dll or libimalloc.dll or tbbmalloc.dll, exists.

Please, Do Not post if you're Not an Intel software engineer and if you Do Not have access to the source code of the libiomp5md.dll or libimalloc.dll libraries.

Any comments from Intel software engineers?

Thanks in advance.
