/MP option not working

I am using Intel C++ Composer XE 2013 SP1 Update 1 with Microsoft Visual Studio 2010. I have a solution with a single project. This project has two large .c source files, each of which takes a long time to compile. When I attempt to use the /MP option, the two source files appear to compile sequentially rather than in parallel. In Task Manager I see only one instance of mcpcom.exe, which seems to confirm that I am compiling on only one core. Any suggestions on how to get parallel compilation of the source files?

Thanks,
Allan


Hi Allan,

I have tested a sample application using the /MP option with compiler version 14.0.1.139 Build 20131008, and I could see multiple instances of mcpcom.exe being launched. I used Process Monitor* to capture this data for mcpcom.exe (please see the attached screenshot). It is a bit difficult to count the number of mcpcom.exe instances using Task Manager, so please check with Process Monitor* as well. If you still see just one instance, please send me your test case/project and I will investigate further.

I have used VS2012.
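For anyone who wants a number rather than eyeballing Task Manager, here is a minimal sketch that polls the standard Windows `tasklist` command and records the largest number of mcpcom.exe processes alive at the same moment. The function names `count_image` and `sample_peak` are made up for this illustration, and the polling interval is an arbitrary assumption; very short-lived processes can still be missed between samples.

```python
import csv
import io
import subprocess
import time


def count_image(tasklist_csv: str, image_name: str) -> int:
    """Count rows of `tasklist /FO CSV /NH` output whose image name matches."""
    rows = csv.reader(io.StringIO(tasklist_csv))
    return sum(1 for row in rows if row and row[0].lower() == image_name.lower())


def sample_peak(image_name: str = "mcpcom.exe", seconds: float = 60.0,
                interval: float = 0.25) -> int:
    """Poll tasklist and return the largest simultaneous count seen (Windows only)."""
    peak = 0
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        # /FI filters by image name, /FO CSV /NH gives headerless CSV rows.
        out = subprocess.run(
            ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV", "/NH"],
            capture_output=True, text=True, check=False,
        ).stdout
        peak = max(peak, count_image(out, image_name))
        time.sleep(interval)
    return peak
```

To use it, start the build, then call `sample_peak()` from a Python prompt on the same machine and compare the result with the number of source files and cores.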

Regards,

Sukruth H V

 

 

Attachment: proc_monitor.PNG (image/png, 52.97 KB)

I am using MSVC 2013 + the latest ICC.

I can confirm that /MP doesn't work for some projects.

I have submitted a bug report to the Premier site, and the problem was reproduced and confirmed.

You can download my fork of wxWidgets on GitHub under the name "vdm113/wxWidgets-ICC-patch". The first dependencies, such as the bundled zlib, use all 4 of my cores, but after a short while the sub-project "core" uses only one CPU core, i.e. it is a bug in the compile driver.

PS: I have intentionally not given the exact GitHub URL, in order not to hit the spam filter, which is really dumb on Intel's forum web site...

--
With best regards,
VooDooMan
-
If you find my post helpful, please rate it and/or select it as a best answer where applies. Thank you.

Thanks, guys. I'm still not having any success and compilation continues to be done sequentially. I have attached a test solution for your reference. Note that this uses the Intel Performance Primitives. The source files contain meaningless code, but they are designed to take about 45 seconds each to compile.

Thanks,

Allan

Attachment: Test.zip (application/zip, 7.32 MB)

I can reproduce this on my machine after installing the latest IPP.

Sukruth, have you observed the same thing? I can also find only one mcpcom.exe process in Task Manager.

Best, qiaoqiaomin

Hello,

I can confirm that this IPP code compiles faster when switching to VC++ mode, and that there is only one mcpcom.exe process in Task Manager when using Intel Compiler mode.

Best, qiaoqiaomin

Hi Allan & VooDooMan,

Thanks for the test case; I was able to reproduce the issue. In fact, I can see two mcpcom.exe processes being created, but sequentially rather than in parallel, which is *NOT* the intended behaviour with the /MP option. I will raise this issue with our engineering team and get back to you with an update soon.

Regards,

Sukruth H V

Hi Sukruth,

Exactly! In Process Explorer from Sysinternals/MS, I see 2 mcpcom.exe processes: one looks like a master (a compile driver?), and the other, its child in the process tree, looks like a worker (compiler), since it is using 25% of the CPU (I have 4 cores). Therefore, the number of "workers" should be 4, but there is only one.

TIA!

--
With best regards,
VooDooMan

Thank you, everyone. Like Qiaoqiaomin, I have determined that using the VC++ compiler instead of the Intel compiler works as intended with the /MP switch too. I will use it until Intel engineering can provide a fix.
Best regards,
Allan 

Hi VooDooMan,

The compiler driver is "icl.exe", not "mcpcom.exe". I can see icl.exe in Task Manager as well. I will update this thread as soon as I hear back from the engineering team.

Regards,

Sukruth H V

Hi Allan & VooDooMan,

I have received a reply from our engineering team. They told us that this is expected behaviour, for the following reason:

The behavior of the compiler with /MP depends on a number of heuristics that determine how many mcpcom.exe processes are launched in parallel:

  - amount of memory available on the system
  - number of cores on the system
  - number of source files passed to the compilation

However, if you are seeing a large difference in build time between ICL and CL when the /MP option is added, please let me know and I can work on it. Please also provide a test case that demonstrates it.

Regards,

Sukruth H V 
 

Hi Sukruth,

I am disappointed in the response from engineering.

Attached is another test case, with 10 source files, for them to review. It is clear to me that the heuristics invoked by /MP are incorrect or poorly conceived.

The attached project takes 7 minutes to build using the Intel compiler. It appears that only one, or at most two, source files are compiled at a time, so much of the compilation occurs sequentially.

For comparison, the attached project takes only 17 seconds using the Microsoft compiler.

My workstation has 2 Intel Xeon E5-2687W CPUs @ 3.10 GHz (16 cores/32 logical processors) and 128 GB of RAM.

Thanks,

Allan

 

Attachment: Test_1.zip (application/zip, 6.54 MB)

Hi Allan,

Thanks for the test case. I was able to reproduce your issue. I will follow up with our engineering team and get back to you soon.

Regards,

Sukruth HV

I reported this issue about a year ago. The heuristics are rather suboptimal. I now have a new machine (Haswell Core i7, 4 cores, 8 logical CPUs, Win 8.1 x64, 16 GiB RAM, SSD), and I see a CPU load of about 15%, the disk activity LED is mostly dark with flashes of about 20 ms every second, and only one compiler instance runs at a time.

My suggestion is to treat /MP the way MSVC does: as many compiler instances as there are logical CPUs. If users are dissatisfied, they can specify /MP2 to get only 2 instances. Heuristics are fine, but not in this case.
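The policy suggested above is small enough to write down. The following is only a sketch of that suggestion, not Intel's or Microsoft's actual implementation; the function name `mp_worker_count` is hypothetical, and capping the result at the number of source files is an assumption added here so that no idle workers are started.

```python
import os
from typing import Optional


def mp_worker_count(flag: str, n_sources: int,
                    logical_cpus: Optional[int] = None) -> int:
    """Return how many compiler instances to start for /MP or /MPn."""
    if not flag.upper().startswith("/MP"):
        return 1  # no /MP flag: a single compiler instance
    suffix = flag[3:]
    if suffix:
        limit = int(suffix)  # explicit cap, e.g. /MP2 -> 2
    else:
        # Plain /MP: one instance per logical CPU.
        limit = logical_cpus or os.cpu_count() or 1
    # Assumption: never start more instances than there are source files.
    return max(1, min(limit, n_sources))
```

With this rule, a plain /MP on an 8-logical-CPU machine building 100 files would start 8 instances, and /MP2 would start 2, with no dependence on memory heuristics.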

--
With best regards,
VooDooMan

Has this issue been resolved? I am trying to compile an open source project (QuantLib) with the /MP flag, and it is excruciatingly slow. It has hundreds of files, and they are all being compiled sequentially despite the /MP flag. I am working on a 24-core machine with 72 GB of RAM. I see only one core, at most two, busy at any given time.

More than 1 year has passed since the original post on this thread - so I would hope the issue has been understood and addressed somehow.

I am using Intel C++ 15.0 with VS 2013.

Regards

Max G.

Unfortunately, I never found a direct Intel-only solution. Instead, I resorted to using the Visual C++ compiler instead of the Intel C++ compiler, while still using the Intel C++ linker and Intel libraries. It created a few hassles, but the parallel compilation works properly and is incredibly fast. For the type of code I was dealing with, the executables produced by the two compilers ran at roughly the same speed.

I only recently transitioned to using Intel C++ 15.0 with VS 2013, and have not retested with this configuration. Your experience tells me that Intel never addressed the problem.

Allan

Thank you, Allan. This is a very sad state of affairs: we were hoping to use the compiler's vectorization capabilities, but the times required for sequential compilation are prohibitive.

I wonder if anybody at Intel can help by disclosing the heuristics used for parallel compilation? Or if anybody can share their knowledge on the subject?

I am willing to change our projects to accommodate Intel's heuristics for /MP, if I only knew what to change...

Regards

Max G.

Found a workaround! At least for my case, but probably widely applicable.

In a Fortran-related thread someone mentioned that the Intel Fortran compiler, at some point in time, ignored the /MP flag if it was not the first flag on the command line.

I figured I would check whether the Intel C++ compiler had a similar problem. I checked my command line in the project properties. The /MP flag appeared after the /Yu flag in my case (the /Yu flag is used for precompiled headers, though I am not sure this matters). So I opened my vcxproj file with an editor and moved the XML element responsible for /MP (<MultiProcessorCompilation>true</MultiProcessorCompilation>) to the very first position. I verified that in the command line, as shown by the project properties, the /MP flag was now the first flag.

And that did the job... I am not very happy with having to keep checking vcxproj and props files manually with the editor, but I can work with it. I hope others can make use of this workaround and that Intel will eventually fix this nasty bug.
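To avoid hand-editing, the reordering could be scripted. This is a minimal sketch under some assumptions: the project file uses the usual MSBuild 2003 XML namespace, the element sits directly inside a <ClCompile> block, and the element order really is what determines the flag order (that is only what was observed above, not documented behaviour). The function name `move_mp_first` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of .vcxproj/.props files; registering it as the default
# keeps the output free of "ns0:" prefixes.
MSBUILD_NS = "http://schemas.microsoft.com/developer/msbuild/2003"
ET.register_namespace("", MSBUILD_NS)


def move_mp_first(vcxproj_text: str) -> str:
    """Move <MultiProcessorCompilation> to the front of each <ClCompile> block."""
    root = ET.fromstring(vcxproj_text)
    mp_tag = f"{{{MSBUILD_NS}}}MultiProcessorCompilation"
    # list() so the tree is not modified while it is being walked.
    for cl in list(root.iter(f"{{{MSBUILD_NS}}}ClCompile")):
        mp = cl.find(mp_tag)  # None for <ClCompile Include="..."/> items
        if mp is not None and cl[0] is not mp:
            cl.remove(mp)
            cl.insert(0, mp)
    return ET.tostring(root, encoding="unicode")
```

Note that ElementTree drops XML comments and the XML declaration and may shift indentation, so keep a backup of the original file and check the resulting command line in the project properties afterwards.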

Take care

Max G.

Hi,

With the latest 15.0 compiler update, I can see that the build time with the Intel compiler is almost the same as with the Microsoft* compiler for the above test case. Please check and let me know.

Regards,

Sukruth H V

 

This issue is still present with the following setup:

  • Intel Parallel Studio XE 2016 Update 3
  • Microsoft Visual Studio 2015 Update 2

The /MP option doesn't have any effect. Neither does setting the /MP4 option manually under General -> Command Line on the project configuration page.

 
