ipo-jobs option for Intel 10.1.015 C++ compiler

Hi,

I got the following errors when I compiled my code with the option -ipo-jobs4 on a SUSE 10 Linux system with 16 GB of memory and 4 cores:

An internal threshold was exceeded: loops may not be vectorized or parallelized. Try to reduce
routine size.

I read this thread about a similar problem but couldn't find a solution there:

http://software.intel.com/en-us/forums//topic/59014

As the code will be used for a critical task, I wonder whether the generated code is still the same as when I build it sequentially. The two builds are the same size, but diff reports that they differ. I also didn't see any more processes than in the serial build while building the application.

I also wonder if Intel is going to support this feature for xild, as it takes a very long time to build a library with IPO enabled. It is now the biggest bottleneck in the whole build procedure, since we can already compile the source files of a large application in parallel with make.

The -ipo-jobs switch can help avoid memory problems like the one you're seeing, but the number of jobs doesn't necessarily need to match the number of cores. Rather, it's about breaking up a large problem into smaller chunks, with a trade-off between time and memory. In other words, it might take longer to run, but if you increase the number of jobs you might avoid the threshold problem. The resulting code should be equivalent, as I understand things. I would suggest you try -ipo-separate to see if that still shows the "threshold was exceeded" message. If it does, then that problem will not be helped by any setting of -ipo-jobs (-ipo-separate essentially sets the number of jobs equal to the number of source files).

It's not surprising that diff would report a difference between two essentially identical executable or object files. If nothing else, there may be some time stamps or something like that which differ; I don't remember exactly.
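To get a rough idea of whether two builds differ in only a few places or all over, one could list the byte offsets where they disagree. The following is only a minimal sketch in Python, not an Intel tool; the function names and the 16-byte grouping gap are assumptions made for illustration. A handful of small, isolated ranges would be consistent with embedded metadata such as time stamps, but it would not by itself prove that the generated code is equivalent.

```python
def diff_offsets(a: bytes, b: bytes) -> list:
    """Return the offsets at which two byte strings differ."""
    n = min(len(a), len(b))
    offsets = [i for i in range(n) if a[i] != b[i]]
    # Trailing bytes present in only one input count as differences too.
    offsets.extend(range(n, max(len(a), len(b))))
    return offsets


def cluster(offsets, gap=16):
    """Group sorted offsets into (start, end) ranges.

    Offsets that lie within `gap` bytes of the previous one are merged
    into the same range (the default gap of 16 is an arbitrary choice).
    """
    ranges = []
    for off in offsets:
        if ranges and off - ranges[-1][1] <= gap:
            ranges[-1][1] = off
        else:
            ranges.append([off, off])
    return [tuple(r) for r in ranges]
```

To use it, read both files in binary mode (for example `open(path, "rb").read()`) and pass the results to `diff_offsets`, then `cluster`. The file paths are whatever the two builds produced; none are assumed here.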

Anyway, I'd recommend trying -ipo-separate to see how long it takes. Of course, if you can provide a test case, we could look into it a bit more carefully. If it's code you don't want to release publicly, you can always file an issue at premier.intel.com.

Let us know what happens. Good luck!

Dale

I tested with different combinations of -ipo and -ipo-separate, with and without -ipo-jobs8. The same message appears:

An internal threshold was exceeded: loops may not be vectorized or parallelized. Try to reduce routine size.

The message is repeated many times. I wonder if there is a way to turn it off. Peak memory usage is about 1.5 GB during the IPO procedure. The message is not shown when the code is compiled with the Intel 9.0 compiler on a 32-bit system. The 10.1 compiler is running on an EM64T system.

I also observed that there are at most 2 processes during the IPO procedure (I have more than 8 jobs to process) when I use the -ipo-jobs4 option on a 4-core Linux machine. Is there a dependency issue that keeps the number of IPO processes from going beyond 2? As a result, we can expect a speedup of no more than two times, which my tests have confirmed.
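The two-times limit can be illustrated with a short calculation. This is only a sketch under a simplifying assumption that all IPO jobs take equal time and are fully independent, which the real compiler may not satisfy; the function name is made up for illustration.

```python
import math


def ideal_speedup(jobs: int, max_procs: int) -> float:
    """Best-case speedup over a serial run.

    Assumes `jobs` equal-length, independent jobs and at most
    `max_procs` of them running at the same time.
    """
    rounds = math.ceil(jobs / max_procs)  # batches needed to finish all jobs
    return jobs / rounds


print(ideal_speedup(8, 2))  # 8 jobs, 2 processes at a time
print(ideal_speedup(8, 4))  # 8 jobs, 4 processes at a time
```

Under that assumption, 8 jobs limited to 2 concurrent processes finish in 4 rounds, while 4 concurrent processes would need only 2 rounds.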

I like the new parallel IPO option, but it would be better if it could take fuller advantage of multicore (4, 8, ...) computers. It would also be nice if this feature were available for building large static libraries with xild.
