gcc 5.1 -mfpmath=387 Floating point exception

It appears that the compiler option -mfpmath=387 immediately causes floating point exceptions on Intel architecture processors in cilk_spawn routines compiled with gcc-5.1 cilk. This seems to be a regression, as gcc-4.9 with the cilk patches works fine. Note that -mfpmath=sse works on 64-bit machines; however, this option is not available for 32-bit Intel machines. As far as I can tell, most floating point code is affected. Does anyone know of patches or workarounds for this? It appears to be a show-stopper on 32-bit Intel.

OpenMP Shared Arrays

I have two questions about WRITE/READ operations on shared arrays.
 1) In my program I write a different element of a given shared array at every iteration of an OpenMP-parallelized DO loop. The results I get appear to be right, but I'm wondering whether this is fine or whether I should enclose the READ/WRITE section in a CRITICAL block. I also READ elements from a shared array without modifying them, and that seems to work. Are these procedures correct?
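For illustration, the pattern being asked about can be sketched in C with OpenMP (the question concerns Fortran, so this is an analogue, not the poster's code). The function names `scale_array` and `sum_array` are hypothetical. The sketch assumes each iteration writes only its own index and that the input array is never modified inside the loop; under those assumptions no CRITICAL section is needed. The second function shows the contrasting case, where every iteration updates the same shared scalar and some protection (here a reduction clause) is required.

```c
#include <assert.h>

/* Hypothetical helper: each iteration writes only out[i] and only reads
 * in[i], so no two threads touch the same element and no critical
 * section is needed. Assumes in and out do not overlap. */
void scale_array(const double *in, double *out, int n, double factor)
{
    assert(in != out);  /* assumption: distinct input and output arrays */

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        out[i] = factor * in[i];  /* distinct index per iteration */
    }
}

/* Hypothetical helper: every iteration updates the same scalar, so the
 * update must be protected. A reduction clause gives each thread a
 * private partial sum and combines them at the end of the loop. */
double sum_array(const double *a, int n)
{
    double total = 0.0;

    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++) {
        total += a[i];  /* read-only access to the shared array */
    }
    return total;
}
```

If the OpenMP flag (for example -fopenmp with gcc) is omitted, the pragmas are ignored and both functions run serially with the same result.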

Memory leak caused or worsened by /Qipo?

I've built a DLL compiled with /Qipo (Intel C++ Composer XE 2015). If I call the constructor and destructor of its main class, the memory doesn't get released, and after a few calls (32-bit mode) I'm out of memory. However, if I disable /Qipo, there doesn't seem to be a problem at all (I will run it for a longer period tonight, but I let it construct and destruct 1024 times earlier tonight and didn't notice an increase in memory usage).

If I use /Qip mode, the leak is 8 MB per call. With /Qipo it's about 300 MB.

"-collect-with runsa -knob event-config" only works with Basic Performance Tuning Events

For example


works fine. But for many others, such as MEM_UNCORE_RETIRED.REMOTE_DRAM, amplxe-cl gives an error like:

amplxe: Error: Cannot configure sampling event groups. The collection is terminated.

Could anyone help? Thanks.

New Jim Dempsey article: Elusive Algorithms – Parallel Scan


Since I haven't seen a notification of this elsewhere: the ever-knowledgeable Jim Dempsey just published one of his great technical articles, entitled "Elusive Algorithms – Parallel Scan".

I believe this was an outgrowth of another discussion on the forums, "how to perform inclusive scan in C cilk".

