<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated on Mon, 23 Nov 2009 18:24:39 -0800 -->
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <atom:link href="http://software.intel.com/en-us/forums/intel-c-compiler/topic/66549/feed" rel="self" type="application/rss+xml" />
    <title>Intel Software Network - <![CDATA[ OpenMP breaks auto-vectorization ]]> feed</title>
    <link>http://software.intel.com/en-us/forums/intel-c-compiler/topic/66549</link>
    <description></description>
    <language>en-us</language>
    <item>
      <title>Re: OpenMP breaks auto-vectorization</title>
      <description><![CDATA[ <div style="margin:0px;"></div>
We had a case where the poster gave a complete working example on the forum.  In that case, 2 steps were required to fix it:<br />1) upgrade to 11.1<br />2) set -inline-max-size=50  (this value was low enough to stop in-lining of a function with omp parallel)<br /><br />Even though 11.1 is intended to be less aggressive on in-lining than 10.1 and 11.0, in that case it still needed the option to help out.  <br /><br />Do you still get vectorization without -openmp but no vectorization with -openmp, if you turn off in-lining?<br /><br />A complete case would be required to see how you have set it up so that the compiler doesn't have to be concerned about aliasing between dataA and dataB when you don't set -openmp, but is concerned when -openmp is set.  If those were function parameters, appropriate restrict qualifiers would be needed.  It's possible that the analysis might be affected by a change from default static allocation without -openmp to stack allocation with -openmp, or by the compiler correctly stopping in-lining when you set -openmp.<br /> ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-c-compiler/topic/66549/</link>
      <pubDate>Fri, 26 Jun 2009 06:35:39 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-c-compiler/topic/66549/</guid>
      <category>ISN General</category>
    </item>
    <item>
      <title>Re: OpenMP breaks auto-vectorization</title>
      <description><![CDATA[ <div style="margin:0px;">
<div id="quote_reply" style="width: 100%; margin-top: 5px;">
<div style="margin-left:2px;margin-right:2px;">Thanks for the answer.<br /><br />Unfortunately, upgrading to 11.1 will take a week or so, because I'm not the admin on the machine with icc. I will of course try it when the upgrade is done.<br /><br />I tried -inline-max-size=50 with version 11.0 though, but it didn't help. Anyway, I am wondering how this could have an effect, as it seems to be about function inlining and my code only has a main.<br /><br />While playing around, I found another way to get it to vectorize. I am using a timer C++ class (which simply wraps the POSIX high-resolution timers for convenience) to measure time. If I remove the usage of this class (and inline the timer code into main instead), vectorization also works with OpenMP.<br /><br />Really strange. The timer code is completely outside the region of interest. Is it possible that OpenMP or vectorization doesn't like object-oriented programming and shuts down completely when using it?<br /><br />--- Main.cxx ---<br />#include &lt;stdio.h&gt;<br />#include &lt;stdlib.h&gt;<br /><br />#include &lt;omp.h&gt;<br /><br />#include "Timer.hxx"<br /><br />int main()<br />{<br />#pragma omp parallel<br /> {<br /> printf("OpenMP thread = %i/%i.\n",omp_get_thread_num(),omp_get_num_threads());<br /> }<br /><br /> const int sizeX = 8192;<br /> const int sizeY = 8192;<br /> const int loops = 100;<br /><br /> float* __restrict dataA;<br /> float* __restrict dataB;<br /><br /> int dataSize=sizeof(float)*sizeX*sizeY;<br /><br /> dataA=(float*)malloc(dataSize);<br /> dataB=(float*)malloc(dataSize);<br /><br /> for(int i=0;i&lt;sizeY;i++) {<br /> for(int j=0;j&lt;sizeX;j++) {<br /> dataA[i*sizeX+j]=0;<br /> }<br /> }<br /> dataA[(sizeY/2)*sizeX+(sizeX/2)]=1;<br /><br /> Timer timer;<br /> for(int iLoop=0;iLoop&lt;loops;iLoop++) {<br /> <br /> #pragma omp parallel for<br /> for(int i=1;i&lt;sizeY-1;i++) {<br /> int curIndex=1+i*sizeX;<br /> for(int j=1;j&lt;sizeX-1;j++) {<br /> 
dataB[curIndex]=0.1*(dataA[curIndex-1]+dataA[curIndex+1]+dataA[curIndex-sizeX]+dataA[curIndex+sizeX])+0.6*dataA[curIndex];<br /> curIndex++;<br /> }<br /> curIndex+=2;<br /> }<br /><br />#pragma omp parallel for<br /> for(int i=1;i&lt;sizeY-1;i++) {<br /> int curIndex=1+i*sizeX;<br /> for(int j=1;j&lt;sizeX-1;j++) {<br /> dataA[curIndex]=0.1*(dataB[curIndex-1]+dataB[curIndex+1]+dataB[curIndex-sizeX]+dataB[curIndex+sizeX])+0.6*dataB[curIndex];<br /> curIndex++;<br /> }<br /> curIndex+=2;<br /> }<br /> }<br /> double duration=timer.get();<br /> /* each sweep updates (sizeX-2)*(sizeY-2) interior points at 6 flops per point */<br /> fprintf(stderr,"Time = %g s, Performance = %g FLOPS\n",duration,6.*(sizeX-2)*(sizeY-2)*2*loops/duration);<br /><br /> fprintf(stderr,"\n");<br /> for(int i=sizeY/2-5;i&lt;=sizeY/2+5;i++) {<br /> for(int j=sizeX/2-5;j&lt;=sizeX/2+5;j++) {<br /> fprintf(stderr,"%f ",dataA[i*sizeX+j]);<br /> }<br /> fprintf(stderr,"\n");<br /> }<br /><br /> free(dataA);<br /> free(dataB);<br /><br /> return 0;<br />}<br /><br /></div>
</div>
</div>
--- Timer.hxx ---<br />#ifndef om_timer_hxx_<br />#define om_timer_hxx_<br /><br />#include &lt;time.h&gt;<br /><br />class Timer {<br /> public:<br /> Timer() {<br /> reset();<br /> }<br /> void reset() {<br /> clock_gettime(CLOCK_MONOTONIC,&amp;m_Timespec);<br /> }<br /> double get() {<br /> struct timespec endTimespec;<br /> clock_gettime(CLOCK_MONOTONIC,&amp;endTimespec);<br /> return (endTimespec.tv_sec-m_Timespec.tv_sec)+<br /> (endTimespec.tv_nsec-m_Timespec.tv_nsec)*1e-9;<br /> }<br /> double getAndReset() {<br /> struct timespec endTimespec;<br /> clock_gettime(CLOCK_MONOTONIC,&amp;endTimespec);<br /> double result=(endTimespec.tv_sec-m_Timespec.tv_sec)+<br /> (endTimespec.tv_nsec-m_Timespec.tv_nsec)*1e-9;<br /> m_Timespec=endTimespec;<br /> return result;<br /> }<br />private:<br /> struct timespec m_Timespec;<br />};<br /><br />#endif<br /><br /> ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-c-compiler/topic/66549/</link>
      <pubDate>Fri, 26 Jun 2009 07:34:10 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-c-compiler/topic/66549/</guid>
      <category>ISN General</category>
    </item>
    <item>
      <title>Re: OpenMP breaks auto-vectorization</title>
      <description><![CDATA[ <div style="margin:0px;"></div>
<br />What happens when you place each parallel for into a separate function, then compile with and without IPO?<br /><br />Jim Dempsey<br /> ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-c-compiler/topic/66549/</link>
      <pubDate>Fri, 26 Jun 2009 08:48:09 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-c-compiler/topic/66549/</guid>
      <category>ISN General</category>
    </item>
    <item>
      <title>Re: OpenMP breaks auto-vectorization</title>
      <description><![CDATA[ <div style="margin:0px;"></div>
<br />I forgot to mention, have you tried #pragma vector always on the inner loop?<br /><br />Jim ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-c-compiler/topic/66549/</link>
      <pubDate>Fri, 26 Jun 2009 08:55:41 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-c-compiler/topic/66549/</guid>
      <category>ISN General</category>
    </item>
    <item>
      <title>Re: OpenMP breaks auto-vectorization</title>
      <description><![CDATA[ <div style="margin: 0px; height: auto;"></div>
My copies of icpc 11.0/083 and 11.1 for intel64 vectorize both parallel loops, when -ansi-alias is set.  If  you don't set that flag, you are telling the compiler that you may have violated the standards on data type aliasing.<br />You may argue that there is nothing here in the line of aliasing (such as the possibility of  your float data updates over-writing curIndex) which the compiler should be concerned about, but I'll leave it to you if you wish to submit a report on premier.intel.com to make that case.<br /><br />As near as I can find out, some BSD variants use the keyword __restrict, but it would be ignored by icpc.  It's mentioned in Microsoft documentation, but I haven't found a Microsoft or Intel compiler which observes it.  It doesn't appear to make the difference here; the compiler apparently can see that you have malloc'd 2 distinct regions.<br /> ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-c-compiler/topic/66549/</link>
      <pubDate>Fri, 26 Jun 2009 12:29:00 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-c-compiler/topic/66549/</guid>
      <category>ISN General</category>
    </item>
    <item>
      <title>Re: OpenMP breaks auto-vectorization</title>
      <description><![CDATA[ Thanks, Tim. <br /><br />Using '-ansi-alias' seems to be the simplest solution. I fear, though, that since it makes the optimization work by assuming less type aliasing, it could easily break again, e.g. by using ints and floats inside the timer class.<br /><br />Altogether, I also get the impression that this should be considered a compiler bug.<br /><br />@Jim: no, '#pragma vector always' doesn't help. What does help is moving the 'parallel for' section to a separate function and putting this function into a '#pragma auto_inline off' section.<br /><br />Best,<br /><br />Oliver<br /><br /><br /> ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-c-compiler/topic/66549/</link>
      <pubDate>Mon, 29 Jun 2009 01:24:22 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-c-compiler/topic/66549/</guid>
      <category>ISN General</category>
    </item>
    <item>
      <title>Re: OpenMP breaks auto-vectorization</title>
      <description><![CDATA[ <div style="margin:0px;"></div>
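For reference, the workaround from the previous post (moving the 'parallel for' loop into its own function that is kept out of line) might look roughly like the sketch below. The function name sweep and its parameter list are hypothetical, and __attribute__((noinline)) is an assumption swapped in here for the '#pragma auto_inline off' section that the post actually used. The constants are written as float literals, which differs slightly from the double constants in Main.cxx.<br />

```cpp
#include <cassert>
#include <cstddef>

// One smoothing sweep over the interior of a sizeX-by-sizeY grid: dst = stencil(src).
// __restrict is a compiler extension promising that dst and src do not overlap.
// __attribute__((noinline)) is an assumption for this sketch; the forum post
// used '#pragma auto_inline off' around the function instead.
__attribute__((noinline))
void sweep(float* __restrict dst, const float* __restrict src,
           int sizeX, int sizeY)
{
    assert(sizeX > 2 && sizeY > 2);
    // If OpenMP is not enabled, this pragma is ignored and the loop runs serially.
#pragma omp parallel for
    for (int i = 1; i < sizeY - 1; i++) {
        // Row pointers keep the inner loop a simple unit-stride loop over j.
        const float* s = src + static_cast<std::size_t>(i) * sizeX;
        float* d = dst + static_cast<std::size_t>(i) * sizeX;
        for (int j = 1; j < sizeX - 1; j++) {
            d[j] = 0.1f * (s[j - 1] + s[j + 1] + s[j - sizeX] + s[j + sizeX])
                 + 0.6f * s[j];
        }
    }
}
```

In Main.cxx the two loops inside the timing loop would then become sweep(dataB, dataA, sizeX, sizeY) followed by sweep(dataA, dataB, sizeX, sizeY). Whether this restores vectorization depends on the compiler version and flags, so it is worth checking the vectorization report.<br />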
<br />Oliver,<br /><br />Good workaround. I've found that pushing code out of line works around other OpenMP problems as well. If your inner loop has a significant iteration count then the call overhead shouldn't be too bad. I would consider this a problem in the optimization code. If you can submit a simple code sample to premier support then they should be able to identify the problem and fix it.<br /><br />Jim<br /> ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-c-compiler/topic/66549/</link>
      <pubDate>Mon, 29 Jun 2009 06:07:15 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-c-compiler/topic/66549/</guid>
      <category>ISN General</category>
    </item>
    <item>
      <title>Re: OpenMP breaks auto-vectorization</title>
      <description><![CDATA[ <div style="margin: 0px; height: auto;"></div>
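As a rough illustration of the type-based aliasing rule that -ansi-alias lets the compiler rely on, here is a minimal sketch. The function names float_bits and fill are hypothetical, and the sketch assumes a 32-bit IEEE 754 float. It shows a conforming way to look at a float's bits, and a loop whose bound is an int that stores to float data cannot legally modify.<br />

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reading a float through a uint32_t* cast would violate the type-based
// aliasing rule. Copying the bytes with memcpy is the conforming alternative.
std::uint32_t float_bits(float value)
{
    static_assert(sizeof(std::uint32_t) == sizeof(float), "float must be 32 bits");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Under the aliasing rule, a store through 'out' (float) cannot change
// '*count' (int), so the compiler is allowed to load the loop bound once.
// Without that assumption it would have to reload '*count' after every store.
void fill(float* out, const int* count, float value)
{
    assert(out != nullptr && count != nullptr);
    for (int i = 0; i < *count; i++) {
        out[i] = value;
    }
}
```

Code written this way does not depend on how the compiler treats int and float pointers, so enabling -ansi-alias should not change its behaviour.<br />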
Situations are common where it's not possible to optimize without -ansi-alias.  It would be poor practice to write code which violates the standard which has been in effect for 20 years, and has been the default requirement in all common compilers except Microsoft's for 10.<br />I don't think you're clear on which aspect of this you wish to consider a bug, but you're welcome to file a bug report.  I don't think Intel will adopt consistency with gcc or g++ when it conflicts with Microsoft.<br />A possible feature request might be to fix the vec-report2 so it says "this loop is not vectorizable on account of -no-ansi-alias."<br />I've never seen anyone propose a treatment more like HP's C, so I don't think that would be popular enough to be considered.<br /> ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-c-compiler/topic/66549/</link>
      <pubDate>Mon, 29 Jun 2009 06:14:17 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-c-compiler/topic/66549/</guid>
      <category>ISN General</category>
    </item>
    <item>
      <title>Re: OpenMP breaks auto-vectorization</title>
      <description><![CDATA[ <div style="margin:0px;">
<div id="quote_reply" style="width: 100%; margin-top: 5px;">
<div style="margin-left:2px;margin-right:2px;">Quoting - <a href="/en-us/profile/367365">tim18</a></div>
<div style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"><em> Situations are common where it's not possible to optimize without -ansi-alias.  It would be poor practice to write code which violates the standard which has been in effect for 20 years, and has been the default requirement in all common compilers except Microsoft's for 10.<br />I don't think you're clear on which aspect of this you wish to consider a bug, but you're welcome to file a bug report.  I don't think Intel will adopt consistency with gcc or g++ when it conflicts with Microsoft.<br />A possible feature request might be to fix the vec-report2 so it says "this loop is not vectorizable on account of -no-ansi-alias."<br />I've never seen anyone propose a treatment more like HP's C, so I don't think that would be popular enough to be considered.<br /></em></div>
</div>
</div>
<br />Hi Tim,<br /><br />Of course it is debatable what exactly the bug is.<br /><br />What disturbs me is that the behaviour of the compiler is quite unpredictable. Apparently unrelated pieces of code (the timer object, OpenMP pragmas) break the vectorization, which works fine in the simple case.<br /><br />Why should '-ansi-alias' be required with OpenMP but not without?<br /><br />Ideally, I would like the compiler to figure out that these things do not affect whether the loop can be assumed vectorizable.<br /><br />Best,<br /><br />Oliver<br /><br /> ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-c-compiler/topic/66549/</link>
      <pubDate>Tue, 30 Jun 2009 00:58:34 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-c-compiler/topic/66549/</guid>
      <category>ISN General</category>
    </item>
    <item>
      <title>Re: OpenMP breaks auto-vectorization</title>
      <description><![CDATA[ <div style="margin:0px;">
<div id="quote_reply" style="width: 100%; margin-top: 5px;">
<div style="margin-left:2px;margin-right:2px;">Quoting - <a href="/en-us/profile/433588">hpcmango</a></div>
<div style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"><em>
<br />Why should '-ansi-alias' be required with OpenMP but not without?<br /><br /><br /></em></div>
</div>
</div>
In general, violations of -ansi-alias could create race conditions which break OpenMP as well as -parallel.  I don't have high expectations for compilation without -ansi-alias.  <br />I do agree that the compiler should be less obscure about which optimizations are disabled by default, as well as which options are needed for consistency with other compilers.  I put -ansi-alias in icc.cfg and icpc.cfg, so as not to have to remember to set it on command line.<br />The first criterion often seems to be not to miss optimizations which MSVC performs, and vectorization is not one of those.<br /> ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-c-compiler/topic/66549/</link>
      <pubDate>Tue, 30 Jun 2009 06:23:33 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-c-compiler/topic/66549/</guid>
      <category>ISN General</category>
    </item>
  </channel></rss>