<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated on Sun, 08 Nov 2009 13:33:10 -0800 -->
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <atom:link href="http://software.intel.com/en-us/articles/intel-c-compiler-for-mac-os-x-kb/type/performance-and-optimization/feed/" rel="self" type="application/rss+xml" />
    <title>Intel Software Network articles feed</title>
    <link>http://software.intel.com/en-us/articles/intel-c-compiler-for-mac-os-x-kb/performance-and-optimization/</link>
    <description></description>
    <language>en-us</language>
    <item>
      <title>Performance Tools for Software Developers - Loop blocking</title>
      <description><![CDATA[ <p><b>Loop blocking</b> combines strip mining and loop interchange to increase the reuse of data held in cache. It helps nested loops that operate on arrays too large to fit into the cache. The transformation restructures the loops so that they work on array strips small enough to fit into the cache, and each strip is reused while it is still cached. In effect, a blocked loop processes array elements in sections sized to fit in the cache.</p>
<p> </p>
<p>Use cache <b>blocking</b> to arrange a <b>loop</b> so it will perform as many computations as possible on data already residing in cache. (The next <b>block</b> of data is not read into cache until computations using the first <b>block</b> are finished.)</p>
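<p>As a rough illustration of how strip mining and loop interchange combine, the sketch below applies the two steps one after the other to a small transposed-add loop nest. The function names, the flattened row-major array layout, and the <b>strip</b> parameter are assumptions made for this sketch; they are not taken from the compiler or from the examples in this article. The sketch also assumes that n is a multiple of strip and that a and b do not overlap.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Original nest: b is read column-wise, so consecutive j iterations
   touch different cache lines of b. */
void add_transposed(size_t n, int *a, const int *b)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            a[i * n + j] += b[j * n + i];
}

/* Step 1, strip mining: split the j loop into strips of `strip` iterations.
   The iteration order is unchanged, so cache behaviour is unchanged too. */
void add_transposed_strip_mined(size_t n, size_t strip, int *a, const int *b)
{
    assert(strip > 0 && n % strip == 0);   /* assumption of this sketch */
    for (size_t i = 0; i < n; i++)
        for (size_t js = 0; js < n; js += strip)
            for (size_t j = js; j < js + strip; j++)
                a[i * n + j] += b[j * n + i];
}

/* Step 2, loop interchange: move the strip loop outside the i loop.
   For one strip, only `strip` rows of b are live while i advances,
   so the cache lines fetched from those rows can be reused. */
void add_transposed_blocked(size_t n, size_t strip, int *a, const int *b)
{
    assert(strip > 0 && n % strip == 0);   /* assumption of this sketch */
    for (size_t js = 0; js < n; js += strip)
        for (size_t i = 0; i < n; i++)
            for (size_t j = js; j < js + strip; j++)
                a[i * n + j] += b[j * n + i];
}
```

<p>All three functions compute the same result; only the order in which the elements are visited differs, and that order is what determines how well the cache is used.</p>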
<p>The loop blocking optimization is part of the High-Level Optimization (HLO) phase of the Intel compiler and is available with the compiler option <span style="mso-bidi-font-family: 'Courier New'; mso-ansi-language: EN;" lang="EN">-O3</span>. The compiler uses default heuristics to choose the blocking factor. You may also specify the factor explicitly with /Qopt-block-factor:n on Windows or -opt-block-factor=n on Linux and Mac OS X.</p>
<p><b>Data reuse:</b></p>
<p>Understanding data reuse is key to understanding blocking. There are two types of data reuse associated with loop blocking:</p>
<ul>
<li>Spatial reuse </li>
<li>Temporal reuse</li>
</ul>
<p> </p>
<p><b>Spatial reuse</b></p>
<p>Spatial reuse uses data that was brought into the cache as a side effect of fetching another piece of data from memory. Data is fetched one cache line at a time; a cache line is 64 bytes on Intel(R) Core2 processors. If the requested element sits at the beginning of a cache line (aligned data) and the rest of the line holds the subsequent array elements, then for an array of 4-byte floats each fetch brings in the requested element and the fifteen elements that follow it. If any of those fifteen elements is used in a subsequent iteration of the loop, the loop exploits spatial reuse. Loops with strides greater than one can still exhibit spatial reuse, but each cache line then contains fewer usable elements.</p>
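<p>The arithmetic behind spatial reuse can be sketched as a small helper. The function name and the CACHE_LINE_BYTES constant are hypothetical names introduced for this illustration; the 64-byte value is the line size quoted above, and the calculation assumes the first element accessed is aligned to the start of a cache line.</p>

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 64   /* assumed line size, as quoted for Core2 */

/* Number of array elements per cache line that a loop actually uses
   when it visits every stride-th element, starting at an aligned
   line boundary. */
size_t usable_elements_per_line(size_t elem_size, size_t stride)
{
    assert(elem_size > 0 && stride > 0);
    size_t per_line = CACHE_LINE_BYTES / elem_size;  /* elements held by one line */
    return (per_line + stride - 1) / stride;         /* ceil(per_line / stride) */
}
```

<p>With 4-byte elements, a unit-stride loop uses all 16 elements of a line, a stride of 2 uses 8, and a stride of 16 or more uses only one, at which point the loop gets no spatial reuse at all.</p>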
<p><b>Temporal reuse</b></p>
<p>Temporal reuse uses the same data item in more than one iteration of the loop. If a loop accesses the same element again in subsequent iterations, the loop exhibits temporal reuse. Blocking exploits spatial reuse by ensuring that, once fetched, cache lines are not evicted until their spatial reuse is exhausted.</p>
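<p>A matrix-vector product is a compact way to see both kinds of reuse in one loop nest. The sketch below is an illustration only; the function name and the flattened row-major layout are assumptions of this example and do not appear elsewhere in this article.</p>

```c
#include <assert.h>
#include <stddef.h>

/* y = m * x for a rows-by-cols matrix stored row-major in m. */
void matvec(size_t rows, size_t cols, const double *m, const double *x, double *y)
{
    assert(m != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < rows; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < cols; j++) {
            /* m[i*cols + j]: consecutive addresses, so each fetched cache
               line serves several j iterations (spatial reuse).
               x[j]: the same element is read again for every row i
               (temporal reuse). */
            sum += m[i * cols + j] * x[j];
        }
        y[i] = sum;
    }
}
```

<p>The temporal reuse of x only pays off if x is still in cache when the next row starts. When x is too large for the cache, processing it in blocks keeps each block resident while it is being reused.</p>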
<p><b>Example 1: Simple Loop Blocking</b></p>
<p>The following example demonstrates simple loop blocking. <b>Loop blocking</b> lets arrays A and B be processed in smaller rectangular chunks (<b>blocks</b>), chosen so that the combined size of one <b>block</b> of A and one <b>block</b> of B is smaller than the cache, which improves data reuse.</p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto;"><span style="font-size: 9.5pt; color: black; font-family: Arial;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;">// before_loopblocking.cpp</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;">/*</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-spacerun: yes;"> </span>*<span style="mso-spacerun: yes;"> </span>icl /Qoption,link,"/STACK:1000000000" before_loopblocking.cpp</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-spacerun: yes;"> </span>*/</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">#include</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> <span style="color: maroon;">&lt;time.h&gt;</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">#include</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> <span style="color: maroon;">&lt;stdio.h&gt;</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: maroon; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">#define</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> MAX 8000</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">void</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> add(<span style="color: blue;">int</span> a[][MAX], <span style="color: blue;">int</span> b[][MAX]);</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">int</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> main()</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> i, j;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> A[MAX][MAX];</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> B[MAX][MAX];</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span>clock_t<span style="mso-spacerun: yes;"> </span>before, after;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: green;">//Initialize array</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">for</span>(i=0;i&lt;MAX;i++) </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span><span style="color: blue;">for</span>(j=0;j&lt;MAX; j++)</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>{ </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 4;"> </span>A[i][j]=j;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 4;"> </span>B[i][j]=j;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>before = clock();</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>add(A, B);</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>add(A, B);</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>add(A, B);</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>add(A, B);</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>after = clock();</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>printf(<span style="color: maroon;">"\nTime taken to complete : %7.2lf secs\n"</span>, (<span style="color: blue;">float</span>)(after - before)/ CLOCKS_PER_SEC); <span style="color: green;">//List time taken to complete add function</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">void</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> add(<span style="color: blue;">int</span> a[][MAX], <span style="color: blue;">int</span> b[][MAX])</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span><span style="color: blue;">int</span> i, j;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span><span style="color: blue;">for</span>(i=0;i&lt;MAX;i++) </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span><span style="color: blue;">for</span>(j=0; j&lt;MAX;j++)</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>{ </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 4;"> </span>a[i][j] = a[i][j] + b[j][i]; <span style="color: green;">//Adds the transpose of b to a</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto;"><span style="color: black; mso-bidi-font-family: Arial; mso-bidi-font-size: 9.5pt;"><span style="font-size: small;"><span style="font-family: Times New Roman;">The above code is modified below to enhance reuse of the cached data:</span></span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;">// after_loopblocking.cpp</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;">/*</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-spacerun: yes;"> </span>*<span style="mso-spacerun: yes;"> </span>icl /Qoption,link,"/STACK:1000000000" after_loopblocking.cpp</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-spacerun: yes;"> </span>*/</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">#include</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> <span style="color: maroon;">&lt;stdio.h&gt;</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">#include</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> <span style="color: maroon;">&lt;time.h&gt;</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: maroon; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">#define</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> MAX 8000</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">#define</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> BS 16<span style="mso-spacerun: yes;"> </span><span style="color: green;">//Block size is selected as the loop-blocking factor. </span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">void</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> add(<span style="color: blue;">int</span> a[][MAX], <span style="color: blue;">int</span> b[][MAX]);</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">int</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> main()</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> i, j;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> A[MAX][MAX];</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> B[MAX][MAX];</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span>clock_t<span style="mso-spacerun: yes;"> </span>before, after;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: green;">//Initialize array</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">for</span>(i=0;i&lt;MAX;i++) </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span><span style="color: blue;">for</span>(j=0;j&lt;MAX; j++)</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>{ </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 4;"> </span>A[i][j]=j;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 4;"> </span>B[i][j]=j;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>before = clock();</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>add(A, B);</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>add(A, B);</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>add(A, B);</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>add(A, B);</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>after = clock();</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>printf(<span style="color: maroon;">"\nTime taken to complete : %7.2lf secs\n"</span>, (<span style="color: blue;">float</span>)(after - before)/ CLOCKS_PER_SEC); <span style="color: green;">//List time taken to complete add function</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">void</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> add(<span style="color: blue;">int</span> a[][MAX], <span style="color: blue;">int</span> b[][MAX])</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> i, j, ii, jj;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">for</span>(i=0;i&lt;MAX;i+=BS) </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span>{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span><span style="color: blue;">for</span>(j=0; j&lt;MAX;j+=BS)</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>{ </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 3;"> </span><span style="mso-spacerun: yes;"> </span><span style="color: blue;">for</span>(ii=i; ii&lt;i+BS; ii++)<span style="color: green;">//outer loop</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 3;"> </span><span style="mso-spacerun: yes;"> </span>{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 3;"> </span><span style="mso-tab-count: 1;"> </span><span style="color: blue;">for</span>(jj=j; jj&lt;j+BS; jj++)</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt 2in; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">{<span style="mso-spacerun: yes;"> </span><span style="mso-spacerun: yes;"> </span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt 2in; text-indent: 0.5in; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;">//Array B experiences one cache miss</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt 2in; text-indent: 0.5in; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;">//for every iteration of outer loop</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 5;"> </span>a[ii][jj] = a[ii][jj] + b[jj][ii];</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt 2in; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 3;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 3pt 0in 9pt;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">}</span></p>
<p class="MsoNormal" style="margin: 3pt 0in 9pt;"><span style="font-size: 9.5pt; color: black; font-family: Arial; mso-ansi-language: EN;" lang="EN"> </span></p>
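<p>The blocked loop in Example 1 relies on MAX being an exact multiple of BS (8000 = 500 x 16). The following sketch shows one way to write the same blocked nest so that it also handles sizes that are not a multiple of the block size, by clamping the end of each block. The function names, the flattened row-major layout, and the run-time n and bs parameters are assumptions of this illustration, not part of the original example; a and b must not overlap.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Smaller of two sizes; used to clamp a block at the array edge. */
static size_t min_size(size_t x, size_t y)
{
    return x < y ? x : y;
}

/* a += transpose(b) for n-by-n row-major matrices, visited in
   bs-by-bs blocks. Blocks on the right and bottom edges are
   truncated when n is not a multiple of bs. */
void add_transposed_tiled(size_t n, size_t bs, int *a, const int *b)
{
    assert(bs > 0);
    for (size_t i = 0; i < n; i += bs) {
        size_t i_end = min_size(i + bs, n);
        for (size_t j = 0; j < n; j += bs) {
            size_t j_end = min_size(j + bs, n);
            for (size_t ii = i; ii < i_end; ii++)
                for (size_t jj = j; jj < j_end; jj++)
                    a[ii * n + jj] += b[jj * n + ii];
        }
    }
}
```

<p>Clamping costs two extra comparisons per block and leaves the inner loops unchanged, so it does not affect the reuse pattern inside a full block.</p>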
<p><b>Example 2: Complex Blocking</b></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;">// matrixMul.cpp</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;">/*</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-spacerun: yes;"> </span>*<span style="mso-spacerun: yes;"> </span>icl /Qoption,link,"/STACK:1000000000" matrixMul.cpp</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-spacerun: yes;"> </span>*/</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">#include</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> <span style="color: maroon;">&lt;stdio.h&gt;</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">#include</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> <span style="color: maroon;">&lt;time.h&gt;</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: maroon; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">#define</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> MAX 800</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">void</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> matmul(<span style="color: blue;">int</span> c[][MAX], <span style="color: blue;">int</span> a[][MAX], <span style="color: blue;">int</span> b[][MAX]);</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">int</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> main()</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> i, j;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> A[MAX][MAX];</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> B[MAX][MAX];</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> C[MAX][MAX];</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span>clock_t<span style="mso-spacerun: yes;"> </span>before, after;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: green;">//Initialize array</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">for</span>(i=0;i&lt;MAX;i++) </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span><span style="color: blue;">for</span>(j=0;j&lt;MAX; j++)</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>{ </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 4;"> </span>A[i][j]=j;</span></p>
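<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 4;"> </span>B[i][j]=j;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 4;"> </span>C[i][j]=0; <span style="color: green;">//matmul accumulates into C, so it must start at zero</span></span></p>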
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>before = clock();</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>matmul(C, A, B);</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>after = clock();</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>printf(<span style="color: maroon;">"\nTime taken to complete : %7.2lf secs\n"</span>, (<span style="color: blue;">double</span>)(after - before)/ CLOCKS_PER_SEC); <span style="color: green;">//Report time taken by the matmul call</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">void</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> matmul(<span style="color: blue;">int</span> c[][MAX], <span style="color: blue;">int</span> a[][MAX], <span style="color: blue;">int</span> b[][MAX])</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> i, j, k;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">for</span>(i=0;i&lt;MAX;i++) </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span>{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span><span style="color: blue;">for</span>(j=0; j&lt;MAX;j++)</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>{ </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 3;"> </span><span style="mso-spacerun: yes;"> </span><span style="color: blue;">for</span>(k=0; k &lt; MAX; k++)</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 3;"> </span><span style="mso-spacerun: yes;"> </span></span><span style="font-size: 10pt; font-family: 'Courier New'; mso-ansi-language: IT; mso-no-proof: yes;" lang="IT">{ </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-ansi-language: IT; mso-no-proof: yes;" lang="IT"><span style="mso-tab-count: 5;"> </span>c[i][j] = c[i][j] + a[i][k] * b[k][j]; </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-ansi-language: IT; mso-no-proof: yes;" lang="IT"><span style="mso-tab-count: 3;"> </span><span style="mso-spacerun: yes;"> </span></span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 3pt 0in 9pt;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto;"><span style="font-size: small;"><span style="font-family: Times New Roman;"><span style="color: black; mso-bidi-font-family: Arial; mso-bidi-font-size: 9.5pt;">The code above is modified below to enhance </span><span style="color: black; mso-bidi-font-family: Arial; mso-ansi-language: EN; mso-bidi-font-size: 9.5pt;" lang="EN">spatial</span><span style="color: black; mso-bidi-font-family: Arial; mso-bidi-font-size: 9.5pt;"> and </span><span style="color: black; mso-bidi-font-family: Arial; mso-ansi-language: EN; mso-bidi-font-size: 9.5pt;" lang="EN">temporal</span><span style="color: black; mso-bidi-font-family: Arial; mso-bidi-font-size: 9.5pt;"> reuse of the cached data for arrays a, b, and c:</span></span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;">// matrixMulBlk.cpp</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;">/*</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-spacerun: yes;"> </span>*<span style="mso-spacerun: yes;"> </span>icl /Qoption,link,"/STACK:1000000000" matrixMulBlk.cpp</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-spacerun: yes;"> </span>*/</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">#include</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> <span style="color: maroon;">&lt;stdio.h&gt;</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">#include</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> <span style="color: maroon;">&lt;time.h&gt;</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: maroon; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">#define</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> MAX 800</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">#define</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> BS 16<span style="mso-spacerun: yes;"> </span><span style="color: green;">//Block size used as the loop-blocking factor; MAX must be a multiple of BS. </span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">void</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> matmul(<span style="color: blue;">int</span> c[][MAX], <span style="color: blue;">int</span> a[][MAX], <span style="color: blue;">int</span> b[][MAX]);</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">int</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> main()</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> i, j;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> A[MAX][MAX];</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> B[MAX][MAX];</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> C[MAX][MAX];</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span>clock_t<span style="mso-spacerun: yes;"> </span>before, after;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: green;">//Initialize array</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">for</span>(i=0;i&lt;MAX;i++) </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span><span style="color: blue;">for</span>(j=0;j&lt;MAX; j++)</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>{ </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 4;"> </span>A[i][j]=j;</span></p>
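<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 4;"> </span>B[i][j]=j;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 4;"> </span>C[i][j]=0; <span style="color: green;">//matmul accumulates into C, so it must start at zero</span></span></p>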
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>before = clock();</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>matmul(C, A, B);</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>after = clock();</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>printf(<span style="color: maroon;">"\nTime taken to complete : %7.2lf secs\n"</span>, (<span style="color: blue;">double</span>)(after - before)/ CLOCKS_PER_SEC); <span style="color: green;">//Report time taken by the matmul call</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">void</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> matmul(<span style="color: blue;">int</span> c[][MAX], <span style="color: blue;">int</span> a[][MAX], <span style="color: blue;">int</span> b[][MAX])</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> i, j, k, jj, kk;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">for</span>(j=0;j&lt;MAX; j += BS) </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span>{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span><span style="color: blue;">for</span>(k=0; k&lt;MAX; k += BS)</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>{ </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span><span style="color: blue;">for</span>(i=0; i &lt; MAX; i++)</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>{ </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 3;"> </span><span style="color: blue;">for</span>(kk=k; kk&lt;k+BS; kk++)</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 3;"> </span>{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt 1.5in; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-spacerun: yes;"> </span>for</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">(jj=j; jj&lt;j+BS; jj++) </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>{<span style="mso-spacerun: yes;"> </span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 4;"> </span>c[i][jj] = (c[</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-ansi-language: IT; mso-no-proof: yes;" lang="IT">i][jj] + a[i][kk] * b[kk][jj]); </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-ansi-language: IT; mso-no-proof: yes;" lang="IT"><span style="mso-tab-count: 3;"> </span><span style="mso-spacerun: yes;"> </span></span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 3;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">}</span></p>
<p class="MsoNormal" style="margin: 3pt 0in 9pt;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
]]></description>
      <link>http://software.intel.com/en-us/articles/performance-tools-for-software-developers-loop-blocking</link>
      <pubDate>Mon, 13 Jul 2009 15:36:15 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/performance-tools-for-software-developers-loop-blocking#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/performance-tools-for-software-developers-loop-blocking</guid>
      <category>Intel® C++ Compiler for Linux* Knowledge Base</category>
      <category>Intel® C++ Compiler for Mac OS X* Knowledge Base</category>
      <category>Intel® C++ Compiler for Windows* Knowledge Base</category>
      <category>Intel® Parallel Composer Knowledge Base</category>
    </item>
    <item>
      <title>Performance Tools for Software Developers - Auto-parallelization and /Qpar-threshold</title>
      <description><![CDATA[ 
<table border="0" cellpadding="0" cellspacing="15">
<tbody>
<tr>
<td class="bodycopy">
<p>The auto-parallelization feature of the Intel C++ Compiler automatically translates serial portions of the input program into semantically equivalent multithreaded code. Automatic parallelization determines the loops that are good work-sharing candidates, performs the dataflow analysis to verify correct parallel execution, and partitions the data for threaded code generation, as is needed in programming with OpenMP directives. Both OpenMP and auto-parallelized applications gain performance from shared memory on multiprocessor systems based on IA-32, Intel 64, and Itanium processors.</p>
<p>The following options control auto-parallelization:</p>
<blockquote><b>/Qparallel:</b><br />Enables the auto-parallelizer to generate multithreaded code for loops that can be safely executed in parallel. <br /><br /><b>/Qpar-threshold:n</b><br />This option sets a threshold for the auto-parallelization of loops based on the probability of profitable execution of the loop in parallel. To use this option, you must also specify -parallel (Linux and Mac OS X) or /Qparallel (Windows). The default is /Qpar-threshold:100.</blockquote>
<p>This option is useful for loops whose computation work volume cannot be determined at compile-time. The threshold is usually relevant when the loop trip count is unknown at compile-time.</p>
<p>The compiler applies a heuristic that tries to balance the overhead of creating multiple threads versus the amount of work available to be shared amongst the threads.</p>
<p>Here <i>n</i> is an integer from 0 through 100 that sets the threshold for the auto-parallelization of loops. If <i>n</i> is 0, loops are always auto-parallelized, regardless of computation work volume. If <i>n</i> is 100, loops are auto-parallelized only when the compiler's analysis data predicts a performance gain, that is, only if profitable parallel execution is almost certain. The intermediate values 1 to 99 represent the percentage probability of a profitable speed-up. For example, <i>n</i>=50 directs the compiler to parallelize a loop only if there is a 50% probability that the code will speed up when executed in parallel.</p>
<p>Also, to be "100%" sure that a loop will benefit from parallelization, the compiler needs to know the iteration count at compile time. For a "99%" or lower threshold, knowing the iteration count at compile time is not a requirement.</p>
<p>This leads to a big difference in the number of loops parallelized at 99% compared to 100%. For many apps, 99% is a better setting, but for some apps with a lot of short loops, 99% will slow them down.</p>
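<p>To make the distinction concrete, the following is a minimal sketch of the two kinds of loop described above. The function names and the constant <code>kFixedCount</code> are hypothetical and chosen only for illustration; whether the compiler actually parallelizes either loop depends on its heuristics, the compiler version, and the options used.</p>

```cpp
#include <cstddef>

// Hypothetical array size, used only for this illustration.
const std::size_t kFixedCount = 1024;

// The trip count is a compile-time constant, so the compiler can estimate
// the work volume. A loop of this kind can be considered even at
// /Qpar-threshold:100 (the default).
void scale_fixed(double* data, double factor) {
    for (std::size_t i = 0; i < kFixedCount; ++i) {
        data[i] *= factor;
    }
}

// The trip count is only known at run time. According to the text above,
// the compiler cannot be "100%" sure of a benefit here, so a threshold of
// 99 or lower is needed before such a loop is considered.
void scale_runtime(double* data, std::size_t count, double factor) {
    for (std::size_t i = 0; i < count; ++i) {
        data[i] *= factor;
    }
}
```

<p>Compiling such a file with /Qparallel /Qpar-report3 and different /Qpar-threshold values is one way to see which loops the compiler reports as parallelized.</p>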
<p>The following example, int_sin.c, is not auto-parallelized when compiled with /Qpar-threshold:100 using the command line below:</p>
<blockquote>C: &gt;icl -c /Qparallel /Qpar-report3 /Qpar-threshold:100 int_sin.c
<p>If we use /Qpar-threshold:99 then it is parallelized.</p>
<p><b>Example:</b></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; COLOR: green; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">// int_sin.c</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; COLOR: green; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">// Intel C++ compiler sample program</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"> </p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">#include</span><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: maroon">&lt;stdio.h&gt;</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">#include</span><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: maroon">&lt;stdlib.h&gt;</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">#include</span><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: maroon">&lt;time.h&gt;</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">#include</span><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: maroon">&lt;mathimf.h&gt;</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"> </p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; COLOR: green; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">// Function to be integrated</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; COLOR: green; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">// Define and prototype it here</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; COLOR: green; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">// | sin(x) |</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">#define</span><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"> INTEG_FUNC(x) fabs(sin(x))</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"> </p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; COLOR: green; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">// Prototype timing function</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">double</span><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"> dclock(<span style="COLOR: blue">void</span>);</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"> </p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">int</span><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"> main(<span style="COLOR: blue">void</span>)</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">{</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: green">// Loop counters and number of interior points</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: blue">unsigned</span> <span style="COLOR: blue">int</span> i, j, N;</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: green">// Stepsize, independent variable x, and accumulated sum</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: blue">double</span> step, x_i, sum;</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: green">// Timing variables for evaluation </span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: blue">double</span> start, finish, duration;</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: green">// Start integral from</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: blue">double</span> interval_begin = 0.0;</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: green">// Complete integral at</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: blue">double</span> interval_end = 2.0 * 3.141592653589793238;</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"> </p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: green">// Start timing for the entire application</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">start = clock();</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"> </p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">printf( <span style="COLOR: maroon">"\n"</span>);</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">printf( <span style="COLOR: maroon">" Number of | Computed Integral |\n"</span>);</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">printf( <span style="COLOR: maroon">" Interior Points | |\n"</span>);</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: blue">for</span> (j=2;j&lt;10;j++)</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">{</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">printf( <span style="COLOR: maroon">"-------------------------------------\n"</span>);</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"> </p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: green">// Compute the number of (internal rectangles + 1)</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">N = 1 &lt;&lt; j;</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"> </p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: green">// Compute stepsize for N-1 internal rectangles</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">step = (interval_end - interval_begin) / N;</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"> </p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: green">// Approx. 1/2 area in first rectangle: f(x0) * [step/2]</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">sum = INTEG_FUNC(interval_begin) * step / 2.0;</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"> </p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: green">// Apply midpoint rule:</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: green">// Given length = f(x), compute the area of the</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: green">// rectangle of width step</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: green">// Sum areas of internal rectangle: f(xi + step) * step</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"> </p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: blue">for</span> (i=1;i&lt;N;i++)</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">{</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">x_i = i * step;</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">sum += INTEG_FUNC(x_i) * step;</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">}</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"> </p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: green">// Approx. 1/2 area in last rectangle: f(xN) * [step/2]</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">sum += INTEG_FUNC(interval_end) * step / 2.0;</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"> </p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">printf( <span style="COLOR: maroon">" %10u | %14e |\n"</span>, N, sum);</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">}</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">finish = clock();</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">duration = (finish - start);</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">printf( <span style="COLOR: maroon">"\n"</span>);</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">printf( <span style="COLOR: maroon">" Application Clocks = %10e\n"</span>, duration);</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">printf( <span style="COLOR: maroon">"\n"</span>);</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">}</span></p>
</blockquote>
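<p>For comparison, the following is a hand-written sketch of how the inner integration loop of int_sin.c might be expressed with an OpenMP directive. It is not the code the auto-parallelizer generates; the function name <code>integrate_abs_sin</code> is hypothetical, and the loop counter is a signed <code>int</code> here on the assumption that the OpenMP implementation in use requires a signed loop variable.</p>

```cpp
#include <cmath>

// Hypothetical helper: approximates the integral of |sin(x)| over [0, 2*pi]
// using n subintervals, following the same summation as int_sin.c.
double integrate_abs_sin(unsigned int n) {
    const double interval_begin = 0.0;
    const double interval_end = 2.0 * 3.141592653589793238;
    const double step = (interval_end - interval_begin) / n;

    // Half-weight contribution of the first endpoint.
    double sum = std::fabs(std::sin(interval_begin)) * step / 2.0;

    // Iterations are independent apart from the accumulation into sum,
    // which the reduction clause makes safe to share among threads.
    // Without OpenMP enabled, the pragma is ignored and the loop runs serially.
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i < static_cast<int>(n); ++i) {
        const double x_i = interval_begin + i * step;
        sum += std::fabs(std::sin(x_i)) * step;
    }

    // Half-weight contribution of the last endpoint.
    sum += std::fabs(std::sin(interval_end)) * step / 2.0;
    return sum;
}
```

<p>The exact value of the integral is 4, so a call such as <code>integrate_abs_sin(1024)</code> should return a value close to 4 whether or not the loop runs in parallel.</p>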
</td>
</tr>
</tbody>
</table>
<table border="0" cellpadding="0" cellspacing="0">
<tbody>
<tr>
<td><img src="http://software.intel.com/file/6324" height="5" width="388" /></td>
</tr>
<tr>
<td height="10"></td>
</tr>
</tbody>
</table> ]]></description>
      <link>http://software.intel.com/en-us/articles/performance-tools-for-software-developers-auto-parallelization-and-qpar-threshold</link>
      <pubDate>Mon, 13 Jul 2009 15:32:16 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/performance-tools-for-software-developers-auto-parallelization-and-qpar-threshold#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/performance-tools-for-software-developers-auto-parallelization-and-qpar-threshold</guid>
      <category>Intel® C++ Compiler for Linux* Knowledge Base</category>
      <category>Intel® C++ Compiler for Mac OS X* Knowledge Base</category>
      <category>Intel® C++ Compiler for Windows* Knowledge Base</category>
      <category>Intel® Parallel Composer Knowledge Base</category>
    </item>
    <item>
      <title>OpenMP* Loops with Function Calls for Bounds May Not Parallelize</title>
      <description><![CDATA[ <br />
<div id="art_pre_template"><strong>Reference Number :</strong>  DPD200110877<br /><br /><br /><strong>Version :</strong> 11.0, 11.1 or Intel® Parallel Composer<br /><br /><br /><strong>Operating System : </strong>Windows*, Linux*, Mac OS X*<br /><br /><br /><strong>Problem Description : </strong>The OpenMP* 3.0 standard now supports using STL iterators for OpenMP loop bounds.  However, the Intel® C++ Compiler does not parallelize code like the following:<br /><br />
<pre name="code" class="cpp">#include &lt;vector&gt;

void iterator_example()
{
  std::vector&lt;double&gt; vec(23);
  std::vector&lt;double&gt;::iterator it;

#pragma omp parallel for default(none) shared(vec) 
  for (it = vec.begin(); it &lt; vec.end(); it++)
  {
    *it = 1.0; // do work with *it
  }
}</pre>
<br /><br />The compiler does not report (as it should) that the loop was parallelized for OpenMP*.  If you examine the generated code, you will see that the compiler emits a serial version of the loop.  The cause is a compiler issue: when the function calls used for the loop bounds are inlined, the compiler no longer recognizes the loop as validly formed for parallelization.<br /><br /><br /><strong>Resolution Status : </strong>This will be resolved in an upcoming compiler update.<br /><br /><br /><br /><em>[DISCLAIMER: The information on this web site is intended for hardware system manufacturers and software developers. Intel does not warrant the accuracy, completeness or utility of any information on this site. Intel may make changes to the information or the site at any time without notice. Intel makes no commitment to update the information at this site. ALL INFORMATION PROVIDED ON THIS WEBSITE IS PROVIDED "as is" without any express, implied, or statutory warranty of any kind including but not limited to warranties of merchantability, non-infringement of intellectual property, or fitness for any particular purpose. Independent companies manufacture the third-party products that are mentioned on this site. Intel is not responsible for the quality or performance of third-party products and makes no representation or warranty regarding such products. The third-party supplier remains solely responsible for the design, manufacture, sale and functionality of its products. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. *Other names and brands may be claimed as the property of others.]</em></div> ]]></description>
      <link>http://software.intel.com/en-us/articles/openmp-loops-with-function-calls-for-bounds-may-not-parallelize</link>
      <pubDate>Thu, 12 Mar 2009 17:06:43 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/openmp-loops-with-function-calls-for-bounds-may-not-parallelize#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/openmp-loops-with-function-calls-for-bounds-may-not-parallelize</guid>
      <category>Intel® C++ Compiler for Linux* Knowledge Base</category>
      <category>Intel® C++ Compiler for Mac OS X* Knowledge Base</category>
      <category>Intel® C++ Compiler for Windows* Knowledge Base</category>
      <category>Intel® Parallel Composer Knowledge Base</category>
    </item>
    <item>
      <title>Disable movbe to Test Intel® Atom™ Processor Targeted Code on non-Intel® Atom™ Processor Platforms</title>
      <description><![CDATA[ <p>The Intel® Compilers 11.0 allow you to target the Intel® Atom™ processor using the /QxSSE3_ATOM or -xSSE3_ATOM compiler options.  These options enable generation of the movbe instruction, which is unique to the Intel® Atom™ processor.  However, you sometimes need to run such code on a different processor such as the Intel® Pentium® III processor (for example, for validation when an Intel® Atom™ processor isn't available).  In these situations, use the /Qinstruction:nomovbe (Windows*) or -minstruction=nomovbe (Linux*/Mac*) option to disable generation of this instruction.</p> ]]></description>
      <link>http://software.intel.com/en-us/articles/disable-movbe-to-test-intel-atom-targeted-code-on-non-atom-platforms</link>
      <pubDate>Fri, 20 Feb 2009 16:41:09 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/disable-movbe-to-test-intel-atom-targeted-code-on-non-atom-platforms#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/disable-movbe-to-test-intel-atom-targeted-code-on-non-atom-platforms</guid>
      <category>Intel® C++ Compiler for Linux* Knowledge Base</category>
      <category>Intel® C++ Compiler for Mac OS X* Knowledge Base</category>
      <category>Intel® C++ Compiler for Windows* Knowledge Base</category>
      <category>Intel® Fortran Compiler for Linux* Knowledge Base</category>
      <category>Intel® Fortran Compiler for Mac OS X* Knowledge Base</category>
      <category>Intel® Parallel Composer Knowledge Base</category>
      <category>Intel® Visual Fortran Compiler for Windows* Knowledge Base</category>
    </item>
    <item>
      <title>Intel® C++ Compiler - Consistent Use of Compiler Options in Compile/Link Phase</title>
      <description><![CDATA[ <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<table border="0" cellspacing="15" cellpadding="0"><tr><td class="bodycopy">
<p>If you build applications with separate compile and link steps, the optimization options used in the two steps should match, especially when&nbsp;using OpenMP, parallelization, vectorization or interprocedural optimizations. If the options are not consistent, you may get missing symbols at link time, causing the link to fail.</p>
<strong>Example 1:</strong><br>In this example, -xW is used in the compile phase but is missing from the link phase, which results in an unresolved external symbol vmldExp2.&nbsp; Correct this problem by linking with the -xW option.
<blockquote>icpc -xW -c test1.cpp test2.cpp
<br>icpc test1.o test2.o
<br><br><br>
</blockquote>
<strong>Example 2:</strong><br>In this example, -openmp is used in the compile phase but not when linking. This results in unresolved symbol errors at link time.&nbsp; Correct this problem by linking with the -openmp option.
<blockquote>icpc -openmp -c test1.cpp test2.cpp
<br>icpc test1.o test2.o
<br>
</blockquote>
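<p>As an illustration of why Example 2 fails, the following is a minimal sketch of what a source file such as test1.cpp might contain. The file contents and function names are hypothetical. When a file like this is compiled with -openmp, the directive and the <code>omp_get_max_threads()</code> call become references to the OpenMP runtime library, and those references can only be resolved if -openmp is also given at link time.</p>

```cpp
// Hypothetical contents of test1.cpp, for illustration only.
#ifdef _OPENMP
#include <omp.h>
#endif

// Returns the number of threads an OpenMP parallel region may use,
// or 1 when the file is compiled without OpenMP support.
int max_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Sums the squares 0*0 + 1*1 + ... + (n-1)*(n-1).
// With OpenMP enabled, the loop is shared among threads and the
// partial sums are combined by the reduction clause.
double sum_squares(int n) {
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += static_cast<double>(i) * i;
    }
    return total;
}
```

<p>Compiled without -openmp, the same file builds as ordinary serial code and needs no extra library at link time, which is why the mismatch only shows up when the option is used in one step and omitted in the other.</p>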
</td></tr></table>
<table border="0" cellspacing="0" cellpadding="0">
<tr><td><img src="http://software.intel.com/file/6324" width="388" height="5"></td></tr>
<tr><td height="10"></td></tr>
</table>
</body></html>
 ]]></description>
      <link>http://software.intel.com/en-us/articles/intel-c-compiler-consistent-use-of-compiler-options-in-compilelink-phase</link>
      <pubDate>Fri, 19 Sep 2008 00:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/intel-c-compiler-consistent-use-of-compiler-options-in-compilelink-phase#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/intel-c-compiler-consistent-use-of-compiler-options-in-compilelink-phase</guid>
      <category>Intel® C++ Compiler for Linux* Knowledge Base</category>
      <category>Intel® C++ Compiler for Mac OS X* Knowledge Base</category>
    </item>
    <item>
      <title>Intel® Fortran Compiler - Training courses</title>
      <description><![CDATA[ <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<table border="0" cellspacing="15" cellpadding="0"><tr><td class="bodycopy">
<p>Intel offers training courses designed to help software developers become productive and to improve application performance with the Intel&reg; C++ and Intel&reg; Fortran Compilers for Windows*, Linux*, and Mac* OS environments. Focus is given to software optimization on a specific processor architecture.</p>
<p>For course and registration information, visit the 
<a href="http://www.intel.com/software/college/">Intel&reg; Software College</a>.</p>
</td></tr></table>
<table border="0" cellspacing="0" cellpadding="0">
<tr><td><img src="http://software.intel.com/file/6324" width="388" height="5"></td></tr>
<tr><td height="10"></td></tr>
</table>
</body></html>
 ]]></description>
      <link>http://software.intel.com/en-us/articles/intel-fortran-compiler-training-courses</link>
      <pubDate>Fri, 19 Sep 2008 00:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/intel-fortran-compiler-training-courses#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/intel-fortran-compiler-training-courses</guid>
      <category>Intel® C++ Compiler for Linux* Knowledge Base</category>
      <category>Intel® C++ Compiler for Mac OS X* Knowledge Base</category>
      <category>Intel® C++ Compiler for Windows* Knowledge Base</category>
      <category>Intel® Fortran Compiler for Linux* Knowledge Base</category>
      <category>Intel® Fortran Compiler for Mac OS X* Knowledge Base</category>
      <category>Intel® Visual Fortran Compiler for Windows* Knowledge Base</category>
    </item>
    <item>
      <title>Performance Tools for Software Developers - How to generate optimal code for Intel® processors running Mac OS* X?</title>
      <description><![CDATA[ <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<table border="0" cellspacing="15" cellpadding="0"><tr><td class="bodycopy">
<p>The Intel® Compilers 10.0 for Mac OS* X running on systems with IA-32 architecture will include the option -xP at default optimization to automatically vectorize code and generate SSE3, SSE2, and SSE instructions for Intel processors and it can optimize for processors based on Intel® Core&trade; microarchitecture like Intel® Core&trade; Duo processors.</p>
<p>The Intel® Compilers 10.0 for Mac OS* X running on systems using Intel® 64 architecture will include the option -xT at default optimization to automatically vectorize code and generate SSSE3, SSE3, SSE2, and SSE instructions for Intel processors, and it can optimize for the Intel® Core&trade;2 Duo processor family.</p>
<p>In addition, -ipo (inter-procedural optimization), -prof_use (profile-guided optimization or PGO) and -O3 (high-level loop/memory optimizations) can further improve performance for many types of applications.</p>
<p><strong>Operating System:</strong><br></p>
<table border="0" cellspacing="0" cellpadding="0"><tr><td class="xs">Mac OS*</td></tr></table>
</td></tr></table>
<table border="0" cellspacing="0" cellpadding="0">
<tr><td><img src="http://software.intel.com/file/6324" width="388" height="5"></td></tr>
<tr><td height="10"></td></tr>
</table>
</body></html>
 ]]></description>
      <link>http://software.intel.com/en-us/articles/performance-tools-for-software-developers-how-to-generate-optimal-code-for-intel-processors-running-mac-os-x</link>
      <pubDate>Fri, 19 Sep 2008 00:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/performance-tools-for-software-developers-how-to-generate-optimal-code-for-intel-processors-running-mac-os-x#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/performance-tools-for-software-developers-how-to-generate-optimal-code-for-intel-processors-running-mac-os-x</guid>
      <category>Intel® C++ Compiler for Mac OS X* Knowledge Base</category>
      <category>Intel® Fortran Compiler for Mac OS X* Knowledge Base</category>
    </item>
    <item>
      <title>Performance Tools for Software Developers - Some Applications Built with -xP or /QxP Optimizations May Produce Runtime Error</title>
      <description><![CDATA[ <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<table border="0" cellspacing="15" cellpadding="0"><tr><td class="bodycopy">
<p><strong>Symptom(s):</strong><br></p>
<p>The following message may be displayed when a program built with the switches "-xP" (on Linux*) or "/QxP" (on Windows*) is run on a system with an Intel® Core&trade; 2 Duo processor or an Intel® Xeon&reg; processor 5100 series.</p>
<blockquote><strong>"Fatal Error: This program was not built to run on the processor in your system."</strong></blockquote>
<p>In addition, the program may not take the optimal execution path when it is built with the switches "-axP" (on Linux) or "/QaxP" (on Windows).</p>
<p><strong>Solution:</strong><br></p>
<p>This problem has been seen in cases where the application has been compiled with one of the following Intel® Compilers:</p>
<ul><li><strong>Intel Compilers for Intel® 64-based Applications</strong></li></ul>
<ul>
<li>Intel® C++ Compiler 9.0 for Linux* with Package ID l_cc_c_9.0.027 or older</li>
<li>Intel® C++ Compiler 8.1 for Linux* with Package ID l_cce_pc_8.1.032 or older</li>
<li>Intel® Fortran Compiler 9.0 for Linux* with Package ID l_fc_pc_9.0.028 or older</li>
<li>Intel® Fortran Compiler 8.1 for Linux* with Package ID l_fce_pc_8.1.034 or older</li>
<li>Intel® C++ Compiler 9.0 for Windows* with Package ID W_CC_C_9.0.025 or older</li>
<li>Intel® C++ Compiler 8.1 for Windows* with Package ID W_CCE_PC_8.1.026 or older</li>
<li>Intel® Fortran Compiler 9.0 for Windows* with Package ID W_FC_C_9.0.025 or older</li>
<li>Intel® Fortran Compiler 8.1 for Windows* with Package ID W_FCE_PC_8.1.023 or older</li>
</ul>
<ul><li><strong>Intel Compilers for IA32-based Applications</strong></li></ul>
<ul>
<li>Intel® C++ Compiler 8.1 for Linux* with Package ID l_cc_c_8.1.028 or older</li>
<li>Intel® Fortran Compiler 8.1 for Linux* with Package ID l_fc_c_8.1.024 or older</li>
</ul>
<ul>
<li>Intel® C++ Compiler 8.1 for Windows* with Package ID W_CC_PC_8.1.022 or older</li>
<li>Intel® Fortran Compiler 8.1 for Windows* with Package ID W_FC_PC_8.1.033 or older</li>
</ul>
<p>This is due to problems with runtime checks that the compiler generates to determine the type of processor on which the application is being run to ascertain what instruction set (such as SSE, SSE2, SSE3, and new instructions in Intel® Core&trade; 2 Duo processors) can be utilized.</p>
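<p>The kind of runtime check described above can be pictured with the following simplified sketch. It is not the code the compiler generates: the <code>CpuFeatures</code> structure, the <code>CodePath</code> values and the <code>select_code_path</code> function are all hypothetical, and a real check would obtain the feature flags from the processor (for example through the CPUID instruction) rather than from a structure filled in by hand.</p>

```cpp
// Hypothetical feature flags describing the processor the program runs on.
struct CpuFeatures {
    bool sse;
    bool sse2;
    bool sse3;
};

// Hypothetical code paths, from least to most specialized.
enum CodePath { kGeneric, kSse, kSse2, kSse3 };

// Picks the most specialized code path that the reported features allow.
// If the detection step misreports a newer processor, a check like this
// selects a less optimal path than the hardware could actually run.
CodePath select_code_path(const CpuFeatures& cpu) {
    if (cpu.sse && cpu.sse2 && cpu.sse3) {
        return kSse3;
    }
    if (cpu.sse && cpu.sse2) {
        return kSse2;
    }
    if (cpu.sse) {
        return kSse;
    }
    return kGeneric;
}
```

<p>In this picture, a program built to require one specific path has no fallback to select, which corresponds to the fatal error shown above, while a program built with several paths simply runs the wrong one.</p>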
<p>To resolve the problem, recompile the application with one of the newer Intel Compilers listed below:</p>
<ul><li><strong>Intel Compilers for Intel® 64-based Applications</strong></li></ul>
<ul>
<li>Intel C++ Compiler 9.1 for Linux with Package ID l_cc_p_9.1.038 or higher</li>
<li>Intel C++ Compiler 9.0 for Linux with Package ID l_cc_c_9.0.030 or higher</li>
<li>Intel C++ Compiler 8.1 for Linux with Package ID l_cce_pc_8.1.036 or higher</li>
</ul>
<ul>
<li>Intel Fortran Compiler 9.1 for Linux with Package ID l_fc_p_9.1.032 or higher</li>
<li>Intel Fortran Compiler 9.0 for Linux with Package ID l_fc_c_9.0.031 or higher</li>
<li>Intel Fortran Compiler 8.1 for Linux with Package ID l_fce_pc_8.1.036 or higher</li>
</ul>
<ul>
<li>Intel C++ Compiler 9.1 for Windows with Package ID W_CC_P_9.1.022 or higher</li>
<li>Intel C++ Compiler 9.0 for Windows with Package ID W_CC_C_9.0.028 or higher</li>
<li>Intel C++ Compiler 8.1 for Windows with Package ID W_CCE_PC_8.1.028 or higher</li>
</ul>
<ul>
<li>Intel Fortran Compiler 9.1 for Windows with Package ID W_FC_C_9.1.024 or higher</li>
<li>Intel Fortran Compiler 9.0 for Windows with Package ID W_FC_C_9.0.028 or higher</li>
<li>Intel Fortran Compiler 8.1 for Windows with Package ID W_FCE_PC_8.1.025 or higher</li>
</ul>
<ul><li><strong>Intel Compilers for IA-32-based Applications</strong></li></ul>
<ul>
<li>Intel C++ Compiler 9.1 for Linux with Package ID l_cc_p_9.1.038 or higher</li>
<li>Intel C++ Compiler 9.0 for Linux with Package ID l_cc_c_9.0.032 or higher</li>
<li>Intel C++ Compiler 8.1 for Linux with Package ID l_cc_pc_8.1.037 or higher</li>
</ul>
<ul>
<li>Intel Fortran Compiler 9.1 for Linux with Package ID l_fc_c_9.1.032 or higher</li>
<li>Intel Fortran Compiler 9.0 for Linux with Package ID l_fc_c_9.0.033 or higher</li>
<li>Intel Fortran Compiler 8.1 for Linux with Package ID l_fc_pc_8.1.033 or higher</li>
</ul>
<ul>
<li>Intel C++ Compiler 9.1 for Windows with Package ID W_CC_P_9.1.022 or higher</li>
<li>Intel C++ Compiler 9.0 for Windows with Package ID W_CC_C_9.0.030 or higher</li>
<li>Intel C++ Compiler 8.1 for Windows with Package ID W_CC_PC_8.1.036 or higher</li>
</ul>
<ul>
<li>Intel Fortran Compiler 9.1 for Windows with Package ID W_FC_C_9.1.024 or higher</li>
<li>Intel Fortran Compiler 9.0 for Windows with Package ID W_FC_C_9.0.030 or higher</li>
<li>Intel Fortran Compiler 8.1 for Windows with Package ID W_FC_PC_8.1.040 or higher</li>
</ul>
</td></tr></table>
</body></html>
 ]]></description>
      <link>http://software.intel.com/en-us/articles/performance-tools-for-software-developers-some-applications-built-with-xp-or-qxp-optimizations-may-produce-runtime-error</link>
      <pubDate>Fri, 19 Sep 2008 00:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/performance-tools-for-software-developers-some-applications-built-with-xp-or-qxp-optimizations-may-produce-runtime-error#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/performance-tools-for-software-developers-some-applications-built-with-xp-or-qxp-optimizations-may-produce-runtime-error</guid>
      <category>Intel® C++ Compiler for Linux* Knowledge Base</category>
      <category>Intel® C++ Compiler for Mac OS X* Knowledge Base</category>
      <category>Intel® Fortran Compiler for Linux* Knowledge Base</category>
      <category>Intel® Visual Fortran Compiler for Windows* Knowledge Base</category>
    </item>
  </channel></rss>