<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated on Tue, 24 Nov 2009 18:54:05 -0800 -->
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <atom:link href="http://software.intel.com/en-us/articles/intel-c-compiler-for-linux-kb/type/performance-and-optimization/feed/" rel="self" type="application/rss+xml" />
    <title>Intel Software Network articles feed</title>
    <link>http://software.intel.com/en-us/articles/intel-c-compiler-for-linux-kb/performance-and-optimization/</link>
    <description></description>
    <language>en-us</language>
    <item>
      <title>Compile moblin 2.1 kernel sources with Intel Compiler</title>
      <description><![CDATA[ <p>1.  Prepare for build</p>
<p>       a.  su into root.</p>
<p>       b.  Create a ~/.rpmmacros file containing:</p>
<p>       %_topdir %(echo $HOME)/rpmbuild</p>
<p>      %_smp_mflags -j3</p>
<p>      %__arch_install_post /usr/lib/rpm/check-rpaths /usr/lib/rpm/check-buildroot</p>
<p>      %_default_patch_fuzz 2</p>
<p>      #end of file</p>
<p>      Setting %_default_patch_fuzz to 2 is critical to allow the 915 patches to merge correctly.</p>
<p>      In %_smp_mflags -jN, set N to the number of processor cores on your build system.</p>
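<p>The macros file from step 1b can also be generated by a script. The following is a minimal sketch, not part of the original procedure: the function name write_rpmmacros is hypothetical, and it assumes that getconf _NPROCESSORS_ONLN reports the number of online cores on the build system.</p>
<pre name="code" class="plain:nogutter:nocontrols">
```shell
# Hypothetical helper: write the rpm macros from step 1b to the file named in $1.
write_rpmmacros() {
    target="$1"
    # Number of online cores (assumption: getconf supports this key); fall back to 1.
    cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    {
        # printf '%s\n' keeps the leading % of each macro literal.
        printf '%s\n' '%_topdir %(echo $HOME)/rpmbuild'
        printf '%s\n' "%_smp_mflags -j${cores}"
        printf '%s\n' '%__arch_install_post /usr/lib/rpm/check-rpaths /usr/lib/rpm/check-buildroot'
        printf '%s\n' '%_default_patch_fuzz 2'
    } > "$target"
}
```
</pre>
<p>Calling write_rpmmacros "$HOME/.rpmmacros" overwrites any existing macros file, so back it up first if it already contains settings.</p>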
<p>2.  Set up and do the initial build.  Download the kernel source from a source repository:</p>
<p>(Used http://repo.moblin.org/moblin/releases/2.1/source/kernel-2.6.31.5-10.1.moblin2.src.rpm)</p>
<p>       a.  Extract the rpm:</p>
<p>       $ rpm -ivh kernel-2.6.31.5-10.1.moblin2.src.rpm</p>
<p>       b. Prepare for the initial build:</p>
<p>       $ cd rpmbuild/SPECS</p>
<p>       c. In kernel.spec, remove lines 762 &amp; 763 ('BuildKernel %make_target %kernel_image menlow' &amp; 'BuildKernel %make_target %kernel_image ivi'), then do the initial build:</p>
<p>       $ rpmbuild -bc --target=i586 kernel.spec</p>
<p>       This initial build ensures you have the components to build successfully.</p>
<p>3.  Rebuild with gcc:</p>
<p>       $ cd rpmbuild/BUILD/kernel-2.6.31/linux-2.6.31</p>
<p>       $ make clean</p>
<p>       $ make bzImage</p>
<p>       $ make modules</p>
<p>       $ make modules_install</p>
<p>       $ make install </p>
<p>      Test the image on your netbook by rebooting the system and selecting the kernel at the grub screen.</p>
<p>Rebuild with Intel Compiler</p>
<p>1.  Perform step 1 in the previous section</p>
<p>2.  Set up the Intel Compiler environment:</p>
<p>       a.  Set the compiler environment:</p>
<p>       $ source /opt/intel/Compiler/11.1/XXX/bin/ia32/iccvars_ia32.sh</p>
<p>       where XXX is the particular version you are using.</p>
<p>3.  Make source code modifications.</p>
<p>      a.  Modify include/linux/compiler-intel.h and add the following line at the end of the file:</p>
<p>       #undef __compiler_offsetof</p>
<p>       b.  Modify arch/x86/include/asm/xor_32.h at line 843, change</p>
<p>       : "+r" (lines),</p>
<p>       To:</p>
<p>       : "+rm" (lines),</p>
<p>       This change is required because the code is written to work with gcc's assumption that the stack is 16 byte aligned (contrary to the ABI) and thus there is one additional register available.</p>
<p>       c.  Add libirc_s.a to the link command by modifying Makefile at line 699.  Change:</p>
<p>       --start-group $(vmlinux-main) --end-group</p>
<p>       to:</p>
<p>       --start-group $(vmlinux-main) /opt/intel/Compiler/11.1/XXX/lib/ia32/libirc_s.a --end-group</p>
<p>       d.  Modify arch/x86/include/asm/delay.h and remove the references to __bad_udelay and replace the calls to it with 0.</p>
<p>       e.  Modify include/linux/log2.h and remove the references to ____ilog2_NaN() and replace the calls to it with 0.</p>
<p>       f.  Modify line 47 of arch/x86/kernel/acpi/realmode/Makefile adding a reference to libirc_s.a as follows:<br />        WAKEUP_OBJS = $(addprefix $(obj)/,$(wakeup-y)) /opt/intel/Compiler/11.1/053/lib/ia32/libirc_s.a</p>
<p>        g.  Modify line 89 of arch/x86/boot/Makefile adding a reference to libirc_s.a as follows:<br />SETUP_OBJS = $(addprefix $(obj)/,$(setup-y)) /opt/intel/Compiler/11.1/053/lib/ia32/libirc_s.a</p>
<p>        h. Modify kernel/trace/trace_events.c by adding the following line of code (line 915) at the start of event_create_dir():<br />        if (call-&gt;system==NULL) return -1;<br />This change is due to a kernel issue where the source code makes alignment assumptions that are not enforced in the kernel source code.  This change is merely a workaround, not a fix for the real issue.</p>
<p>        i.  Modify the intelwrapper file documented at <a href="http://www.linuxdna.com/">www.linuxdna.com</a> and make the following changes:<br />          - Enable the script to replace -march=i686 | -mtune=pentium3 | -mtune=generic with '' (the empty string)<br />          - Enable the script to replace -O2 | -Os with '-O3 -ip -xSSE3_ATOM'<br />            Performance BKM: The goal of this option change is to enable higher performance.  Note that the default kernel options include flags such as -mno-sse and -fno-omit-frame-pointer, which can override the higher optimizations in some places.<br />          - Enable the script to call gcc to compile drivers/net/wimax/i2400m/fw.c | lib/libcrc32c* | crypto/testmgr*. This is required because 11.1 does not yet support some instances of variable-length arrays.<br />          - Enable the script to call gcc to compile arch/x86/boot/compressed/misc.c (issue with PIC in icc)<br />          - Enable the script to call gcc to compile drivers/block/spectra/ffsport.c (inline asm issue)</p>
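<p>The flag rewriting and compiler selection listed in step 3i can be sketched as two shell functions. This is only an illustration and is not the intelwrapper script from www.linuxdna.com: the names rewrite_flags and pick_compiler are hypothetical, and the sketch assumes that source files are passed with paths relative to the top of the kernel tree.</p>
<pre name="code" class="plain:nogutter:nocontrols">
```shell
# Print the rewritten compiler arguments, one per line.
rewrite_flags() {
    for arg in "$@"; do
        case "$arg" in
            # Target-selection flags are replaced with nothing.
            -march=i686|-mtune=pentium3|-mtune=generic) ;;
            # Optimization level is raised to the options named in the article.
            -O2|-Os) printf '%s\n' -O3 -ip -xSSE3_ATOM ;;
            *) printf '%s\n' "$arg" ;;
        esac
    done
}

# Print "gcc" if any argument is one of the sources the article hands to gcc,
# otherwise print "icc".
pick_compiler() {
    for arg in "$@"; do
        case "$arg" in
            drivers/net/wimax/i2400m/fw.c) echo gcc; return ;;
            lib/libcrc32c*) echo gcc; return ;;
            crypto/testmgr*) echo gcc; return ;;
            arch/x86/boot/compressed/misc.c) echo gcc; return ;;
            drivers/block/spectra/ffsport.c) echo gcc; return ;;
        esac
    done
    echo icc
}
```
</pre>
<p>A wrapper built on these functions would call pick_compiler on its arguments first and apply rewrite_flags only when the result is icc.</p>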
<p>4.  Start the build:</p>
<p>       $ cd &lt;build path&gt;/rpmbuild/BUILD/kernel-2.6.31/linux-2.6.31</p>
<p>       $ make clean</p>
<p>       $ make HOSTCC=intelwrapper CC=intelwrapper CONFIG_DEBUG_SECTION_MISMATCH=y KBUILD_MODPOST_WARN=n bzImage</p>
<p>       $ make HOSTCC=intelwrapper CC=intelwrapper CONFIG_DEBUG_SECTION_MISMATCH=y KBUILD_MODPOST_WARN=n modules</p>
<p>       $ make HOSTCC=intelwrapper CC=intelwrapper CONFIG_DEBUG_SECTION_MISMATCH=y KBUILD_MODPOST_WARN=n modules_install</p>
<p>       $ make HOSTCC=intelwrapper CC=intelwrapper CONFIG_DEBUG_SECTION_MISMATCH=y KBUILD_MODPOST_WARN=n install</p>
<p>The make install step will install the kernel and modify grub so that the kernel can be selected upon bootup.</p>
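<p>The four make invocations in step 4 differ only in their target, so they can be driven from a loop. The following is a minimal sketch under that assumption: the function name build_with_wrapper and the DRY_RUN variable are hypothetical and not part of the kernel build system.</p>
<pre name="code" class="plain:nogutter:nocontrols">
```shell
# Hypothetical helper: run the four make targets from step 4 in order.
# With DRY_RUN=1 the commands are only printed, not executed.
build_with_wrapper() {
    for target in bzImage modules modules_install install; do
        # Build the command line once so printing and running stay identical.
        set -- make HOSTCC=intelwrapper CC=intelwrapper \
            CONFIG_DEBUG_SECTION_MISMATCH=y KBUILD_MODPOST_WARN=n "$target"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$*"
        else
            # Stop at the first failing target.
            "$@" || return 1
        fi
    done
}
```
</pre>
<p>Run it from the linux-2.6.31 source directory after make clean; setting DRY_RUN=1 first is a cheap way to review the commands.</p>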
<p>To build with another configuration, such as the mrst config, copy it into place and go to step 4:<br />$ cp configs/kernel-mrst.config .config</p>
<div id="art_pre_template"><strong></strong></div> ]]></description>
      <link>http://software.intel.com/en-us/articles/compile-moblin-21-kernel-sources-with-intel-compiler</link>
      <pubDate>Wed, 18 Nov 2009 09:36:37 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/compile-moblin-21-kernel-sources-with-intel-compiler#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/compile-moblin-21-kernel-sources-with-intel-compiler</guid>
      <category>Intel® Atom™ Software Developer Community</category>
      <category>Intel C++ Tool Suite for MIDs</category>
      <category>Intel® C++ Compiler for Linux* Knowledge Base</category>
    </item>
    <item>
      <title>Putting -lm Before User Objects/Libraries on Link Line Can Impact Performance</title>
      <description><![CDATA[ <br />
<div id="art_pre_template"><strong>Reference Number : </strong>DPD200121218<br /><br /><br /><strong>Version :</strong> 11.0<br /><br /><br /><strong>Operating System :</strong> Linux*<br /><br /><br /><strong>Problem Description : </strong>Starting with 11.0.081, a fix was made to the compiler driver to link the Intel Math Library libimf statically by default (the intended and documented behavior) instead of dynamically.  If -lm is used with the compiler driver (icc/icpc/ifort), the driver automatically inserts libimf before libm in the link line.  Linking in libimf.a when this is done causes a problem if -lm precedes any user-created objects or libraries.  For example:<br /><br />
<pre name="code" class="plain:nogutter:nocontrols">icc -lm user1.o user2.o -luser3</pre>
Gets converted by the driver to:<br /><br />
<pre name="code" class="plain:nogutter:nocontrols">ld ... -Bstatic -limf -Bdynamic -lm ... user1.o user2.o -luser3 ...</pre>
On Linux, static libraries must come after the object/library files that use them in the link line in order for the symbols to resolve.  Since libimf.a comes before the objects/libraries that use standard math functions, these math functions won't resolve to the static Intel math library.  However, the dynamic libm doesn't have an order dependency because it is a dynamic library, so the math functions will resolve to the GNU math library.  This can have a significant performance impact.<br /><br /><br /><strong>Resolution Status : </strong>Starting with 11.1.056, the compiler now emits a warning if -lm precedes other user objects or libraries on the linker command line.  If you run into a performance regression between compilers prior to 11.0.081 and compilers from 11.0.081 on, please verify that you don't have -lm being used prior to your objects/libraries in your link lines if you use icc/icpc/ifort to link. Using the above example, make the following change:<br /><br />
<pre name="code" class="plain:nogutter:nocontrols">icc user1.o user2.o -luser3 -lm</pre>
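<br />A link line can be checked for this ordering before it is passed to the driver. The following is a minimal shell sketch and not a feature of the compiler: the function name lm_precedes_user_code is hypothetical, and it assumes that user code appears on the command line as *.o or *.a files or as -l options.<br /><br />
<pre name="code" class="plain:nogutter:nocontrols">
```shell
# Return 0 (true) if -lm appears before a later object, archive, or -l option.
lm_precedes_user_code() {
    seen_lm=no
    for arg in "$@"; do
        case "$arg" in
            # Remember that -lm has been seen; matched before the -l* pattern below.
            -lm) seen_lm=yes ;;
            # Any object, archive, or library after -lm is the problematic order.
            *.o|*.a|-l*)
                if [ "$seen_lm" = yes ]; then return 0; fi ;;
        esac
    done
    return 1
}
```
</pre>
For example, lm_precedes_user_code icc -lm user1.o user2.o -luser3 returns success (the problematic order), while the same call with -lm moved to the end returns failure.<br /><br />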
</div> ]]></description>
      <link>http://software.intel.com/en-us/articles/putting-lm-before-user-objectslibraries-on-link-line-can-impact-performance</link>
      <pubDate>Wed, 14 Oct 2009 11:45:33 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/putting-lm-before-user-objectslibraries-on-link-line-can-impact-performance#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/putting-lm-before-user-objectslibraries-on-link-line-can-impact-performance</guid>
      <category>Intel® C++ Compiler for Linux* Knowledge Base</category>
      <category>Intel® Fortran Compiler for Linux* Knowledge Base</category>
    </item>
    <item>
      <title>Building GAMESS with Intel® Compilers, Intel® MKL and Intel® MPI on Linux</title>
      <description><![CDATA[ <br /><b>Introduction :</b><br />This document explains how to build GAMESS using the following Intel software products:<br />Intel® C++ Compiler for Linux,<br />Intel® Fortran Compiler for Linux,<br />Intel® MKL,<br />Intel® MPI for Linux.<br /><br /><br /><b>Version :</b><br />GAMESS version January 12, 2009 R3 for 64-bit IA64/x86_64.<br /><br /><b>Obtaining Source Code :</b><br />The GAMESS sources can be downloaded <a href="http://www.msg.chem.iastate.edu/gamess/download.html">here</a>.<br /><br /><b>Prerequisites :</b><br />The Intel® Compilers with Intel® MKL, and Intel® MPI for Linux, should be installed.<br /><br /><b>Environment Set Up :</b><br />Environment variables for the Intel(R) C++ Compiler Professional Edition for Linux, the Intel(R) Fortran Compiler Professional Edition for Linux and Intel(R) MPI should be set.<br />E.g.<br />$export INTEL_COMPILER_TOPDIR="/opt/intel/Compiler/11.1/046"<br />$. $INTEL_COMPILER_TOPDIR/bin/intel64/ifortvars_intel64.sh<br />$. $INTEL_COMPILER_TOPDIR/bin/intel64/iccvars_intel64.sh<br />$. /opt/intel/impi/3.2.1.009/bin64/mpivars.sh<br /><br /><b>Building the Application :</b><br />1)Copy/move the tar file gamess-current.tar.gz to the directory /opt.<br /><br />2)Decompress the source files<br />$ tar -zxvf gamess-current.tar.gz<br />.<br />3)Change to the source directory<br />$cd ./gamess<br />.<br />4) Create the actvte.x file:<br />$cd ./tools<br />$cp actvte.code actvte.f<br />Replace every "*UNX" with four spaces (without quotes) in the file actvte.f. 
Any text editor can be used for this.<br />$ifort -o actvte.x actvte.f<br />$rm actvte.f<br />$cd ..<br />.<br />5)Building the Distributed Data Interface (DDI) with Intel(R) MPI:<br />$cd ./ddi<br />a) Edit the file ./compddi.<br />Set machine type (approximately line 18):<br />set TARGET=linux-ia64<br />Set MPI communication layer (approximately line 48):<br />set COMM = mpi<br />Set include directory for Intel® MPI (approximately line 105):<br />set MPI_INCLUDE_PATH = '-I/net/spdr62/opt/spdtools/impi/intel64/3.2.011/include64'<br />b)Build DDI with Intel(R) MPI<br />$ ./compddi &gt;&amp;compddi.log<br />c) If the build completes successfully, the library libddi.a will appear. Otherwise check compddi.log for errors.<br />d) $cd ..<br />.<br />6) Compiling GAMESS:<br />a) Edit the file ./comp<br />Set machine type (approximately line 15):<br />set TARGET=linux-ia64<br />Set the GAMESS root directory (approximately line 16):<br />chdir /opt/gamess<br />Uncomment line 1461:<br />setenv MKL_SERIAL YES<br />.<br />b) Edit the file ./compall:<br />Set machine type (approximately line 16):<br />set TARGET=linux-ia64<br />Set the GAMESS root directory (approximately line 17):<br />chdir /opt/gamess<br />Set to use Intel® C++ Compiler (approximately line 70):<br />if  ($TARGET == linux-ia64) set CCOMP='icc'<br />c) Compile GAMESS:<br />$compall &gt;&amp;compall.log<br />Check the file compall.log for errors.<br />d)<br />$cd ..<br />.<br />7)Linking GAMESS with Intel® Software products:<br />a) Edit the file ./lked<br />Set machine type (approximately line 18):<br />set TARGET=linux-ia64<br />Set the GAMESS root directory (approximately line 19):<br />chdir /opt/gamess<br />Set MKL environment (approximately line 509):<br />setenv MKLPATH /net/spdr62/opt/spdtools/compiler/cpro/Compiler/11.1/046/mkl/lib/em64t<br />setenv MKL_SERIAL YES<br />set mklver=10<br />Set the message passing libraries in a single line (approximately line 714):<br />set MSG_LIBRARIES='../ddi/libddi.a 
-L/net/spdr62/opt/spdtools/impi/intel64/3.2.011/lib64 -lmpi -lmpigf -lmpigi -lrt -lpthread'<br />b) Link GAMESS<br />$./lked &gt;&amp;lked.log<br />If linking completes successfully, the executable file gamess.00.x will appear. Otherwise check lked.log for errors.<br /><br /><b>Running the Application :</b><br />This section describes how to execute GAMESS with Intel® MPI. For further information check the file ./ddi/readme.ddi.<br />The script rungms is used as the base for testing GAMESS.<br />1)<br />Only a few changes to rungms are needed.<br />Set the target for execution to mpi (line 59):<br />set TARGET=mpi<br />.<br />Set a directory SCR where large temporary files can reside (line 60):<br />E.g.<br />set SCR=/opt/gamess/tests<br />.<br />Correct the settings of the environment variables ERICFMT and MCPPATH (lines 127 and 128):<br />E.g.<br />setenv ERICFMT /opt/gamess/ericfmt.dat<br />setenv MCPPATH /opt/gamess/mcpdata<br />.<br />Replace all “~$USER” with “/opt/gamess/tests” or with another directory.<br />NOTE: The directory /opt/gamess/tests/scr should exist. If it does not, create it.<br />Replace all “/home/mike/gamess” with “/opt/gamess”.<br />Correct the settings of the environment variables for Intel® MKL and MPI (lines 948 and 953):<br />E.g.<br />setenv LD_LIBRARY_PATH /opt/Compiler/11.1/046/mkl/lib/em64t<br />setenv LD_LIBRARY_PATH /opt/intel/impi/intel64/3.2.011/lib64:$LD_LIBRARY_PATH<br />.<br />Correct the path setting so that the Intel® MPI executables are found (line 954):<br />E.g. set path=(/opt/intel/impi/intel64/3.2.011/bin64 $path)<br />.<br />Done.<br />2)<br />Now choose a test case from the directory ./tests and run GAMESS.<br />E.g.<br />$./rungms exam08<br />The output data will be stored in the directory /opt/gamess/scr.<br /><br />3) To execute GAMESS on 2 or more processes on 1 node:<br />$ ./rungms exam08 00 2<br /> ]]></description>
      <link>http://software.intel.com/en-us/articles/building-gamess-with-intel-compilers-intel-mkl-and-intel-mpi-on-linux</link>
      <pubDate>Wed, 26 Aug 2009 07:02:09 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/building-gamess-with-intel-compilers-intel-mkl-and-intel-mpi-on-linux#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/building-gamess-with-intel-compilers-intel-mkl-and-intel-mpi-on-linux</guid>
      <category>Intel Software Network communities</category>
      <category>Intel® MKL</category>
      <category>Intel® MPI Library</category>
      <category>Intel® C++ Compiler for Linux* Knowledge Base</category>
      <category>Intel® Fortran Compiler for Linux* Knowledge Base</category>
      <category>Intel® MPI Library for Linux* Knowledge Base</category>
    </item>
    <item>
      <title>WPS V3.1.1 installation best known method for Linux with Intel® Fortran Compiler v. 11.1</title>
      <description><![CDATA[ <br /><b>Introduction :</b><br />This document explains how to build the WRF Preprocessing System (WPS) v3.1.1 using the Intel® Fortran Compiler for Linux, for example, version 11.1.046.<br /><br /><b>Version </b>: v3.1.1<br /><br /><b>Obtaining Source Code :</b><br />The source code can be downloaded <a href="http://www.mmm.ucar.edu/wrf/users/download/get_sources.html">here</a>. The input data for geogrid.exe can be downloaded <a href="http://www.mmm.ucar.edu/wrf/src/wps_files/">here</a>. The input data for ungrib.exe can be downloaded <a href="http://www.mmm.ucar.edu/wrf_tmp/WRF_OnLineTutorial/SOURCE_DATA/JAN00.tar.gz">here</a>.<br /><br /><b>Prerequisites :</b><br />You should have installed:<br />1)The Weather Research &amp; Forecasting (WRF) v3. The WRF V3.1.1 installation best known method for Linux with the Intel C++ and Fortran Compilers v. 11.1 can be found <a href="http://software.intel.com/en-us/articles/wrf-installation-bkm-for-linux-with-intel-c-and-fortran-compiler-v-111/">here</a>.<br />2)The Jasper library. The installation best known method with the Intel® Compilers can be found <a href="http://software.intel.com/en-us/articles/jasper-installation-bkm/">here</a>.<br />3)The JPEG library. The installation best known method with the Intel® Compilers can be found <a href="http://software.intel.com/en-us/articles/jpeg-7-installation-bkm/">here</a>.<br />4)The Zlib library. The installation best known method with the Intel® Compilers can be found <a href="http://software.intel.com/en-us/articles/zlib-library-installation-bkm-with-intelr-c-compiler-111046-for-linux/">here</a>.<br />5)The NCAR Graphics* library. Instructions for building it with the Intel(R) Compilers can be found <a href="http://software.intel.com/en-us/articles/performance-tools-for-software-developers-building-ncar-graphics-with-the-intel-compilers/">here</a>.<br /><br /><b>Environment Set Up :</b><br />You should set up environment variables for the Intel(R) Fortran Compiler and netCDF. 
If you want to build the distributed version of WPS, you should also set up environment variables for Intel MPI.<br />E.g.<br />$export INTEL_COMPILER_TOPDIR="/opt/spdtools/compiler/cpro/Compiler/11.1/046"<br />$. $INTEL_COMPILER_TOPDIR/bin/intel64/ifortvars_intel64.sh<br />$export NETCDF=/opt/netcdf<br />$. /opt/intel/impi/3.2.1.009/bin64/mpivars.sh<br /><br /><br /><b>Source Code Changes : </b>none<br /><br /><b>Building the Application :</b><br />1)Copy/move the tar file WPSV3.1.1.TAR.gz to the directory /opt.<br />2)Decompress the source files<br />$tar -zxvf WPSV3.1.1.TAR.gz<br />.<br />3)Set up the environment variables mentioned in the section "Environment Set Up".<br />4)Configure WPS<br />$./configure<br />NOTE from WPS README:<br />"<br />If the user is on a recognized architecture, the<br />configure script will display a list of available<br />compile options (usually serial vs parallel, Grib 2<br />enabled vs a "NO GRIB2" option).  For some OS options,<br />there are multiple compilers that are supported.<br />".<br />Before compiling, check the variables<br />NCARG_ROOT<br />and <br />WRF_DIR<br />in the post-configure file “configure.wps”.<br />5)Compile WPS<br />$./compile<br />.<br /><b>Running the Application :<br /></b>1)<br />To test geogrid.exe in serial mode, download the workload <a href="http://www.mmm.ucar.edu/wrf/src/wps_files/geog_v3.1.tar.gz">geog_v3.1.tar.gz </a>(e.g. to /opt/WPS/WPS_DATA). Then decompress this tar file:<br />$tar -zxvf geog_v3.1.tar.gz<br />. <br />Edit the namelist.wps file (which should be in the root WPS directory /opt/WPS):<br />In the section &amp;geogrid, change the parameter geog_data_path to the path where geog_v3.1.tar.gz was decompressed (e.g. geog_data_path=/opt/WPS/WPS_DATA/geog_v3.1/geog). <br />Execute $ ./geogrid.exe. 
<br />When geogrid.exe has finished running, it prints the message: <br />!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!<br />! Successful completion of geogrid. !<br />!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!<br /><br />2) <br />To test ungrib.exe in serial mode, download the workload <a href="http://www.mmm.ucar.edu/wrf_tmp/WRF_OnLineTutorial/SOURCE_DATA/JAN00.tar.gz">JAN00.tar.gz</a>. Then decompress it (e.g. in /opt/WPS/WPS_DATA):<br />$tar -zxvf JAN00.tar.gz<br />.<br />Run the g1print utility:<br />$./util/g1print.exe ../WPS_DATA/JAN00/2000012412.AWIPSF<br />.<br />Link in the AWIP Vtable:<br />$ln -sf ungrib/Variable_Tables/Vtable.AWIP Vtabl<br />.<br />Link in the GRIB data by making use of the script link_grib.csh:<br />$./link_grib.csh ../WPS_DATA/JAN00/2000012<br />.<br />Edit namelist.wps, and set the following:<br />start_date = '2000-01-24_12:00:00',<br />end_date = '2000-01-25_12:00:00',<br />interval_seconds = 21600,<br />Run ungrib to create the intermediate files:<br />$./ungrib.exe<br />When it finishes, you should get the message:<br />!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!<br />! Successful completion of ungrib.!<br />!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!<br /><br />3)<br />To test metgrid.exe, run it:<br />$./metgrid.exe<br />When it finishes, you should get the message:<br />!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!<br />! Successful completion of metgrid.!<br />!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!<br /><br /><b>Known Issues or Limitations :</b><br />1)<br />If WPS is to be built with Intel MPI, set the variables FC and CC to <br />CC=mpiicc<br />FC=mpiifort<br />in the post-configure file “configure.wps”.<br />2)<br />In the file namelist.wps the variable max_dom is 2 by default. For the provided workload it should be set to 1:<br />max_dom =1. <br /><br /> ]]></description>
      <link>http://software.intel.com/en-us/articles/wps-v311-installation-bkm-for-linux-with-intel-fortran-compiler-v-111</link>
      <pubDate>Wed, 19 Aug 2009 05:33:56 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/wps-v311-installation-bkm-for-linux-with-intel-fortran-compiler-v-111#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/wps-v311-installation-bkm-for-linux-with-intel-fortran-compiler-v-111</guid>
      <category>Parallel Programming</category>
      <category>ISN General</category>
      <category>Intel® MPI Library</category>
      <category>Intel® C++ Compiler for Linux* Knowledge Base</category>
      <category>Intel® Fortran Compiler for Linux* Knowledge Base</category>
      <category>Intel® MPI Library for Linux* Knowledge Base</category>
    </item>
    <item>
      <title>How to Compile for Intel® AVX</title>
      <description><![CDATA[ <div id="art_pre_template">Intel® AVX (Intel® Advanced Vector Extensions) is a 256-bit instruction set extension to Intel® SSE (Intel® Streaming SIMD Extensions) that was first announced in 2008. Further information about Intel AVX is available at <a href="http://software.intel.com/en-us/avx/">http://software.intel.com/en-us/avx/</a> .<br /><br />The Intel C/C++ and Fortran Compilers, version 11.1, support the building of applications for Intel AVX. On Windows*, use the command line switch /QxAVX. On Linux*, use -xavx. The switches /QaxAVX (Windows) and -axavx (Linux) may be used to build applications that will take advantage of AVX instructions on Intel systems that support them, but will use only SSE instructions on other systems.<br /><br />Both the C/C++ and Fortran compilers support automatic vectorization of floating-point loops using AVX instructions. The C/C++ compiler also supports AVX-based intrinsics (via the header file immintrin.h) and inline assembly. Intel AVX allows the vectorization of a wider variety of floating-point loops than Intel SSE, with a greater potential performance gain due to the greater width of the SIMD registers. The vectorizer is enabled automatically by the switches listed above. To see which loops have been vectorized, use the switch /Qvec-report1 (Windows) or -vec-report1 (Linux).<br /><br />Pending availability of processors supporting Intel AVX, the Intel® Software Development Emulator (Intel® SDE) is available for testing programs built for Intel AVX. See <a href="http://software.intel.com/en-us/articles/intel-software-development-emulator/">http://software.intel.com/en-us/articles/intel-software-development-emulator/</a> .<br />Further general information about the Intel Compilers for C/C++ and Fortran is available at <a href="http://software.intel.com/en-us/intel-compilers/">http://software.intel.com/en-us/intel-compilers/</a> . 
Further information about compiler support for Intel AVX may be found in the Intel C++ Compiler User and Reference Guides, for example in the section 'Intrinsics for Advanced Vector Extensions', accessible online at <a href="http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/cpp/win/compiler_c/index.htm">http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/cpp/win/compiler_c/index.htm</a> .</div> ]]></description>
      <link>http://software.intel.com/en-us/articles/how-to-compile-for-intel-avx</link>
      <pubDate>Thu, 16 Jul 2009 16:34:04 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/how-to-compile-for-intel-avx#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/how-to-compile-for-intel-avx</guid>
      <category>Intel® C++ Compiler for Linux* Knowledge Base</category>
      <category>Intel® C++ Compiler for Windows* Knowledge Base</category>
      <category>Intel® Fortran Compiler for Linux* Knowledge Base</category>
      <category>Intel® Visual Fortran Compiler for Windows* Knowledge Base</category>
    </item>
    <item>
      <title>Performance Tools for Software Developers - Loop blocking</title>
      <description><![CDATA[ <p><b>Loop blocking</b> is a combination of strip mining and loop interchange that enhances reuse of local data. It helps nested loops that manipulate arrays too large to fit into the cache. Loop blocking allows reuse of the array data by transforming the loops so that they manipulate array strips that fit into the cache. In effect, a blocked loop uses array elements in sections that are sized to fit in the cache.</p>
<p> </p>
<p>Use cache <b>blocking</b> to arrange a <b>loop</b> so that it performs as many computations as possible on data already residing in cache. (The next <b>block</b> of data is not read into cache until computations using the first <b>block</b> are finished.)</p>
<p>The loop blocking optimization is part of the HLO (high-level optimization) phase of the Intel compiler and is available when using the compiler option -O3. The compiler uses default heuristics for loop blocking, but you may also use /Qopt-block-factor:n on Windows or -opt-block-factor=n on Linux to specify the loop blocking factor.</p>
<p><b>Data reuse:</b></p>
<p>Data reuse is important to understand blocking. There are two types of data reuse associated with loop blocking:</p>
<ul>
<li>Spatial reuse </li>
<li>Temporal reuse</li>
</ul>
<p> </p>
<p><b>Spatial reuse</b></p>
<p>Spatial reuse uses data that was cached as a result of fetching another piece of data from memory. Data is fetched one cache line at a time; a cache line is 64 bytes on Intel(R) Core2 processors. If the requested data is located at the beginning of the cache line (aligned data) and the rest of the cache line contains subsequent array elements, then for an array of 4-byte float elements the requested element and the fifteen following elements are cached by each fetch. If any of these fifteen elements is then used in a subsequent iteration of the loop, the loop is exploiting spatial reuse. For loops with strides greater than one, spatial reuse can still occur; however, each cache line contains fewer usable elements.</p>
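<p>The number of array elements that share one cache line follows from simple arithmetic. The following is a minimal shell sketch of that calculation: the function name elements_per_line is hypothetical, and it assumes the first element accessed is aligned to the start of a cache line.</p>
<pre name="code" class="plain:nogutter:nocontrols">
```shell
# Print how many elements of one cache line a loop touches.
# $1 = cache line size in bytes, $2 = element size in bytes,
# $3 = loop stride in elements (optional, default 1).
elements_per_line() {
    line_bytes=$1
    elem_bytes=$2
    stride=${3:-1}
    per_line=$(( line_bytes / elem_bytes ))
    # Round up: element 0 is always used, then every stride-th element after it.
    echo $(( (per_line + stride - 1) / stride ))
}
```
</pre>
<p>With a 64-byte line, elements_per_line 64 4 gives the count for 4-byte elements at unit stride, and elements_per_line 64 4 2 shows how a stride of two halves the number of usable elements per line.</p>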
<p><b>Temporal reuse</b></p>
<p>Temporal reuse uses the same data item in more than one iteration of the loop. If the loop uses the same element in subsequent iterations, the loop exhibits temporal reuse. Blocking also exploits spatial reuse by ensuring that, once fetched, cache lines are not overwritten until their spatial reuse is exhausted.</p>
<p><b>Example 1: Simple Loop Blocking</b></p>
<p>The following example demonstrates simple loop blocking. <b>Loop blocking</b> allows arrays A and B to be <b>blocked</b> into smaller rectangular chunks so that the combined size of the two <b>blocked</b> chunks (one from A, one from B) is smaller than the cache size, which can improve data reuse.</p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto;"><span style="font-size: 9.5pt; color: black; font-family: Arial;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;">// before_loopblocking.cpp</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;">/*</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-spacerun: yes;"> </span>*<span style="mso-spacerun: yes;"> </span>icl /Qoption,link,"/STACK:1000000000" before_loopblocking.cpp</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-spacerun: yes;"> </span>*/</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">#include</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> <span style="color: maroon;">&lt;time.h&gt;</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">#include</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> <span style="color: maroon;">&lt;stdio.h&gt;</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: maroon; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">#define</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> MAX 8000</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">void</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> add(<span style="color: blue;">int</span> a[][MAX], <span style="color: blue;">int</span> b[][MAX]);</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">int</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> main()</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> i, j;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> A[MAX][MAX];</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> B[MAX][MAX];</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span>clock_t<span style="mso-spacerun: yes;"> </span>before, after;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: green;">//Initialize array</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">for</span>(i=0;i&lt;MAX;i++) </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span><span style="color: blue;">for</span>(j=0;j&lt;MAX; j++)</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>{ </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 4;"> </span>A[i][j]=j;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 4;"> </span>B[i][j]=j;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>before = clock();</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>add(A, B);</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>add(A, B);</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>add(A, B);</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>add(A, B);</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>after = clock();</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>printf(<span style="color: maroon;">"\nTime taken to complete : %7.2f secs\n"</span>, (<span style="color: blue;">double</span>)(after - before) / CLOCKS_PER_SEC); <span style="color: green;">//Report the time taken by the four add() calls</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">void</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> add(<span style="color: blue;">int</span> a[][MAX], <span style="color: blue;">int</span> b[][MAX])</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span><span style="color: blue;">int</span> i, j;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span><span style="color: blue;">for</span>(i=0;i&lt;MAX;i++) </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span><span style="color: blue;">for</span>(j=0; j&lt;MAX;j++)</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>{ </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 4;"> </span>a[i][j] = a[i][j] + b[j][i]; <span style="color: green;">//Adds the transpose of b to a; b is read column-wise</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto;"><span style="color: black; mso-bidi-font-family: Arial; mso-bidi-font-size: 9.5pt;"><span style="font-size: small;"><span style="font-family: Times New Roman;">The code above is modified below to process the arrays one BS x BS block at a time, so that cached data is reused before it is evicted:</span></span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;">// after_loopblocking.cpp</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;">/*</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-spacerun: yes;"> </span>*<span style="mso-spacerun: yes;"> </span>icl /Qoption,link,"/STACK:1000000000" after_loopblocking.cpp</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-spacerun: yes;"> </span>*/</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">#include</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> <span style="color: maroon;">&lt;stdio.h&gt;</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">#include</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> <span style="color: maroon;">&lt;time.h&gt;</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: maroon; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">#define</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> MAX 8000</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">#define</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> BS 16<span style="mso-spacerun: yes;"> </span><span style="color: green;">//Block size is selected as the loop-blocking factor. </span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">void</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> add(<span style="color: blue;">int</span> a[][MAX], <span style="color: blue;">int</span> b[][MAX]);</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">int</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> main()</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> i, j;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> A[MAX][MAX];</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> B[MAX][MAX];</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span>clock_t<span style="mso-spacerun: yes;"> </span>before, after;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: green;">//Initialize array</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">for</span>(i=0;i&lt;MAX;i++) </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span><span style="color: blue;">for</span>(j=0;j&lt;MAX; j++)</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>{ </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 4;"> </span>A[i][j]=j;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 4;"> </span>B[i][j]=j;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>before = clock();</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>add(A, B);</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>add(A, B);</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>add(A, B);</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>add(A, B);</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>after = clock();</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>printf(<span style="color: maroon;">"\nTime taken to complete : %7.2f secs\n"</span>, (<span style="color: blue;">double</span>)(after - before) / CLOCKS_PER_SEC); <span style="color: green;">//Report the time taken by the four add() calls</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">void</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> add(<span style="color: blue;">int</span> a[][MAX], <span style="color: blue;">int</span> b[][MAX])</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> i, j, ii, jj;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">for</span>(i=0;i&lt;MAX;i+=BS) </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span>{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span><span style="color: blue;">for</span>(j=0; j&lt;MAX;j+=BS)</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>{ </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 3;"> </span><span style="mso-spacerun: yes;"> </span><span style="color: blue;">for</span>(ii=i; ii&lt;i+BS; ii++)<span style="color: green;">//outer loop</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 3;"> </span><span style="mso-spacerun: yes;"> </span>{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 4;"> </span><span style="color: blue;">for</span>(jj=j; jj&lt;j+BS; jj++)</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 4;"> </span>{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 5;"> </span><span style="color: green;">//The rows of b touched by this block are loaded on the first pass</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 5;"> </span><span style="color: green;">//of the outer (ii) loop and can be reused from cache on later passes</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 5;"> </span>a[ii][jj] = a[ii][jj] + b[jj][ii];</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 4;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 3;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 3pt 0in 9pt;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">}</span></p>
<p class="MsoNormal" style="margin: 3pt 0in 9pt;"><span style="font-size: 9.5pt; color: black; font-family: Arial; mso-ansi-language: EN;" lang="EN"> </span></p>
<p><b>Example 2: Complex Blocking</b></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;">// matrixMul.cpp</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;">/*</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-spacerun: yes;"> </span>*<span style="mso-spacerun: yes;"> </span>icl /Qoption,link,"/STACK:1000000000" matrixMul.cpp</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-spacerun: yes;"> </span>*/</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">#include</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> <span style="color: maroon;">&lt;stdio.h&gt;</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">#include</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> <span style="color: maroon;">&lt;time.h&gt;</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: maroon; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">#define</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> MAX 800</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">void</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> matmul(<span style="color: blue;">int</span> c[][MAX], <span style="color: blue;">int</span> a[][MAX], <span style="color: blue;">int</span> b[][MAX]);</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">int</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> main()</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> i, j;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> A[MAX][MAX];</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> B[MAX][MAX];</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> C[MAX][MAX];</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span>clock_t<span style="mso-spacerun: yes;"> </span>before, after;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: green;">//Initialize array</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">for</span>(i=0;i&lt;MAX;i++) </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span><span style="color: blue;">for</span>(j=0;j&lt;MAX; j++)</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>{ </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 4;"> </span>A[i][j]=j;</span></p>
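<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 4;"> </span>B[i][j]=j;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 4;"> </span>C[i][j]=0; <span style="color: green;">//Clear the result matrix before matmul accumulates into it</span></span></p>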
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>before = clock();</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>matmul(C, A, B);</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>after = clock();</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>printf(<span style="color: maroon;">"\nTime taken to complete : %7.2f secs\n"</span>, (<span style="color: blue;">double</span>)(after - before) / CLOCKS_PER_SEC); <span style="color: green;">//Print time taken by the matmul call</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">void</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> matmul(<span style="color: blue;">int</span> c[][MAX], <span style="color: blue;">int</span> a[][MAX], <span style="color: blue;">int</span> b[][MAX])</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> i, j, k;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">for</span>(i=0;i&lt;MAX;i++) </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span>{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span><span style="color: blue;">for</span>(j=0; j&lt;MAX;j++)</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>{ </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 3;"> </span><span style="mso-spacerun: yes;"> </span><span style="color: blue;">for</span>(k=0; k &lt; MAX; k++)</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 3;"> </span><span style="mso-spacerun: yes;"> </span></span><span style="font-size: 10pt; font-family: 'Courier New'; mso-ansi-language: IT; mso-no-proof: yes;" lang="IT">{ </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-ansi-language: IT; mso-no-proof: yes;" lang="IT"><span style="mso-tab-count: 5;"> </span>c[i][j] = c[i][j] + a[i][k] * b[k][j]; </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-ansi-language: IT; mso-no-proof: yes;" lang="IT"><span style="mso-tab-count: 3;"> </span><span style="mso-spacerun: yes;"> </span></span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 3pt 0in 9pt;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto;"><span style="font-size: small;"><span style="font-family: Times New Roman;"><span style="color: black; mso-bidi-font-family: Arial; mso-bidi-font-size: 9.5pt;">The code above is modified below to improve </span><span style="color: black; mso-bidi-font-family: Arial; mso-ansi-language: EN; mso-bidi-font-size: 9.5pt;" lang="EN">spatial</span><span style="color: black; mso-bidi-font-family: Arial; mso-bidi-font-size: 9.5pt;"> and </span><span style="color: black; mso-bidi-font-family: Arial; mso-ansi-language: EN; mso-bidi-font-size: 9.5pt;" lang="EN">temporal</span><span style="color: black; mso-bidi-font-family: Arial; mso-bidi-font-size: 9.5pt;"> reuse of cached data for arrays a, b and c. The j and k loops are split into blocks of BS iterations, so each BS-by-BS tile of b is reused for every row i while it is still in cache:</span></span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;">// matrixMulBlk.cpp</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;">/*</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-spacerun: yes;"> </span>*<span style="mso-spacerun: yes;"> </span>icl /Qoption,link,"/STACK:1000000000" matrixMulBlk.cpp</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-spacerun: yes;"> </span>*/</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">#include</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> <span style="color: maroon;">&lt;stdio.h&gt;</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">#include</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> <span style="color: maroon;">&lt;time.h&gt;</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: maroon; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">#define</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> MAX 800</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">#define</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> BS 16<span style="mso-spacerun: yes;"> </span><span style="color: green;">//Block size used as the loop-blocking factor; MAX must be a multiple of BS</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">void</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> matmul(<span style="color: blue;">int</span> c[][MAX], <span style="color: blue;">int</span> a[][MAX], <span style="color: blue;">int</span> b[][MAX]);</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">int</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> main()</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> i, j;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> A[MAX][MAX];</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> B[MAX][MAX];</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> C[MAX][MAX];</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span>clock_t<span style="mso-spacerun: yes;"> </span>before, after;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: green;">//Initialize array</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">for</span>(i=0;i&lt;MAX;i++) </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span><span style="color: blue;">for</span>(j=0;j&lt;MAX; j++)</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>{ </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 4;"> </span>A[i][j]=j;</span></p>
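<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 4;"> </span>B[i][j]=j;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 4;"> </span>C[i][j]=0; <span style="color: green;">//Clear the result matrix before matmul accumulates into it</span></span></p>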
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>before = clock();</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>matmul(C, A, B);</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>after = clock();</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>printf(<span style="color: maroon;">"\nTime taken to complete : %7.2f secs\n"</span>, (<span style="color: blue;">double</span>)(after - before) / CLOCKS_PER_SEC); <span style="color: green;">//Print time taken by the matmul call</span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: green; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;">void</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> matmul(<span style="color: blue;">int</span> c[][MAX], <span style="color: blue;">int</span> a[][MAX], <span style="color: blue;">int</span> b[][MAX])</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">int</span> i, j, k, jj, kk;</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="color: blue;">for</span>(j=0;j&lt;MAX; j += BS) </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span>{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span><span style="color: blue;">for</span>(k=0; k&lt;MAX; k += BS)</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>{ </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span><span style="color: blue;">for</span>(i=0; i &lt; MAX; i++)</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>{ </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 3;"> </span><span style="color: blue;">for</span>(kk=k; kk&lt;k+BS; kk++)</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 3;"> </span>{</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt 1.5in; mso-layout-grid-align: none;"><span style="font-size: 10pt; color: blue; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-spacerun: yes;"> </span>for</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">(jj=j; jj&lt;j+BS; jj++) </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>{<span style="mso-spacerun: yes;"> </span></span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 4;"> </span>c[i][jj] = (c[</span><span style="font-size: 10pt; font-family: 'Courier New'; mso-ansi-language: IT; mso-no-proof: yes;" lang="IT">i][jj] + a[i][kk] * b[kk][jj]); </span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-ansi-language: IT; mso-no-proof: yes;" lang="IT"><span style="mso-tab-count: 3;"> </span><span style="mso-spacerun: yes;"> </span></span><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 3;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 2;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"><span style="mso-tab-count: 1;"> </span><span style="mso-spacerun: yes;"> </span>}</span></p>
<p class="MsoNormal" style="margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;">}</span></p>
<p class="MsoNormal" style="margin: 3pt 0in 9pt;"><span style="font-size: 10pt; font-family: 'Courier New'; mso-no-proof: yes;"> </span></p>
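<p>The blocked version above relies on MAX being a multiple of BS (800 = 50 * 16) and on c starting out as zero. The following is a minimal sketch, not part of the original sample, of one way to handle sizes that are not a multiple of the block size and to check the blocked loop nest against the plain triple loop. The names N, min_int, matmul_naive, matmul_blocked and blocked_matches_naive are hypothetical, and N is set to 100 only so that it is deliberately not a multiple of BS.</p>

```c
#define N  100   /* hypothetical size, deliberately not a multiple of BS */
#define BS 16    /* same blocking factor as in the sample above */

static int min_int(int x, int y) { return x < y ? x : y; }

/* Reference: plain triple loop; c must be zeroed by the caller. */
void matmul_naive(int c[][N], int a[][N], int b[][N])
{
    int i, j, k;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            for (k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];
}

/* Blocked: same loop order as the sample, with tile ends clamped to N. */
void matmul_blocked(int c[][N], int a[][N], int b[][N])
{
    int i, j, k, jj, kk;
    for (j = 0; j < N; j += BS) {
        int j_end = min_int(j + BS, N);
        for (k = 0; k < N; k += BS) {
            int k_end = min_int(k + BS, N);
            for (i = 0; i < N; i++)
                for (kk = k; kk < k_end; kk++)
                    for (jj = j; jj < j_end; jj++)
                        c[i][jj] += a[i][kk] * b[kk][jj];
        }
    }
}

/* Returns 1 if both versions produce identical results, else 0. */
int blocked_matches_naive(void)
{
    /* static keeps the arrays off the stack */
    static int a[N][N], b[N][N], c1[N][N], c2[N][N];
    int i, j;

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            a[i][j] = j;
            b[i][j] = i - j;
            c1[i][j] = 0;   /* clear both result matrices explicitly */
            c2[i][j] = 0;
        }

    matmul_naive(c1, a, b);
    matmul_blocked(c2, a, b);

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            if (c1[i][j] != c2[i][j])
                return 0;
    return 1;
}
```

<p>Calling blocked_matches_naive() should return 1 when the two loop nests agree. Because the arithmetic is on integers and the values here are small enough not to overflow, reordering the additions does not change the result.</p>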
]]></description>
      <link>http://software.intel.com/en-us/articles/performance-tools-for-software-developers-loop-blocking</link>
      <pubDate>Mon, 13 Jul 2009 15:36:15 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/performance-tools-for-software-developers-loop-blocking#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/performance-tools-for-software-developers-loop-blocking</guid>
      <category>Intel® C++ Compiler for Linux* Knowledge Base</category>
      <category>Intel® C++ Compiler for Mac OS X* Knowledge Base</category>
      <category>Intel® C++ Compiler for Windows* Knowledge Base</category>
      <category>Intel® Parallel Composer Knowledge Base</category>
    </item>
    <item>
      <title>Performance Tools for Software Developers - Auto parallelization and  /Qpar-threshold</title>
      <description><![CDATA[ 
<table border="0" cellpadding="0" cellspacing="15">
<tbody>
<tr>
<td class="bodycopy">
<p>The auto-parallelization feature of the Intel C++ Compiler automatically translates serial portions of the input program into semantically equivalent multithreaded code. It determines which loops are good work-sharing candidates, performs the dataflow analysis needed to verify correct parallel execution, and partitions the data for threaded code generation, as a programmer would otherwise do by hand with OpenMP directives. Both OpenMP and auto-parallelization provide performance gains from shared memory on multiprocessor systems based on IA-32, Intel 64 and Itanium processors.</p>
<p>The following options control auto-parallelization:</p>
<blockquote><b>/Qparallel:</b><br />Enables the auto-parallelizer to generate multithreaded code for loops that can be safely executed in parallel. <br /><br /><b>/Qpar-threshold:n</b><br />This option sets a threshold for the auto-parallelization of loops based on the probability of profitable execution of the loop in parallel. To use this option, you must also specify -parallel (Linux and Mac OS X) or /Qparallel (Windows). The default is /Qpar-threshold:100.</blockquote>
<p>The /Qpar-threshold option is useful for loops whose computation work volume cannot be determined at compile time. The threshold is usually relevant when the loop trip count is unknown at compile time.</p>
<p>The compiler applies a heuristic that tries to balance the overhead of creating multiple threads versus the amount of work available to be shared amongst the threads.</p>
<p>Here <i>n</i> is an integer from 0 through 100 that sets the threshold for auto-parallelizing loops. If <i>n</i> is 0, loops are always auto-parallelized, regardless of computation work volume. If <i>n</i> is 100, loops are auto-parallelized only when the compiler analysis predicts a performance gain, that is, when profitable parallel execution is almost certain. Intermediate values from 1 to 99 represent the percentage probability of a profitable speed-up. For example, <i>n</i>=50 directs the compiler to parallelize a loop only if there is at least a 50% probability that it will run faster in parallel.</p>
<p>Also, to be "100%" sure that a loop will benefit from parallelization, the compiler needs to know the iteration count at compile time. For a "99%" or lower threshold, knowing the iteration count at compile time is not a requirement.</p>
<p>This leads to a big difference in the number of loops parallelized at 99 compared to 100. For many applications, 99 is the better setting, but applications with many short loops may run slower at 99, because the overhead of creating threads can outweigh the work available in each loop.</p>
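<p>To make the distinction concrete, here is a small sketch of two loops that do the same work but differ in whether the trip count is visible at compile time. The names FIXED_N, scale_fixed and scale_runtime are hypothetical and do not come from the original sample. Based on the description above, one would expect the first loop to be a candidate at /Qpar-threshold:100 and the second only at 99 or lower, but whether either loop is actually parallelized depends on the compiler version and the rest of its analysis, so the /Qpar-report output should be checked.</p>

```c
#define FIXED_N 100000  /* hypothetical trip count, known at compile time */

/* Trip count is a compile-time constant, so the work volume can be estimated. */
void scale_fixed(double *out, const double *in)
{
    int i;
    for (i = 0; i < FIXED_N; i++)
        out[i] = 2.0 * in[i] + 1.0;
}

/* Trip count n is only known at run time. */
void scale_runtime(double *out, const double *in, int n)
{
    int i;
    for (i = 0; i < n; i++)
        out[i] = 2.0 * in[i] + 1.0;
}
```

<p>Both functions compute the same result when scale_runtime is called with n equal to FIXED_N; only the information available to the compiler differs.</p>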
<p>The following example, int_sin.c, is not auto-parallelized when compiled with /Qpar-threshold:100 using the command line below:</p>
<blockquote>C:\&gt;icl -c /Qparallel /Qpar-report3 /Qpar-threshold:100 int_sin.c</blockquote>
<p>If we use /Qpar-threshold:99 then it is parallelized.</p>
<p><b>Example:</b></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; COLOR: green; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">// int_sin.c</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; COLOR: green; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">// Intel C++ compiler sample program</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"> </p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">#include</span><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"> <span style="COLOR: maroon">&lt;stdio.h&gt;</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">#include</span><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"> <span style="COLOR: maroon">&lt;stdlib.h&gt;</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">#include</span><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"> <span style="COLOR: maroon">&lt;time.h&gt;</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">#include</span><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"> <span style="COLOR: maroon">&lt;mathimf.h&gt;</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"> </p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; COLOR: green; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">// Function to be integrated</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; COLOR: green; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">// Define and prototype it here</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; COLOR: green; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">// | sin(x) |</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">#define</span><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"> INTEG_FUNC(x) fabs(sin(x))</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"> </p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; COLOR: green; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">// Prototype timing function</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">double</span><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"> dclock(<span style="COLOR: blue">void</span>);</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"> </p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">int</span><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"> main(<span style="COLOR: blue">void</span>)</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">{</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: green">// Loop counters and number of interior points</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: blue">unsigned</span> <span style="COLOR: blue">int</span> i, j, N;</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: green">// Stepsize, independent variable x, and accumulated sum</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: blue">double</span> step, x_i, sum;</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: green">// Timing variables for evaluation </span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: blue">double</span> start, finish, duration;</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: green">// Start integral from</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: blue">double</span> interval_begin = 0.0;</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: green">// Complete integral at</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: blue">double</span> interval_end = 2.0 * 3.141592653589793238;</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"> </p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: green">// Start timing for the entire application</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">start = clock();</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"> </p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">printf( <span style="COLOR: maroon">"\n"</span>);</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">printf( <span style="COLOR: maroon">" Number of | Computed Integral |\n"</span>);</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">printf( <span style="COLOR: maroon">" Interior Points | |\n"</span>);</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: blue">for</span> (j=2;j&lt;10;j++)</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">{</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">printf( <span style="COLOR: maroon">"-------------------------------------\n"</span>);</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"> </p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: green">// Compute the number of (internal rectangles + 1)</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">N = 1 &lt;&lt; j;</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"> </p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: green">// Compute stepsize for N-1 internal rectangles</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">step = (interval_end - interval_begin) / N;</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"> </p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: green">// Approx. 1/2 area in first rectangle: f(x0) * [step/2]</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">sum = INTEG_FUNC(interval_begin) * step / 2.0;</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"> </p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: green">// Apply trapezoidal rule (half weight at the two endpoints):</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: green">// Given length = f(x), compute the area of the</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: green">// rectangle of width step</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: green">// Sum areas of internal rectangles: f(xi) * step</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"> </p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: blue">for</span> (i=1;i&lt;N;i++)</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">{</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">x_i = interval_begin + i * step;</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">sum += INTEG_FUNC(x_i) * step;</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">}</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"> </p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"><span style="COLOR: green">// Approx. 1/2 area in last rectangle: f(xN) * [step/2]</span></span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">sum += INTEG_FUNC(interval_end) * step / 2.0;</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"> </p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes; mso-ansi-language: IT" lang="IT">printf( <span style="COLOR: maroon">" %10u | %14e |\n"</span>, N, sum);</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">}</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">finish = clock();</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">duration = (finish - start);</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">printf( <span style="COLOR: maroon">"\n"</span>);</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">printf( <span style="COLOR: maroon">" Application Clocks = %10e\n"</span>, duration);</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes">printf( <span style="COLOR: maroon">"\n"</span>);</span></p>
<p class="MsoNormal" style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"><span style="mso-no-proof: yes"><span style="font-size: small; font-family: Times New Roman;">}</span></span></p>
</blockquote>
</td>
</tr>
</tbody>
</table>
 ]]></description>
      <link>http://software.intel.com/en-us/articles/performance-tools-for-software-developers-auto-parallelization-and-qpar-threshold</link>
      <pubDate>Mon, 13 Jul 2009 15:32:16 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/performance-tools-for-software-developers-auto-parallelization-and-qpar-threshold#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/performance-tools-for-software-developers-auto-parallelization-and-qpar-threshold</guid>
      <category>Intel® C++ Compiler for Linux* Knowledge Base</category>
      <category>Intel® C++ Compiler for Mac OS X* Knowledge Base</category>
      <category>Intel® C++ Compiler for Windows* Knowledge Base</category>
      <category>Intel® Parallel Composer Knowledge Base</category>
    </item>
    <item>
      <title>OpenMP* Loops with Function Calls for Bounds May Not Parallelize</title>
      <description><![CDATA[ <br />
<div id="art_pre_template"><strong>Reference Number :</strong>  DPD200110877<br /><br /><br /><strong>Version :</strong> 11.0, 11.1 or Intel® Parallel Composer<br /><br /><br /><strong>Operating System : </strong>Windows*, Linux*, Mac OS X*<br /><br /><br /><strong>Problem Description : </strong>The OpenMP* 3.0 standard now supports using STL iterators for OpenMP loop bounds.  However, the Intel® C++ Compiler does not parallelize code like the following:<br /><br />
<pre name="code" class="cpp">#include &lt;vector&gt;

void iterator_example()
{
  std::vector&lt;double&gt; vec(23);
  std::vector&lt;double&gt;::iterator it;

#pragma omp parallel for default(none) shared(vec) 
  for (it = vec.begin(); it &lt; vec.end(); it++)
  {
    *it = 1.0;// do work with *it //
  }
}</pre>
<br /><br />For the iterator-based loop in iterator_example(), the compiler does not report (as it should) that the loop was parallelized for OpenMP*.  If you examine the generated code, you will see that the compiler emits a serial version of the loop.  The cause is a compiler issue: when the function calls in the loop bounds (here vec.begin() and vec.end()) are inlined, the compiler no longer recognizes the loop as a well-formed loop for parallelization.<br /><br /><br /><strong>Resolution Status : </strong>This will be resolved in an upcoming compiler update.<br /><br /><br /><br /><em>[DISCLAIMER: The information on this web site is intended for hardware system manufacturers and software developers. Intel does not warrant the accuracy, completeness or utility of any information on this site. Intel may make changes to the information or the site at any time without notice. Intel makes no commitment to update the information at this site. ALL INFORMATION PROVIDED ON THIS WEBSITE IS PROVIDED "as is" without any express, implied, or statutory warranty of any kind including but not limited to warranties of merchantability, non-infringement of intellectual property, or fitness for any particular purpose. Independent companies manufacture the third-party products that are mentioned on this site. Intel is not responsible for the quality or performance of third-party products and makes no representation or warranty regarding such products. The third-party supplier remains solely responsible for the design, manufacture, sale and functionality of its products. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. *Other names and brands may be claimed as the property of others.]</em></div> ]]></description>
      <link>http://software.intel.com/en-us/articles/openmp-loops-with-function-calls-for-bounds-may-not-parallelize</link>
      <pubDate>Thu, 12 Mar 2009 17:06:43 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/openmp-loops-with-function-calls-for-bounds-may-not-parallelize#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/openmp-loops-with-function-calls-for-bounds-may-not-parallelize</guid>
      <category>Intel® C++ Compiler for Linux* Knowledge Base</category>
      <category>Intel® C++ Compiler for Mac OS X* Knowledge Base</category>
      <category>Intel® C++ Compiler for Windows* Knowledge Base</category>
      <category>Intel® Parallel Composer Knowledge Base</category>
    </item>
    <item>
      <title>Disable movbe to Test Intel® Atom™ Processor Targeted Code on non-Intel® Atom™ Processor Platforms</title>
      <description><![CDATA[ <p>The Intel® Compilers 11.0 allow you to target the Intel® Atom™ processor using the /QxSSE3_ATOM or -xSSE3_ATOM compiler options.  These options enable generation of the movbe instruction, which is unique to the Intel® Atom™ processor.  However, you sometimes need to run such code on a different processor, such as the Intel® Pentium® III processor (for example, for validation when an Intel® Atom™ processor isn't available).  In these situations, use the /Qinstruction:nomovbe (for Windows*) or -minstruction=nomovbe (for Linux*/Mac*) option to disable generation of this instruction.</p> ]]></description>
      <link>http://software.intel.com/en-us/articles/disable-movbe-to-test-intel-atom-targeted-code-on-non-atom-platforms</link>
      <pubDate>Fri, 20 Feb 2009 16:41:09 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/disable-movbe-to-test-intel-atom-targeted-code-on-non-atom-platforms#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/disable-movbe-to-test-intel-atom-targeted-code-on-non-atom-platforms</guid>
      <category>Intel® C++ Compiler for Linux* Knowledge Base</category>
      <category>Intel® C++ Compiler for Mac OS X* Knowledge Base</category>
      <category>Intel® C++ Compiler for Windows* Knowledge Base</category>
      <category>Intel® Fortran Compiler for Linux* Knowledge Base</category>
      <category>Intel® Fortran Compiler for Mac OS X* Knowledge Base</category>
      <category>Intel® Parallel Composer Knowledge Base</category>
      <category>Intel® Visual Fortran Compiler for Windows* Knowledge Base</category>
    </item>
    <item>
      <title>Intel® C++ Compiler for Linux* - OpenMP* specification support</title>
      <description><![CDATA[ <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<table border="0" cellspacing="15" cellpadding="0"><tr><td class="bodycopy">
<table cellspacing="15" cellpadding="0" border="0"><tr><td bgcolor="#A6A6A6"><table cellspacing="1" cellpadding="5" border="0">
<tr>
<td class="bodycopy" bgcolor="#EFEFEF"><strong>Intel&reg; C++ / Fortran Compiler Version</strong></td>
<td class="bodycopy" bgcolor="#EFEFEF"><strong>OpenMP* Standard Version</strong></td>
</tr>
<tr>
<td class="bodycopy" align="middle" bgcolor="#FFFFFF">10.0</td>
<td class="bodycopy" align="middle" bgcolor="#FFFFFF">2.5</td>
</tr>
<tr>
<td class="bodycopy" align="middle" bgcolor="#FFFFFF">9.1</td>
<td class="bodycopy" align="middle" bgcolor="#FFFFFF">2.5</td>
</tr>
<tr>
<td class="bodycopy" align="middle" bgcolor="#FFFFFF">9.0</td>
<td class="bodycopy" align="middle" bgcolor="#FFFFFF">2.0</td>
</tr>
<tr>
<td class="bodycopy" align="middle" bgcolor="#FFFFFF">8.x</td>
<td class="bodycopy" align="middle" bgcolor="#FFFFFF">2.0</td>
</tr>
<tr>
<td class="bodycopy" align="middle" bgcolor="#FFFFFF">7.x</td>
<td class="bodycopy" align="middle" bgcolor="#FFFFFF">2.0</td>
</tr>
</table></td></tr></table>
<p><strong>Note</strong>: OpenMP* 2.5 adds no new features, only clarifications relative to OpenMP 2.0. It merges the separate C/C++ and Fortran OpenMP 2.0 specifications into a single OpenMP 2.5 specification.</p>
<p>Please visit 
<a href="http://www.openmp.org/">www.openmp.org</a> for more information about OpenMP.</p>
<p>The WORKSHARE directive is not currently supported in the Intel C++ Compilers; it is accepted by the Intel Fortran Compilers. For more information, please refer to the User's Guide of the compiler.</p>
<p>The OpenMP specification does not define interoperability of multiple implementations; therefore, the OpenMP implementation supported by other compilers and OpenMP support in Intel compilers might not be interoperable. To avoid possible linking or run-time problems, keep the following guidelines in mind:</p>
<ul>
<li>Avoid using multiple copies of the OpenMP runtime libraries from different compilers.</li>
<li>Compile all the OpenMP sources with one compiler, or compile the parallel region and entire call tree beneath it using the same compiler.</li>
<li>Use dynamic libraries for OpenMP.</li>
</ul>
</td></tr></table>
</body></html>
 ]]></description>
      <link>http://software.intel.com/en-us/articles/intel-c-compiler-for-linux-openmp-specification-support</link>
      <pubDate>Fri, 19 Sep 2008 00:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/intel-c-compiler-for-linux-openmp-specification-support#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/intel-c-compiler-for-linux-openmp-specification-support</guid>
      <category>Intel® C++ Compiler for Linux* Knowledge Base</category>
    </item>
  </channel></rss>