<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated on Sun, 08 Nov 2009 00:59:22 -0800 -->
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <atom:link href="http://software.intel.com/en-us/forums/threading-on-intel-parallel-architectures/feed" rel="self" type="application/rss+xml" />
    <title>Intel Software Network - <![CDATA[ Threading on Intel® Parallel Architectures ]]> feed</title>
    <link>http://software.intel.com/en-us/forums/threading-on-intel-parallel-architectures</link>
    <description></description>
    <language>en-us</language>
    <item>
      <title>Learning parallel programming with FFTs using Intel MKL</title>
      <description><![CDATA[ <p style="TEXT-ALIGN: left">Hi everyone,</p>
<p style="TEXT-ALIGN: left">I'm working with FFT routines using Intel MKL and I need to develop a parallel application. Could anyone tell me where I can find documentation for learning parallel programming with FFTs?<br /><br /><br />Thanks<br /><br />Angel</p> ]]></description>
      <link>http://software.intel.com/en-us/forums/threading-on-intel-parallel-architectures/topic/69670/</link>
      <pubDate>Wed, 04 Nov 2009 16:24:04 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/threading-on-intel-parallel-architectures/topic/69670/</guid>
      <category>Parallel Programming</category>
    </item>
    <item>
      <title>How to separately control MSRs on Intel Core 2</title>
      <description><![CDATA[ Hi all,<br /><br />The Software Developer's Manual says that, for the Intel Core 2 processor family, MSRs are categorized as Unique or Shared, where Unique means each processor core has a separate MSR. <br /><br />So I would like to ask whether I can control the Unique MSRs separately on each core, for example enabling the L1 cache prefetcher on one core and disabling it on the other. If so, how should I do it?<br /><br />Thanks in advance!<br /> ]]></description>
      <link>http://software.intel.com/en-us/forums/threading-on-intel-parallel-architectures/topic/69599/</link>
      <pubDate>Mon, 02 Nov 2009 23:24:12 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/threading-on-intel-parallel-architectures/topic/69599/</guid>
      <category>Parallel Programming</category>
    </item>
    <item>
      <title>Software barrier performance in fine-grained situations</title>
      <description><![CDATA[ I'm implementing a software barrier on an Intel Xeon CPU. It performs poorly in fine-grained situations. The problem is that there is not much workload between barriers, so the barrier itself takes up much of the time, which decreases performance. I spin on a volatile value while waiting to see whether the other thread has arrived. Is there any way to keep the volatile value in the L2 cache so that each CPU can access it? As far as I know, the L2 cache is shared by the cores in the same socket, which may give better performance. Any comments? ]]></description>
      <link>http://software.intel.com/en-us/forums/threading-on-intel-parallel-architectures/topic/69510/</link>
      <pubDate>Thu, 29 Oct 2009 11:26:58 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/threading-on-intel-parallel-architectures/topic/69510/</guid>
      <category>Parallel Programming</category>
    </item>
    <item>
      <title>Strange behavior for matrix multiplication</title>
      <description><![CDATA[ Hi there,<br /><br />I wrote this simple algorithm to multiply matrices, then used OpenMP to run it in parallel. When I tested it on my Core 2 Quad Q8200 with Windows Server 2008 and 4 GB of RAM, the behavior for two 1024 x 1024 matrices was very strange. <br />The time for 1023 x 1023 and 1025 x 1025 is much smaller, both sequentially and in parallel, but the time for 1024 x 1024 is huge. Also, the scaling in this case is horrible on 4 cores, barely a 33% performance increase. I suspect it has something to do with the cache, but I don't know enough about this subject. <br /><br />Here is the code:<br />
<pre name="code" class="cpp">void OpenMPMatrixMultiply()
{
    int i, j, k;

    // i is the parallel loop variable (implicitly private); j and k are made private per thread
#pragma omp parallel for private(j, k)
    for (i = 0; i &lt; size1; i++)
    {
        for (j = 0; j &lt; size3; j++)
        {
            int partial = 0;
            for (k = 0; k &lt; size2; k++)
            {
                partial += matrix1[i][k] * matrix2[k][j];
            }
            result1[i][j] += partial;
        }
    }
}</pre>
<br />Any help would be appreciated. ]]></description>
      <link>http://software.intel.com/en-us/forums/threading-on-intel-parallel-architectures/topic/69503/</link>
      <pubDate>Thu, 29 Oct 2009 04:01:19 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/threading-on-intel-parallel-architectures/topic/69503/</guid>
      <category>Parallel Programming</category>
    </item>
    <item>
      <title>How does Intel MPI handle network failures</title>
      <description><![CDATA[ Hi all, <br /><br />I am new to the forum and have a question regarding network failures and MPI applications (specifically using the Intel MPI binding).<br /><br />What happens if I have a number of processes running on a cluster and someone unplugs a network cable? As far as I have read, the MPI processes get terminated immediately. How can I circumvent this, say by using some sort of WAIT or TIMEOUT command when a network fault is detected, so that the processes can check whether they can recover after a set number of seconds?<br /><br />Any help would be very much appreciated! ]]></description>
      <link>http://software.intel.com/en-us/forums/threading-on-intel-parallel-architectures/topic/69408/</link>
      <pubDate>Mon, 26 Oct 2009 08:27:51 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/threading-on-intel-parallel-architectures/topic/69408/</guid>
      <category>Parallel Programming</category>
    </item>
    <item>
      <title>Anyone have OpenMP working on MS Visual Studio 2008 SP1 x64?</title>
      <description><![CDATA[ I'm trying to run some basic OpenMP examples with MS Visual Studio 2008 SP1 x64. My next step would be to install the Intel C++ compiler, but I'd rather get it working with VS2008 alone, because I'm sure I would otherwise have problems when I write Matlab MEX files with OpenMP x64.<br /><br />I start a new Win32 console project with VS2008 SP1. I then add the x64 configuration in the Configuration Manager. I then go to Project Properties/Configuration Properties/C/C++/Language and change OpenMP Support to "Yes /openmp".<br /><br />The application compiles fine, but when I run it I get the error: <br />"Unable to start program ....<br />This application has failed to start because the application configuration is incorrect. Review the manifest file for possible errors."<br /><br />What am I doing wrong? Does Microsoft Visual C++ 2008 not support x64 OpenMP? Is it Win32 only?<br /><br />#include "stdafx.h"<br />#include &lt;omp.h&gt;<br />#include &lt;stdio.h&gt;<br /><br />int main (int argc, char *argv[]) {<br />int th_id, nthreads;<br />#pragma omp parallel private(th_id)<br />{<br />th_id = omp_get_thread_num();<br />printf("Hello World from thread %d\n", th_id);<br />#pragma omp barrier<br />if ( th_id == 0 ) {<br />nthreads = omp_get_num_threads();<br />printf("There are %d threads\n",nthreads);<br />}<br />}<br />return 0;<br />} ]]></description>
      <link>http://software.intel.com/en-us/forums/threading-on-intel-parallel-architectures/topic/69356/</link>
      <pubDate>Fri, 23 Oct 2009 15:43:29 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/threading-on-intel-parallel-architectures/topic/69356/</guid>
      <category>Parallel Programming</category>
    </item>
    <item>
      <title>trouble getting parallel do to work</title>
      <description><![CDATA[ I've been working on using multiple cores to speed up some calculations and have failed to work out how to get my code to use OMP to perform the calculations in parallel. The cut-down code block looks like this:<br /><br /><br />c loop over the x cells<br /><br />do ii=1, mx+1<br /><br />c set up the x coordinates<br /><br />....<br /><br />c loop over the y cells<br /><br />do jj=1, my+1<br /><br />c set up the y coordinates<br /><br />...<br /><br />c loop over the z cells<br /><br />!OMP PARALLEL <br /><br />!OMP SINGLE<br />if(OMP_IN_PARALLEL()) then<br />write(*,'(a,3i8)')'# of threads: ',OMP_GET_MAX_THREADS(),OMP_GET_NUM_THREADS(),OMP_GET_THREAD_NUM()<br />endif<br />!OMP END SINGLE<br /><br />!OMP DO SHARED(gmod) PRIVATE(nface,i,ztop,zbot,FieldValue) schedule(dynamic,10) COLLAPSE(2) REDUCTION(+-:gmod)<br /><br />do kk=k0, mz+1<br /><br />c set up z coordinates<br /><br />polyCoord(3,1)=ztop<br />...<br />polyCoord(3,8)=zbot<br /><br />c loop over the faces<br /><br />do nface=1,3<br /><br />c initialise the calculated values<br /><br />do i=1,6<br />FieldValue(1,i)=0<br />end do<br /><br />c check which face <br /><br />if(nface.eq.1) then<br /><br />c top face <br /><br />call calculate_face_response_v2(sus,dens , geomag , cxI , cyI , czI , magx , magy , magz , <br />3 , polyCoord, (nface-1)*2+1, indFace,0, 1 , locCoord, FieldValue, ierr )<br /><br />c add it to the relevant cells<br /><br />if(kk.le.mz) then<br />gmod(ii,jj,kk) = gmod(ii,jj,kk)-FieldValue(1,inds)<br />endif<br />if(kk.gt.1) then<br />gmod(ii,jj,kk-1) = gmod(ii,jj,kk-1)+FieldValue(1,inds)<br />endif<br /><br />elseif(nface.eq.2) then<br /><br />c front face<br /><br />call calculate_face_response_v2(sus,dens , geomag , cxI , cyI , czI , magx , magy , magz , <br />4 , polyCoord, 2, indFaceRect,0, 1 , locCoord, FieldValue, ierr )<br /><br />c add it to the relevant cells<br /><br />...<br /><br />elseif(nface.eq.3) then<br /><br />c left face<br /><br />call 
calculate_face_response_v2(sus,dens , geomag , cxI , cyI , czI , magx , magy , magz , <br />4 , polyCoord, 3, indFaceRect,0, 1 , locCoord, FieldValue, ierr )<br /><br />c add it to the relevant cells<br /><br />...<br /><br />endif<br /><br />end do<br /><br />enddo<br />!OMP END DO<br />!OMP END PARALLEL<br /><br />enddo<br /><br />enddo<br /><br />The code calculates the response of a block model and stores the results in the real*4 gmod array, dimensioned (mx,my,mz). The real*4 polyCoord array is dimensioned (3,8) and holds the xyz coordinates of the 8 corners of a rectangular prism. The real*4 FieldValue array is dimensioned (1,6) and holds the values calculated for the face. The other parameters in the call to calculate_face_response_v2 are not changed within the loop. Some of the parameters are held in common blocks and others are passed in as parameters. The response calculation routine does not use any common blocks.<br /><br />The problem is that the code compiles fine using /Qopenmp and the compiled program reports that there are 8 threads available (hyper-threaded quad-core processor), but the do loop uses only a single thread to run. I've checked this with Process Monitor, Process Explorer and the Intel concurrency checker, and it definitely uses only one thread.<br /><br />I added the Intel sample OMP code just before my loops, and the Intel code operates as expected and uses 8 threads, but mine does not. I've checked the asm code generated, and the Intel code has calls to routines with names like ___kmpc_global_thread_num just after the !OMP PARALLEL line, but there are no such calls associated with my !OMP PARALLEL line.<br /><br />I've tried dropping various parts of my code to see if I can isolate the problem, but nothing seems to change the code to allow the use of multiple cores.<br /><br />Does anyone have any suggestions as to what is causing the compiler to ignore my OMP directives?<br /><br />Thanks in advance<br />John<br /> ]]></description>
      <link>http://software.intel.com/en-us/forums/threading-on-intel-parallel-architectures/topic/69336/</link>
      <pubDate>Thu, 22 Oct 2009 18:08:55 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/threading-on-intel-parallel-architectures/topic/69336/</guid>
      <category>Parallel Programming</category>
    </item>
    <item>
      <title>Memory distribution</title>
      <description><![CDATA[ Hello everyone.<br />I want to run two MPI (MPICH2) codes on my cluster. <br />I submit the first job, distributed round-robin, and all is OK. <br />The problem appears when I submit the second job. The second job uses memory belonging to the CPU used by the first job, and doesn't use the memory of the other, almost free, CPU.<br />Is there a flag with which I can tell MPICH2 to use the memory of the free CPUs?<br />Thank you.<br /> ]]></description>
      <link>http://software.intel.com/en-us/forums/threading-on-intel-parallel-architectures/topic/69327/</link>
      <pubDate>Thu, 22 Oct 2009 12:35:38 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/threading-on-intel-parallel-architectures/topic/69327/</guid>
      <category>Parallel Programming</category>
    </item>
    <item>
      <title>OPENMP Error in Fortran 11.1 build 048</title>
      <description><![CDATA[ I'm now testing the Intel Fortran compiler (11.1.048).<br /><br />With OpenMP, I got correct results in the previous version (10.1.013), but now there is a problem.<br />OS : Windows XP SP3<br />VS : 2005<br />-------------------------------------------------------<br />Integer omp_get_max_threads, nprocs<br />nprocs = omp_get_max_threads()<br /><br /><br />--------------------------------------------------------<br />In 10.1.013, nprocs = 8, but in 11.1.048 I get nprocs = 32768.<br />What should I do?<br /> ]]></description>
      <link>http://software.intel.com/en-us/forums/threading-on-intel-parallel-architectures/topic/69308/</link>
      <pubDate>Thu, 22 Oct 2009 01:23:28 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/threading-on-intel-parallel-architectures/topic/69308/</guid>
      <category>Parallel Programming</category>
    </item>
    <item>
      <title>Need help debugging in OpenMP</title>
      <description><![CDATA[ I am a Fortran programmer with 20 years of experience and am used to MPI, but I am new to OpenMP. I am having quite a bit of trouble debugging some code that I am trying to run (a coupled weather/land surface model), where I end up getting some NaNs in one particular cell. I am getting completely unexpected behavior in my outputs. I have located a place where I can WRITE a particular variable and get a NaN (about 15 minutes into the model run). However, if I back up a couple of lines and try to WRITE the variable (or anything else), the model runs for an hour and times out. (This same behavior happens at various other places in the code where I try to print things out.) There are some OpenMP directives near there like:<br /><br />!$OMP PARALLEL DO   &amp;<br />!$OMP PRIVATE (ij)<br /><br />Should that have anything to do with whether I can print out variables, or am I on a wild goose chase? There are also BENCH_START() and BENCH_END() directives, which I guess are for some timing diagnostic program.<br /> ]]></description>
      <link>http://software.intel.com/en-us/forums/threading-on-intel-parallel-architectures/topic/69298/</link>
      <pubDate>Wed, 21 Oct 2009 14:00:39 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/threading-on-intel-parallel-architectures/topic/69298/</guid>
      <category>Parallel Programming</category>
    </item>
  </channel></rss>