<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated on Sun, 08 Nov 2009 12:39:08 -0800 -->
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <atom:link href="http://software.intel.com/en-us/forums/intel-avx-and-cpu-instructions/feed" rel="self" type="application/rss+xml" />
    <title>Intel Software Network - <![CDATA[ Intel® AVX and CPU Instructions ]]> feed</title>
    <link>http://software.intel.com/en-us/forums/intel-avx-and-cpu-instructions</link>
    <description></description>
    <language>en-us</language>
    <item>
      <title>Help detecting stalls (identifying structural hazards) in assembly code</title>
      <description><![CDATA[ Hi All,<br /> Our project is to optimize instruction scheduling in gcc by detecting structural hazards. We are trying to build a test case for this: a scenario in which one instruction is stalled because the resource it needs is in use by another instruction. However, we have not been able to produce one.<br /><br />1. We wrote a C program that performs floating-point multiplications, divisions and additions. However, in both the 'progname.s' file and the 'progname.c.190r.sched2' file, the instructions were scheduled for execution in sequential order. We could not find a way to detect a stall by looking at the generated assembly code.<br />Question: How do we detect that a stall has occurred if the instructions execute in a particular sequence?<br />We would also like to know of a tool that, given a 'progname.s' file, reports the execution time of each instruction and the clock cycle in which a stall will occur if the instructions execute in this sequence.<br /><br />2. We saw that the integer operations were already being performed during compilation, so only the floating-point operations were left to examine for structural hazards.<br />Question: Once we detect a stall caused by the floating-point unit being in use by another instruction, which instruction can be scheduled in its place to avoid the stall (since the integer operations are performed at compile time and the floating-point units are busy)?<br /><br />Target machine architecture: 686<br />Working on: Intel Pentium Dual Core processor<br /><br />Thanking you,<br />Dhiraj<br /> ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-avx-and-cpu-instructions/topic/69472/</link>
      <pubDate>Wed, 28 Oct 2009 10:18:15 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-avx-and-cpu-instructions/topic/69472/</guid>
      <category>Parallel Programming</category>
      <category>ISN General</category>
    </item>
    <item>
      <title>Is there a standard format for providing architecture-specific information to software?</title>
      <description><![CDATA[ Hi All,<br /> Our project requires us to supply architecture-specific information to gcc (not through the .md files in gcc). We need to give gcc the number of cycles each instruction takes on the 686 architecture.<br /><br />Question: Is there a standard format for defining architecture-specific information for software that needs it? Is the architecture-specific information for the Pentium Dual Core available in a format that any software requiring it can read?<br /><br />Target Architecture: 686 processor<br />Working on: Intel Pentium Dual Core processor<br /><br />Thanking You,<br /> Dhiraj.<br /> ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-avx-and-cpu-instructions/topic/69383/</link>
      <pubDate>Sun, 25 Oct 2009 16:24:53 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-avx-and-cpu-instructions/topic/69383/</guid>
      <category>Parallel Programming</category>
      <category>ISN General</category>
    </item>
    <item>
      <title>How to turn off out-of-order execution in an Intel processor</title>
      <description><![CDATA[ Hi All,<br /> Our project is to optimize instruction scheduling in gcc by detecting structural hazards. The algorithm we use requires that the processor not execute instructions out of order.<br /><br />Question: Is there a command or mechanism to turn off out-of-order execution in an Intel processor?<br /><br />Target Architecture: 686 processor<br />Working on: Intel Pentium Dual Core processor<br /><br />Thanking You,<br />Dhiraj. ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-avx-and-cpu-instructions/topic/69382/</link>
      <pubDate>Sun, 25 Oct 2009 14:32:29 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-avx-and-cpu-instructions/topic/69382/</guid>
      <category>Parallel Programming</category>
      <category>ISN General</category>
    </item>
    <item>
      <title>Parallel instructions for detecting the MSB in an array of bytes</title>
      <description><![CDATA[ First time posting, not sure if this is the correct forum, but...<br /><br />I have a large array of bytes: up to 1600 bytes at most, but usually no more than 128.<br /><br />Each byte typically carries 7 bits of information, and the MSB is used as a sentinel, so the MSB is set in only a small fraction of the bytes.<br /><br />Currently I'm looping through them one at a time. Is there a better way to use SSEx to process 128 bytes in parallel and get back an SSE vector with a bit set for each such byte?<br /><br />Any suggestions?<br /><br />Thank you,<br />craptacus ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-avx-and-cpu-instructions/topic/69127/</link>
      <pubDate>Fri, 16 Oct 2009 01:58:32 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-avx-and-cpu-instructions/topic/69127/</guid>
      <category>Parallel Programming</category>
      <category>ISN General</category>
    </item>
    <item>
      <title>Why are only CS, IP and EFLAGS saved on an interrupt?</title>
      <description><![CDATA[ I am new to assembly programming. I was reading about interrupts on the 386 and learned that only CS, IP and EFLAGS are saved as part of an interrupt, and that they are popped back by IRET. But I am wondering why the processor doesn't save all the visible registers, segment registers, and so on?<br /><br />Please excuse me if I have misunderstood something.<br /><br />Thanks for your help... ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-avx-and-cpu-instructions/topic/68619/</link>
      <pubDate>Fri, 16 Oct 2009 01:34:34 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-avx-and-cpu-instructions/topic/68619/</guid>
      <category>Parallel Programming</category>
      <category>ISN General</category>
    </item>
    <item>
      <title>Out of order execution</title>
      <description><![CDATA[ Is there a simulator and/or a general procedure one can follow to predict which instructions will be executed in which order (assuming all data is in the L1 cache)? I'm having a hard time understanding why one instruction sequence executes much faster than another. I suspect it's due to out-of-order execution and register renaming, but I haven't found a concrete reason yet. Any help would be appreciated. ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-avx-and-cpu-instructions/topic/69140/</link>
      <pubDate>Thu, 15 Oct 2009 23:53:15 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-avx-and-cpu-instructions/topic/69140/</guid>
      <category>Parallel Programming</category>
      <category>ISN General</category>
    </item>
    <item>
      <title>[smp] processor disabled</title>
      <description><![CDATA[ Hi,<br />I am trying to handle multiprocessor initialization.<br /><br />I found a small program [1] that implements the MultiProcessor Specification [2], but the processor entries ([2] page 4-7) report that the APs are disabled.<br /><br />How can I enable the processors?<br />Is it a software or a hardware problem?<br /><br />[1] http://www.uruk.org/mps/<br />[2] http://www.intel.com/design/pentium/datashts/242016.HTM<br /><br />Thank you.<br />Daniel M. ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-avx-and-cpu-instructions/topic/69125/</link>
      <pubDate>Thu, 15 Oct 2009 11:13:42 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-avx-and-cpu-instructions/topic/69125/</guid>
      <category>Parallel Programming</category>
      <category>ISN General</category>
    </item>
    <item>
      <title>LZCNT on Core i7</title>
      <description><![CDATA[ Let me start with: I know that LZCNT is not supported on the Core i7.<br /><br />However, when I run the instruction on my Core i7, I do not get an illegal-instruction exception as I expect.<br />Instead it performs a BSR instruction.<br />This is in 64-bit mode, using 64-bit registers.<br /><br />Is this a bug or expected behavior, or is the behavior of unsupported instructions undefined (and this is therefore OK)?<br /><br />For reference, the opcodes are:<br />BSR: 0x0F 0xBD<br />LZCNT: 0xF3 0x0F 0xBD (same as BSR, but with a 0xF3 prefix)<br /><br />Any feedback would be helpful.<br /><br />CJ. ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-avx-and-cpu-instructions/topic/68998/</link>
      <pubDate>Mon, 12 Oct 2009 04:29:45 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-avx-and-cpu-instructions/topic/68998/</guid>
      <category>Parallel Programming</category>
      <category>ISN General</category>
    </item>
    <item>
      <title>How to multiply __m128 by a scalar?</title>
      <description><![CDATA[ I am just starting to work with the SSE intrinsic functions. Is there a better way to multiply a vector V by a scalar A than what I am doing below?<br /><br />I would like to compute u = u + (v*1.5 - vold*0.5)*delta_t;<br /><br />where u, v and vold are vectors with x, y, z coordinates, each stored in a __m128.<br /><br />Is there a better way to do this than creating the a, b and c values as I do in the code below? I am running on an Intel i7 computer, so any options would be appreciated.<br /><br />#include "stdafx.h"<br />#include &lt;xmmintrin.h&gt;<br />#include &lt;mmintrin.h&gt;<br />#include &lt;iostream&gt;<br /><br />using namespace std;<br /><br />class __declspec(align(16)) Element {<br />public:<br />Element( float ux, float uy, float uz, float vx, float vy, float vz, float vxold, float vyold, float vzold) {<br />u.m128_f32[0] = ux;<br />u.m128_f32[1] = uy;<br />u.m128_f32[2] = uz;<br />v.m128_f32[0] = vx;<br />v.m128_f32[1] = vy;<br />v.m128_f32[2] = vz;<br />vold.m128_f32[0] = vxold;<br />vold.m128_f32[1] = vyold;<br />vold.m128_f32[2] = vzold;<br />}<br /><br />void Move() {<br />// u = u + (v*1.5 - vold*0.5)*delta_t;<br />float delta_t = 0.01f;<br />__m128 a, b, c;<br />a.m128_f32[0] = 1.5;<br />a.m128_f32[1] = 1.5;<br />a.m128_f32[2] = 1.5;<br /><br />b.m128_f32[0] = 0.5;<br />b.m128_f32[1] = 0.5;<br />b.m128_f32[2] = 0.5;<br /><br />c.m128_f32[0] = delta_t;<br />c.m128_f32[1] = delta_t;<br />c.m128_f32[2] = delta_t;<br /><br />u = _mm_add_ps(u, _mm_mul_ps(_mm_sub_ps(_mm_mul_ps(v,a), _mm_mul_ps(vold,b)), c));<br />}<br /><br />__m128 u, v, vold;<br />};<br /><br />int _tmain(int argc, _TCHAR* argv[])<br />{<br />Element A( 1,1,1, 1,2,3, 4,5,6);<br />A.Move();<br /><br />cout &lt;&lt; A.u.m128_f32[0] &lt;&lt; " " &lt;&lt; A.u.m128_f32[1] &lt;&lt; " " &lt;&lt; A.u.m128_f32[2] &lt;&lt; endl;<br /><br />char c;<br />cin &gt;&gt; c;<br />return 0;<br />} ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-avx-and-cpu-instructions/topic/68733/</link>
      <pubDate>Wed, 30 Sep 2009 11:30:11 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-avx-and-cpu-instructions/topic/68733/</guid>
      <category>Parallel Programming</category>
      <category>ISN General</category>
    </item>
    <item>
      <title>Enabling the CPU serial number on Intel processors</title>
      <description><![CDATA[ Hi,<br />According to the Intel documentation, Intel Pentium III and later CPUs contain a CPU serial number.<br /><br />I have tried most things on multiple PCs and could not read the serial number.<br /><br />On all PCs the serial number is disabled: executing CPUID with EAX = 1 always returns bit 18 of EDX as "0", so CPUID with EAX = 3 does not return the serial number.<br /><br />Does anybody have any idea how to enable the CPU serial number? ]]></description>
      <link>http://software.intel.com/en-us/forums/intel-avx-and-cpu-instructions/topic/68720/</link>
      <pubDate>Wed, 30 Sep 2009 04:07:58 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/forums/intel-avx-and-cpu-instructions/topic/68720/</guid>
      <category>Parallel Programming</category>
      <category>ISN General</category>
    </item>
  </channel></rss>