<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated on Wed, 16 May 2012 13:04:21 -0700 -->
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <atom:link href="http://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx/feed/" rel="self" type="application/rss+xml" />
    <title>Intel Software Network Comments Feed</title>
    <link>http://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx</link>
    <description></description>
    <language>en-us</language>
    <item>
      <title>By Twitter Trackbacks for 3D Vector Normalization Using 256-Bit Intel® Advanced Vector Extensions (Intel® AVX) - Intel® Software Network [intel.com] on Topsy.com</title>
      <description><![CDATA[ n/a ]]></description>
      <link>http://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx/#comment-49622</link>
      <pubDate>Thu, 30 Sep 2010 18:51:59 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx/#comment-49622</guid>
    </item>
    <item>
      <title>By Stan Melax (Intel)</title>
      <description><![CDATA[ Intel's compiler (beta for version 12) was used to compile the C++ code with intrinsics into assembly.  

The transpose routines scale well on their own only if the compiler emits vinsertf128 with a memory operand, rather than a separate load followed by a vinsertf128 from an XMM register (a known performance tip for assembly programming). With a memory operand, the uops that fill the YMM register can use port 0 instead of only port 5. This matters because the subsequent shuffles require port 5, which is the throughput bottleneck for the loop that tests just the transpose to SOA and back. The Intel compiler is aware of this performance tip and generates the assembly code we wanted.

A note about RDTSC: it counts base clock cycles, so results may vary when the processor is using Turbo mode. Turbo was off during testing for this article.

> ...Newton-Raphson?
The results from _mm_rsqrt_ps() have limited accuracy, and many applications refine them with the Newton-Raphson technique. Yes, using Newton-Raphson would have shown better scaling over serial. However, an objective was to find a small computation where the on-the-fly AOS-to-SOA technique would break even or beat serial. Clearly, workloads that do more work than this will further amortize the transpose cost and provide better speedup.
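
For reference, here is a minimal sketch of one Newton-Raphson refinement step applied to the _mm_rsqrt_ps() estimate. It is not the code measured in the article: the names rsqrt_refined and max_rel_error are hypothetical, and it uses 128-bit SSE rather than 256-bit AVX to stay short. The step for y ~ 1/sqrt(x) is y' = y * (1.5 - 0.5 * x * y * y).

```cpp
#include <xmmintrin.h>  // SSE intrinsics: _mm_rsqrt_ps, _mm_mul_ps, ...
#include <cmath>        // std::sqrt, std::fabs

// Hypothetical helper: hardware estimate of 1/sqrt(x) in four lanes,
// followed by one Newton-Raphson step y' = y * (1.5 - 0.5 * x * y * y).
static inline __m128 rsqrt_refined(__m128 x) {
    const __m128 half         = _mm_set1_ps(0.5f);
    const __m128 three_halves = _mm_set1_ps(1.5f);
    __m128 y   = _mm_rsqrt_ps(x);                  // limited-accuracy estimate
    __m128 xyy = _mm_mul_ps(_mm_mul_ps(x, y), y);  // x * y * y
    return _mm_mul_ps(y, _mm_sub_ps(three_halves, _mm_mul_ps(half, xyy)));
}

// Hypothetical helper: largest relative error over the four lanes,
// measured against 1/sqrt(x) computed in double precision.
static float max_rel_error(__m128 approx, __m128 x) {
    float a[4], v[4];
    _mm_storeu_ps(a, approx);
    _mm_storeu_ps(v, x);
    float worst = 0.0f;
    for (int i = 0; i < 4; ++i) {
        const double exact = 1.0 / std::sqrt(static_cast<double>(v[i]));
        const double rel   = std::fabs(a[i] - exact) / exact;
        if (rel > worst) worst = static_cast<float>(rel);
    }
    return worst;
}
```

Calling max_rel_error on _mm_rsqrt_ps(x) and on rsqrt_refined(x) for the same positive inputs shows the accuracy gained by the step. The three extra multiplies and one subtract per vector are the kind of additional work that would further amortize the transpose cost.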

 
 ]]></description>
      <link>http://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx/#comment-49725</link>
      <pubDate>Mon, 04 Oct 2010 12:10:46 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx/#comment-49725</guid>
    </item>
    <item>
      <title>By gxingram</title>
      <description><![CDATA[ Bright Solution, Regards, George Ingram ]]></description>
      <link>http://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx/#comment-54524</link>
      <pubDate>Thu, 06 Jan 2011 06:58:46 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx/#comment-54524</guid>
    </item>
    <item>
      <title>By Justin</title>
      <description><![CDATA[ Good stuff. I'm interested in single and double precision Mat4 by Mat4 operations in AVX - do you guys plan on putting anything up?
  ]]></description>
      <link>http://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx/#comment-55244</link>
      <pubDate>Tue, 18 Jan 2011 17:11:13 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx/#comment-55244</guid>
    </item>
    <item>
      <title>By A Look at Sandy Bridge: Integrating Graphics into the CPU &#8211; Intel Software Network Blogs</title>
      <description><![CDATA[ n/a ]]></description>
      <link>http://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx/#comment-56146</link>
      <pubDate>Thu, 10 Feb 2011 15:51:22 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx/#comment-56146</guid>
    </item>
    <item>
      <title>By Jeff&#8217;s Notebook: 3D Vector Normalization Using 256-Bit Intel® Advanced Vector Extensions (Intel® AVX) &#8211; Intel Software Network Blogs</title>
      <description><![CDATA[ n/a ]]></description>
      <link>http://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx/#comment-56186</link>
      <pubDate>Fri, 11 Feb 2011 08:42:20 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx/#comment-56186</guid>
    </item>
    <item>
      <title>By Stan Melax (Intel)</title>
      <description><![CDATA[ If this article went over your head, you might want to first read an intro article (a prequel) with background information on optimizing this very loop, starting from a naive implementation:
http://software.intel.com/en-us/articles/free-speedup-with-compiler-switches-for-fast-math-and-intel-streaming-simd-extensions/

Also, an educational video that explains how the x86 works (including out-of-order execution and SIMD), and how to go about performance tuning your code, can be found at: http://www.gdcvault.com/play/1014645

Enjoy. ]]></description>
      <link>http://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx/#comment-58052</link>
      <pubDate>Wed, 30 Mar 2011 23:38:00 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx/#comment-58052</guid>
    </item>
    <item>
      <title>By Jeff&#8217;s Notebook: 3D Vector Normalization Using 256-Bit Intel® Advanced Vector Extensions (Intel® AVX) &#8211; Intel Software Network Blogs</title>
      <description><![CDATA[ n/a ]]></description>
      <link>http://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx/#comment-58635</link>
      <pubDate>Wed, 13 Apr 2011 10:59:46 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx/#comment-58635</guid>
    </item>
    <item>
      <title>By vanswaaij</title>
      <description><![CDATA[ Hi Stan, 
Could you extend your example to show how best to handle zero-length vectors in the list of vectors while keeping everything as SIMD as possible?
Thanks. ]]></description>
      <link>http://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx/#comment-71579</link>
      <pubDate>Thu, 15 Mar 2012 07:45:30 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/3d-vector-normalization-using-256-bit-intel-advanced-vector-extensions-intel-avx/#comment-71579</guid>
    </item>
  </channel></rss>
