<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated on Fri, 25 May 2012 21:57:45 -0700 -->
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <atom:link href="http://software.intel.com/en-us/articles/using-intel-advanced-vector-extensions-to-implement-an-inverse-discrete-cosine-transform/feed/" rel="self" type="application/rss+xml" />
    <title>Intel Software Network Comments Feed</title>
    <link>http://software.intel.com/en-us/articles/using-intel-advanced-vector-extensions-to-implement-an-inverse-discrete-cosine-transform</link>
    <description></description>
    <language>en-us</language>
    <item>
      <title>By Genome project at the emperor: Of the Kalifornier &quot; DNA, enormous genomes a Project | Carcinoma Blog</title>
      <description><![CDATA[ n/a ]]></description>
      <link>http://software.intel.com/en-us/articles/using-intel-advanced-vector-extensions-to-implement-an-inverse-discrete-cosine-transform/#comment-46408</link>
      <pubDate>Wed, 21 Jul 2010 01:09:43 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/using-intel-advanced-vector-extensions-to-implement-an-inverse-discrete-cosine-transform/#comment-46408</guid>
    </item>
    <item>
      <title>By tdistler</title>
      <description><![CDATA[ Nice article. However, it would seem that this test indicates that implementing the H.264 IDCT using AVX may actually be slower than using SSE... given that H.264 uses an integer DCT. Would you agree with this assessment? ]]></description>
      <link>http://software.intel.com/en-us/articles/using-intel-advanced-vector-extensions-to-implement-an-inverse-discrete-cosine-transform/#comment-47848</link>
      <pubDate>Tue, 24 Aug 2010 12:41:29 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/using-intel-advanced-vector-extensions-to-implement-an-inverse-discrete-cosine-transform/#comment-47848</guid>
    </item>
    <item>
      <title>By Richard Hubbard (Intel)</title>
      <description><![CDATA[    Thanks for your great question. The results did show that the single-precision floating-point implementation was slightly slower than the short integer version. So I ran another test, one that should have been run before publishing the paper.
   The new test used the exact same short integer implementation, compiled with the /QxAVX switch, which produces 128-bit AVX integer instructions. I then compared those results to the original short integer binary, which was compiled with /QxSSE4.1. The results show a 1.07x speedup. None of the source code was changed; it was simply recompiled for Intel AVX.
   The assembly language produced when compiling with /QxSSE4.1 has 22 register-to-register moves. The code produced with the /QxAVX switch did not have any register-to-register moves. The Intel AVX non-destructive source instructions reduce the need for register copies in this application. There are benefits to using Intel AVX for integer-based algorithms today.
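   To illustrate the register-copy point, here is a minimal sketch. It is not the code from the paper: the helper name butterfly_epi16 and its arguments are hypothetical, and the note about the generated assembly describes what a compiler typically emits, not a measured result. The value va feeds both the add and the subtract. The two-operand SSE form of the add overwrites one of its sources, so the compiler usually has to copy va first, while the three-operand AVX form writes to a separate destination and needs no copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* 128-bit integer intrinsics */
#endif

/* Clamp a 32-bit value to the int16_t range (used by the scalar fallback). */
static int16_t clamp16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical helper: one butterfly stage over n 16-bit values.
   sum[i] = a[i] + b[i] and diff[i] = a[i] - b[i], both saturating.
   n must be a multiple of 8 (eight 16-bit lanes per 128-bit register). */
static void butterfly_epi16(const int16_t *a, const int16_t *b,
                            int16_t *sum, int16_t *diff, size_t n)
{
    assert(n % 8 == 0);
#if defined(__SSE2__)
    for (size_t i = 0; i < n; i += 8) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* va is a source of both operations below. A two-operand SSE add
           overwrites one source, so a register copy of va is typically
           needed; the three-operand AVX encoding avoids that copy. */
        __m128i vs = _mm_adds_epi16(va, vb);
        __m128i vd = _mm_subs_epi16(va, vb);
        _mm_storeu_si128((__m128i *)(sum + i), vs);
        _mm_storeu_si128((__m128i *)(diff + i), vd);
    }
#else
    /* Portable scalar fallback with the same saturating behavior. */
    for (size_t i = 0; i < n; i++) {
        sum[i]  = clamp16((int32_t)a[i] + (int32_t)b[i]);
        diff[i] = clamp16((int32_t)a[i] - (int32_t)b[i]);
    }
#endif
}
```

   The source is identical for both builds; only the compiler switch (/QxSSE4.1 or /QxAVX) decides which instruction encoding is used.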
   The AVX single-precision floating-point IDCT implementation is more accurate than the short integer version in most cases, and it shows a 1.78x speedup over the SSE floating-point version.
   I'll update the text of the paper with these new results. ]]></description>
      <link>http://software.intel.com/en-us/articles/using-intel-advanced-vector-extensions-to-implement-an-inverse-discrete-cosine-transform/#comment-48224</link>
      <pubDate>Thu, 02 Sep 2010 14:35:54 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/using-intel-advanced-vector-extensions-to-implement-an-inverse-discrete-cosine-transform/#comment-48224</guid>
    </item>
  </channel></rss>
