<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated on Sat, 07 Nov 2009 20:42:04 -0800 -->
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <atom:link href="http://software.intel.com/en-us/articles/prototype-primitives-guide/feed/" rel="self" type="application/rss+xml" />
    <title>Intel Software Network Comments feed</title>
    <link>http://software.intel.com/en-us/articles/prototype-primitives-guide/feed/</link>
    <description></description>
    <language>en-us</language>
    <item>
      <title>By Neil Haran</title>
      <description><![CDATA[ Great article... I'm wondering how fast the vector cmp operation is. Does it compare two vectors in a single instruction pass? ]]></description>
      <link>http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-21942</link>
      <pubDate>Mon, 30 Mar 2009 07:46:10 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-21942</guid>
    </item>
    <item>
      <title>By Bernie</title>
      <description><![CDATA[ I'm wondering which of the following operations are implemented as full precision and/or partial-precision estimates:

    div_ps
    recipsqrt_ps
    recip_ps
    sqrt_ps
    log2_ps
    exp2_ps ]]></description>
      <link>http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-22222</link>
      <pubDate>Sun, 05 Apr 2009 21:27:51 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-22222</guid>
    </item>
    <item>
      <title>By max</title>
      <description><![CDATA[ What is the approximate clock-cycle count of these instructions? ]]></description>
      <link>http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-22377</link>
      <pubDate>Wed, 08 Apr 2009 21:35:18 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-22377</guid>
    </item>
    <item>
      <title>By Daniel Mcfarlin</title>
      <description><![CDATA[ I think there's a bug in the implementation of _mm512_mask_shuf128x32: wordPerm needs to be right-shifted on every inner-loop iteration, irrespective of the write-mask. At present, it is only right-shifted if the write-mask bit for that element is set. ]]></description>
      <link>http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-22416</link>
      <pubDate>Thu, 09 Apr 2009 15:01:32 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-22416</guid>
    </item>
    <item>
      <title>By Matthias Kretz</title>
      <description><![CDATA[ When you compile with gcc, I recommend adding the following to the header:

#elif defined(__GNUC__)

typedef struct {
    float v[16];
} __attribute__((__may_alias__)) __m512;

typedef struct {
    double v[8];
} __attribute__((__may_alias__)) __m512d;

typedef struct {
    int v[16];
} __attribute__((__may_alias__)) __m512i;

This allows you to cast between the different __m512 types, which is often necessary to use the load/store/gather/scatter intrinsics. ]]></description>
      <link>http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-23018</link>
      <pubDate>Wed, 22 Apr 2009 06:35:59 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-23018</guid>
    </item>
    <item>
      <title>By Matthias Kretz</title>
      <description><![CDATA[ Actually, the prototype header relies so heavily on aliasing, which is not allowed according to ISO, that you have to compile with -fno-strict-aliasing when using gcc. Expect undefined results if you don't. ]]></description>
      <link>http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-23086</link>
      <pubDate>Thu, 23 Apr 2009 08:55:18 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-23086</guid>
    </item>
    <item>
      <title>By Matthias Kretz</title>
      <description><![CDATA[ The *_pd math utility operations in the prototype header all try to operate on 16 doubles, which gave me some curious crashes. If you want to try doubles with this header, replace all the 16s in the _pd functions with 8. :-) ]]></description>
      <link>http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-23295</link>
      <pubDate>Mon, 27 Apr 2009 05:42:29 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-23295</guid>
    </item>
    <item>
      <title>By balaji_ram</title>
      <description><![CDATA[ I tried using the .inl file to do some vector operations and I am getting results.

I would like more insight into the purpose of providing these inline functions.
Please correct me if my understanding is wrong or only partially correct.

(1) The vector operations performed by these inline methods will, in the future, be carried out by the Larrabee hardware, so we can expect them to run much faster.

(2) This .inl file is provided so that we can develop applications today that will exploit the capabilities of the Larrabee hardware when it is available.

Thanks very much.

Balaji Ram ]]></description>
      <link>http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-23669</link>
      <pubDate>Mon, 04 May 2009 22:42:42 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-23669</guid>
    </item>
    <item>
      <title>By René Rebe » Blog Archive » Larrabee’s New Instructions</title>
      <description><![CDATA[ n/a ]]></description>
      <link>http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-23694</link>
      <pubDate>Tue, 05 May 2009 06:23:01 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-23694</guid>
    </item>
    <item>
      <title>By Matthias Kretz</title>
      <description><![CDATA[ (1) Yes. This document lists the functions sorted by instruction, so you can see that most functions map to one specific LRB instruction.

(2) Yes. Though, of course, you will need a compiler that understands the LRB intrinsics and generates LRB instructions, i.e. you also need an "LRB-able" compiler, not only hardware. ]]></description>
      <link>http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-23862</link>
      <pubDate>Thu, 07 May 2009 01:21:39 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-23862</guid>
    </item>
    <item>
      <title>By Ivo Tops</title>
      <description><![CDATA[ I am missing 64-bit integers. Are these not supported? ]]></description>
      <link>http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-24343</link>
      <pubDate>Wed, 13 May 2009 23:57:15 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-24343</guid>
    </item>
    <item>
      <title>By Steve Pitzel (Intel)</title>
      <description><![CDATA[ Hey all,
Thank you so much for the comments! Our architects and engineers have worked with your feedback and the .inl file now appearing in this document is completely new.

Please download the new file and have a go at it! I'm compiling our engineers' comments and will post those shortly as well.

Thank you again!
- Pitz ]]></description>
      <link>http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-24406</link>
      <pubDate>Fri, 15 May 2009 08:48:57 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-24406</guid>
    </item>
    <item>
      <title>By Steve Pitzel (Intel)</title>
      <description><![CDATA[ Responses to your comments from Visual Computing Group hardware engineer Chris Goodman are below:

> I think there's a bug in the implementation of _mm512_mask_shuf128x32: wordPerm needs to be right-shifted on every inner-loop iteration, irrespective of the write-mask. At present, it is only right-shifted if the write-mask bit for that element is set.

You are very correct, thank you for finding this bug!  We will post a new version of the library with this and other issues fixed soon.
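
To illustrate the intended behavior, here is a minimal sketch of a masked permute loop. It is a simplified model, not the library's code: the function name, the lane layout, and the 2-bit selector encoding are assumptions made for illustration, and `dst` and `src` are assumed not to overlap. The point is only that the selector word advances on every iteration, whether or not the write-mask bit is set.

```c
#include <stdint.h>

/* Hypothetical, simplified model of a masked 32-bit permute inside each
   128-bit lane of a 16-float vector. NOT the prototype library's code. */
enum { LANES = 4, ELEMS_PER_LANE = 4 };

static void masked_lane_permute(float dst[16], const float src[16],
                                uint16_t write_mask, uint8_t word_perm)
{
    for (int lane = 0; lane < LANES; ++lane) {
        uint8_t perm = word_perm;       /* restart the selectors for each lane */
        for (int e = 0; e < ELEMS_PER_LANE; ++e) {
            int idx = lane * ELEMS_PER_LANE + e;
            int sel = perm & 3;         /* 2-bit selector for this element */
            if ((write_mask >> idx) & 1) {
                dst[idx] = src[lane * ELEMS_PER_LANE + sel];
            }
            perm >>= 2;                 /* advance unconditionally, masked or not */
        }
    }
}
```

If the shift were placed inside the `if`, every element after a masked-off one would read the wrong selector.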

> When you compile with gcc, I recommend adding the following to the header:

#elif defined(__GNUC__)

typedef struct {
    float v[16];
} __attribute__((__may_alias__)) __m512;

typedef struct {
    double v[8];
} __attribute__((__may_alias__)) __m512d;

typedef struct {
    int v[16];
} __attribute__((__may_alias__)) __m512i;

> This allows you to cast between the different __m512 types, which is often necessary to use the load/store/gather/scatter intrinsics.

Thank you. The library does provide functions _mm512_cast*_*() to cast between the different types; however, you have provided a better solution for gcc. This will be added to the new version of the library, which will be posted soon.

> Actually, the prototype header relies so heavily on aliasing, which is not allowed according to ISO, that you have to compile with -fno-strict-aliasing when using gcc. Expect undefined results if you don't.

Thank you, you are correct.  As suggested by another user, we will mark the __m512 types with __attribute__((__may_alias__)) to make casting easier in gcc.
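
As a rough sketch of what the attribute buys you, the following reinterprets a float vector as an integer vector through a pointer cast. The type and function names are hypothetical stand-ins rather than the header's own, and `__may_alias__` is a GCC extension (Clang accepts it as well), so this is not portable ISO C.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the header's vector types. */
typedef struct {
    float v[16];
} __attribute__((__may_alias__)) vec16f;

typedef struct {
    int32_t v[16];
} __attribute__((__may_alias__)) vec16i;

/* Read the raw bit pattern of one float lane by viewing the vector as
   integers. The may_alias attribute tells the compiler that accesses
   through these pointer types may alias objects of any other type. */
static int32_t lane_bits(const vec16f *x, int lane)
{
    const vec16i *bits = (const vec16i *)x;
    return bits->v[lane];
}
```

Without the attribute, the same cast would need -fno-strict-aliasing or a memcpy to be safe under optimization.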

> The *_pd math utility operations in the prototype header all try to operate on 16 doubles, which gave me some curious crashes. If you want to try doubles with this header, replace all the 16s in the _pd functions with 8. :-)

Thank you for finding this large class of embarrassing cut and paste bugs!  We will post a new version of the library with this and other issues fixed soon.
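
One way to avoid this class of bug is to derive the lane count from the type instead of hard-coding 16 or 8. The sketch below uses a hypothetical stand-in type, macro, and function name; it is not how the library is written.

```c
#include <stddef.h>

/* Hypothetical stand-in for a 512-bit vector of eight doubles. */
typedef struct {
    double v[8];
} proto_m512d;

/* Number of lanes, computed from the array member itself. */
#define LANE_COUNT(vec) (sizeof((vec).v) / sizeof((vec).v[0]))

/* Element-wise add that cannot run past the end of the vector, because
   the loop bound follows the type rather than a pasted constant. */
static proto_m512d add_pd_sketch(proto_m512d a, proto_m512d b)
{
    proto_m512d r;
    for (size_t i = 0; i < LANE_COUNT(r); ++i) {
        r.v[i] = a.v[i] + b.v[i];
    }
    return r;
}
```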

-Chris
 ]]></description>
      <link>http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-24408</link>
      <pubDate>Fri, 15 May 2009 09:02:54 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-24408</guid>
    </item>
    <item>
      <title>By Steve Pitzel (Intel)</title>
      <description><![CDATA[ Responses to your comments from Visual Computing Group graphics hardware engineer, Tom Forsyth:

> I'm wondering how fast the vector cmp operation is. Does it compare two vectors in a single instruction pass?

Correct.

> I'm wondering which of the following operations are implemented as full precision and/or partial-precision estimates:
>
> div_ps
> recipsqrt_ps
> recip_ps
> sqrt_ps
> log2_ps
> exp2_ps

The answer is complex, and it depends on the hardware you’re running on and which compiler options you choose, but the one-line answer is that for high-performance code you will usually get around 20 bits of precision or more. So not full precision, but not too far off. Higher-precision variants will also be available where needed.

> What is the approximate clock-cycle count of these instructions?

Most of them are one instruction per clock, though a few are longer. Some more details on performance are available in the GDC 2009 talks.
 ]]></description>
      <link>http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-24409</link>
      <pubDate>Fri, 15 May 2009 09:06:06 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-24409</guid>
    </item>
    <item>
      <title>By AGPX</title>
      <description><![CDATA[ Talking about memory: if I build a data structure (a tree, for example) in the host memory space, do I need to remap the pointers to the Larrabee memory space? To develop Larrabee applications right now, it would also be useful to have a library that emulates memory transfers between the host CPU and Larrabee. ]]></description>
      <link>http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-24463</link>
      <pubDate>Sun, 17 May 2009 10:01:45 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-24463</guid>
    </item>
    <item>
      <title>By gshi</title>
      <description><![CDATA[ The descriptions for VKANDNR and VKNOT look wrong (they are the same as the one for VKANDN).
 ]]></description>
      <link>http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-24508</link>
      <pubDate>Mon, 18 May 2009 13:34:27 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-24508</guid>
    </item>
    <item>
      <title>By Thomas Willhalm (Intel)</title>
      <description><![CDATA[ _MM_BROADCAST32_NONE is not defined in the header file but listed on this page. ]]></description>
      <link>http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-25941</link>
      <pubDate>Thu, 11 Jun 2009 11:17:10 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-25941</guid>
    </item>
    <item>
      <title>By Thomas Willhalm (Intel)</title>
      <description><![CDATA[ The description of the "expand" intrinsics without a mask have v1_old as old argument, which differs from the implementation. ]]></description>
      <link>http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-25946</link>
      <pubDate>Thu, 11 Jun 2009 13:22:43 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-25946</guid>
    </item>
    <item>
      <title>By Matthias Kretz</title>
      <description><![CDATA[ Look at http://software.intel.com/en-us/forums/developing-software-for-visual-computing/topic/66359/ for a patch to fix all aliasing issues GCC doesn't like. ]]></description>
      <link>http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-27352</link>
      <pubDate>Thu, 09 Jul 2009 09:07:57 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-27352</guid>
    </item>
    <item>
      <title>By Søren Sandmann </title>
      <description><![CDATA[ From the file it appears as if converting from srgb8 will apply the non-linear transformation to the alpha channel as well as the RGB channels. Is this also the case in the hardware? The problem is that OpenGL calls for treating the alpha channel in srgb8 formats as linear even though the other channels are encoded non-linearly.

See Issue 3 at http://www.opengl.org/registry/specs/EXT/texture_sRGB.txt .

Another question: Will a description of the interface to the texturing units be made available? ]]></description>
      <link>http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-28746</link>
      <pubDate>Thu, 30 Jul 2009 03:27:04 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-28746</guid>
    </item>
    <item>
      <title>By gshi</title>
      <description><![CDATA[ What instruction(s) can I use to achieve the following:
_M512 va ={a0,a1,a2,a3, a4,a5,a6,a7, a8,a9,a10,a11, a12,a13,a14,a15} ;
_M512 vb ={b0,b1,b2,b3,b4,b5,b6,b7,b8,b9,b10,b11,b12,b13,b14,b15} ;

and I want

_M512 vc ={a0,b0, a1,b1, a2,b2, a3,b3, a4,b4, a5,b5, a6,b6, a7,b7};

What if I want vc to hold an arbitrary selection of the float values in va and vb?
The shuffle instruction (_mm512_shuf128x32) does not seem to be able to do that.
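
For clarity, the desired result can be stated as scalar reference loops. This is only a sketch of the target semantics, not an answer in terms of LRBni intrinsics: the function names and the 5-bit selector encoding are hypothetical, and vc is assumed not to overlap va or vb.

```c
/* Scalar reference only: these loops state the desired result and are
   not LRBni intrinsics. All names here are hypothetical. */

/* vc = {a0,b0, a1,b1, ..., a7,b7}: interleave the low eight lanes. */
static void interleave_low8(float vc[16], const float va[16],
                            const float vb[16])
{
    for (int i = 0; i < 8; ++i) {
        vc[2 * i]     = va[i];
        vc[2 * i + 1] = vb[i];
    }
}

/* General case: sel[i] in 0..15 picks va[sel[i]],
   and sel[i] in 16..31 picks vb[sel[i] - 16]. */
static void select_lanes(float vc[16], const float va[16],
                         const float vb[16], const unsigned char sel[16])
{
    for (int i = 0; i < 16; ++i) {
        unsigned k = sel[i] & 31u;
        vc[i] = (k < 16u) ? va[k] : vb[k - 16u];
    }
}
```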

 ]]></description>
      <link>http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-28966</link>
      <pubDate>Mon, 03 Aug 2009 15:45:23 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-28966</guid>
    </item>
    <item>
      <title>By exihea</title>
      <description><![CDATA[ I created a table containing all the Larrabee prototype instructions:
http://www.ncsa.illinois.edu/~gshi/LRBni_cheatsheet.pdf
 ]]></description>
      <link>http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-29729</link>
      <pubDate>Mon, 17 Aug 2009 11:11:17 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-29729</guid>
    </item>
    <item>
      <title>By cd rate</title>
      <description><![CDATA[ Wow, very useful table, exihea, thanks! ]]></description>
      <link>http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-33505</link>
      <pubDate>Thu, 29 Oct 2009 07:26:57 -0700</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-33505</guid>
    </item>
    <item>
      <title>By Mark Bavis</title>
      <description><![CDATA[ I just set up my project to use these instructions instead of the LRB intrinsics. Now when I try to build my program, link.exe spins at 100% CPU usage for several minutes before spitting out an error: LNK1257 code generation failed. If I switch the prototype primitives header back to the intrinsics header, it builds fine. Can someone shed a little light on this? I have tried turning off whole program optimization, which is mentioned in the MSDN help page for that error.

Thanks
Mark ]]></description>
      <link>http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-33832</link>
      <pubDate>Mon, 02 Nov 2009 16:45:01 -0800</pubDate>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/prototype-primitives-guide/#comment-33832</guid>
    </item>
  </channel></rss>