The documentation at http://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/GUID-254C3F9D-5DDD-4B27-95E2-B6986B4A852B.htm indicates that "Only the lower eight elements are used as indices. The upper eight elements are not used." Since this is a single-precision gather, shouldn't all 16 elements be used as indices? Is this a documentation error, or does this prefetch really only operate on half of the elements? (Perhaps the prefetch unit is limited to 8 addresses?)
What is the purpose of the conv argument to the prefetch instructions? Presumably the data isn't actually being converted yet. Is this just a hint about how many bytes will be read from each address?
The instruction is documented to prefetch a float32 vector. I assume that it's equally effective to prefetch an int32 vector (or, in fact, a number of int32s which will be read using legacy x86 instructions). Can someone please confirm this?
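To make the questions concrete, here is a minimal sketch of the address arithmetic I have in mind. It does not call the prefetch intrinsic. It is only a portable model that counts which cache lines a gather would touch, and everything in it is my own assumption rather than documented behaviour: the function name lines_touched, the 64-byte line size, and the idea that each element's address is base + index * scale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u  /* assumed cache-line size in bytes */

/* Hypothetical model, not the hardware's documented behaviour:
 * count the distinct cache lines touched by n elements located at
 * base + index[i] * scale, each elem_size bytes wide. */
size_t lines_touched(uintptr_t base, const int32_t *index, size_t n,
                     size_t scale, size_t elem_size)
{
    uintptr_t seen[32];  /* n <= 16, and each element spans at most 2 lines */
    size_t count = 0;

    assert(n <= 16 && elem_size >= 1 && elem_size <= CACHE_LINE);

    for (size_t i = 0; i < n; i++) {
        /* First and last byte of element i. */
        uintptr_t first = base + (uintptr_t)((intptr_t)index[i] * (intptr_t)scale);
        uintptr_t last = first + elem_size - 1;

        for (uintptr_t line = first / CACHE_LINE; line <= last / CACHE_LINE; line++) {
            /* Record the line only if it has not been seen yet. */
            size_t j = 0;
            while (j < count && seen[j] != line)
                j++;
            if (j == count)
                seen[count++] = line;
        }
    }
    return count;
}
```

In this model the element type never appears, only its size, which is why I would expect a float32 and an int32 prefetch to behave identically. The elem_size parameter is where I imagine the conv argument might matter, and passing n = 8 instead of n = 16 corresponds to the "lower eight elements" wording in the documentation.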