Differences Between cilk_for and #pragma simd, and How to Use Them (Part 2)

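Let's start with vectorization. The vectorizer is a component of the Intel compiler that uses the SIMD instructions in MMX™ and in the Intel® SSE, SSE2, SSE3, SSE4, and SSSE3 instruction sets. During compilation it determines whether operations can be executed in parallel and then, depending on the data type, translates them into SIMD instructions that can process up to 16 elements at once.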
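To put the "up to 16 elements" figure in context: a 128-bit SSE register holds sixteen 8-bit values, eight 16-bit values, four 32-bit values, or two 64-bit values. The sketch below shows a loop over 8-bit data of the kind a vectorizer can typically handle. The function name `add_bytes` is hypothetical, and whether a particular compiler actually vectorizes the loop depends on its options and target processor.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: element-wise sum of two byte arrays.
 * Each iteration is independent of the others, so a vectorizer that
 * targets 128-bit SSE registers could handle 16 of these 8-bit
 * elements with a single instruction. */
void add_bytes(const uint8_t *a, const uint8_t *b, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = (uint8_t)(a[i] + b[i]);  /* result wraps modulo 256 */
    }
}
```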

From the compiler's point of view, there are two ways to achieve this: setting compiler options, or adding compiler directives to the program.

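The relevant compiler options are summarized below (all of them are supported on both IA-32 and Intel® 64 architectures).
Linux* OS and Mac OS* X    Windows* OS      Description
-x                         /Qx              Generates code specialized for the instruction set of a specific processor.
-ax                        /Qax             Generates, in a single executable, both a processor-specific code path and a generic code path. The generic code path usually performs worse.
-vec                       /Qvec            Enables or disables vectorization. Vectorization is enabled by default. Currently supported on IA-32 and Intel® 64 architectures.
-vec-report                /Qvec-report     Controls the diagnostic information reported by the vectorizer.
-simd                      /Qsimd           Vectorizes according to the vectorization directives the user has placed in the program.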

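How much the Intel compiler can vectorize depends heavily on its ability to recognize and resolve ambiguity in memory references. Choosing appropriate compiler options can sometimes bring vectorization to its best.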
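One common source of such ambiguity is pointer aliasing: the compiler cannot always tell whether two pointers refer to overlapping memory. As a minimal sketch (the function name `scale_add` is hypothetical), the standard C99 `restrict` qualifier lets the programmer state that the arrays do not overlap, which removes one obstacle to vectorization. It does not by itself guarantee that the loop will be vectorized.

```c
#include <stddef.h>

/* Hypothetical helper: y[i] += alpha * x[i].
 * The restrict qualifiers (C99) promise that x and y do not overlap,
 * so the compiler need not assume that a store to y[i] could change
 * a later load from x. */
void scale_add(float *restrict y, const float *restrict x,
               float alpha, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] += alpha * x[i];
    }
}
```

The caller is responsible for keeping that promise; passing overlapping arrays to a function declared this way is undefined behavior.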
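In addition, the Intel compiler offers a combination of options that automatically identifies loops and vectorizes them: -parallel (Linux* and Mac OS* X) or /Qparallel (Windows*) together with -x (Linux) or /Qx (Windows). In most cases the compiler parallelizes the outermost loop and vectorizes the innermost loop. Where performance considerations favor it, the compiler may both parallelize and vectorize the same loop.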
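As an illustration of the loop shape this targets, here is a minimal sketch of a nested loop whose outer iterations are independent and whose inner loop walks contiguous memory. The function name and the sizes `ROWS` and `COLS` are hypothetical; which loops actually get parallelized or vectorized is the compiler's decision.

```c
#include <stddef.h>

#define ROWS 4   /* hypothetical sizes, for illustration only */
#define COLS 8

/* Hypothetical example: scale every element of a matrix.
 * Outer loop: rows are independent of each other, so it is a
 * candidate for auto-parallelization.
 * Inner loop: unit-stride accesses to contiguous memory, so it is a
 * candidate for vectorization. */
void scale_matrix(float m[ROWS][COLS], float factor)
{
    for (size_t r = 0; r < ROWS; ++r) {
        for (size_t c = 0; c < COLS; ++c) {
            m[r][c] *= factor;
        }
    }
}
```

On Linux this might be built with something like `icc -parallel -xSSE4.2 file.c`; the processor target given after -x is an assumption here and should match the machine the code will run on.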

The SIMD pragma accepts the following five optional clauses, which guide the compiler toward vectorizing the code more effectively.
vectorlength(num1, num2, …, numN)
directs the vectorizer to choose one of the specified vector lengths (VL).
private(expr1, expr2, …, exprN)
directs the vectorizer to make these L-value expressions private to each iteration of the loop. Multiple private clauses are merged as a union. Unless the compiler can prove that the initial and last values are unused, the initial value is broadcast to all private instances, and the last value is copied out from the last iteration instance.
linear(var1:step1, var2:step2, …, varN:stepN)
tells the compiler that for every iteration of the scalar loop, var1 is incremented by step1, var2 is incremented by step2, and so on. Every iteration of the vector loop therefore increments these variables by VL*step1, VL*step2, …, VL*stepN, respectively. Multiple linear clauses are merged as a union. It is an error to give a variable two or more steps.
reduction(oper:var1,var2,…,varN)
directs the compiler to perform a vector reduction of kind oper on var1, var2, …, varN. A SIMD pragma/directive may have multiple reduction clauses with the same or different operators. It is an error to associate a variable with two or more different reduction operators.
[no]assert
directs the compiler to assert or not to assert when vectorization fails. The default is assert. It is an error for a SIMD pragma/directive to have multiple instances of this clause.
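The clauses above can be sketched in use as follows. This is an illustration under stated assumptions, not a tuned implementation: the function names `dot` and `take_even` are hypothetical, and the pragma is guarded by the `__INTEL_COMPILER` macro so that other compilers simply skip it and build the plain scalar loops.

```c
#include <stddef.h>

/* Hypothetical helper: dot product.
 * reduction(+:sum) tells the vectorizer that sum is accumulated with +,
 * so partial sums may be kept per vector lane and combined at the end. */
float dot(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
#if defined(__INTEL_COMPILER)
#pragma simd reduction(+:sum)
#endif
    for (size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

/* Hypothetical helper: copy every second element of src into dst.
 * linear(j:2)     : j grows by 2 in each scalar iteration.
 * private(t)      : t is a temporary local to each iteration.
 * vectorlength(4) : ask for 4 elements per vector iteration. */
void take_even(float *dst, const float *src, int n)
{
    int j = 0;        /* index into src */
    float t = 0.0f;   /* per-iteration temporary */
#if defined(__INTEL_COMPILER)
#pragma simd linear(j:2) private(t) vectorlength(4)
#endif
    for (int i = 0; i < n; ++i) {
        t = src[j];
        dst[i] = t;
        j += 2;       /* must match the step declared in linear() */
    }
}
```

Because the SIMD pragma asks the compiler to vectorize rather than proving safety itself, the clauses have to describe the loop accurately; a wrong step in `linear`, for example, would describe a different loop from the one written.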

Other restrictions
• The countable loop for the SIMD pragma/directive must conform to the for-loop (C/C++) or DO-loop (Fortran) style usable for the OpenMP* worksharing loop construct. See http://www.openmp.org/mp-documents/spec30.pdf (Section 2.5.1). Additionally, the loop control variable must be of a signed or unsigned integer type, or a pointer type.
• The loop undergoing SIMD vectorization must be free from C++ exception handling code.
• The fast floating-point model is used for the loop undergoing SIMD vectorization if the vectorizer would otherwise have to give up vectorizing the loop.
• The vector values must be of one of the following data types only: signed/unsigned 8/16/32/64-bit integers, single/double-precision floating point, or single/double-precision complex.
• A loop undergoing SIMD vectorization may not contain another loop in it (i.e., user enforced vectorization is possible on innermost loops only).
NOTE: C++ inlining can create such an inner loop and result in an error, which may not be obvious at the source level.
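A loop that respects these restrictions might look like the sketch below: the pragma sits on the innermost loop only, the loop is countable with an integer control variable, and the data type is a 32-bit integer. The function name `row_sums` and its parameters are hypothetical, and the pragma is again guarded by `__INTEL_COMPILER` so that other compilers build the scalar loop.

```c
#include <stddef.h>

/* Hypothetical helper: sum each row of a rows x cols matrix stored
 * row by row in the flat array m, writing one total per row to out. */
void row_sums(const int *m, size_t rows, size_t cols, int *out)
{
    for (size_t r = 0; r < rows; ++r) {      /* outer loop: no SIMD pragma */
        int sum = 0;
#if defined(__INTEL_COMPILER)
#pragma simd reduction(+:sum)
#endif
        for (size_t c = 0; c < cols; ++c) {  /* innermost, countable loop */
            sum += m[r * cols + c];
        }
        out[r] = sum;
    }
}
```

Placing the pragma on the outer loop instead would put a loop inside the loop being vectorized, which is the case the restriction above rules out.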


For more complete information about compiler optimizations, see the Optimization Notice.