Tuning the performance of an existing application is a challenge, and the outcome depends on many factors, such as the nature of the algorithm the application implements and whether the implementation can scale to exploit thread and data parallelism. The most logical approach is to profile the application dynamically under different workloads, analyze its hotspots, and then fine-tune them for the target hardware architecture. A hotspot is typically a function or loop that carries a heavy computational load. Intel provides a dynamic profiling tool, Intel® VTune™ Amplifier XE, that can be used to profile an application. Once the hotspots are identified, the next step is to analyze the corresponding algorithm and look for unexploited thread or data parallelism. It is also good programming practice to write scalable code that makes use of all available cores (thread parallelism) and of the SIMD (Single Instruction Multiple Data) registers in each core (data parallelism). This paper recommends a step-by-step approach to enabling both task parallelism and data parallelism in an application using Intel® Cilk™ Plus. It also explains each explicit vectorization extension in detail, with examples that show how and when to use it.
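To make the two levels of parallelism concrete, here is a minimal sketch in standard C++ rather than Intel® Cilk™ Plus itself, so that it builds with any C++11 compiler. The hotspot (a SAXPY-style loop) and the names `saxpy_range` and `saxpy_parallel` are hypothetical and are not taken from the paper. Threads stand in for task parallelism, and a dependence-free inner loop stands in for data parallelism. In Cilk Plus, the outer split would more likely be written with `cilk_for` and the inner loop annotated with `#pragma simd` or expressed in array notation.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical hotspot: y[i] = a * x[i] + y[i] over the range [begin, end).
// No iteration depends on another, so a vectorizing compiler is free to map
// the loop onto SIMD registers (data parallelism).
void saxpy_range(float a, const float* x, float* y,
                 std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Splits the index space into one contiguous chunk per worker thread
// (thread parallelism). Chunks do not overlap, so no locking is needed.
void saxpy_parallel(float a, const std::vector<float>& x, std::vector<float>& y,
                    unsigned workers = std::thread::hardware_concurrency()) {
    const std::size_t n = std::min(x.size(), y.size());
    if (workers == 0) workers = 1;  // hardware_concurrency() may return 0
    const std::size_t chunk = (n + workers - 1) / workers;  // ceiling division

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(n, w * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        pool.emplace_back(saxpy_range, a, x.data(), y.data(), begin, end);
    }
    for (std::thread& t : pool) t.join();
}
```

Calling `saxpy_parallel(3.0f, x, y, 4)` on two equally sized vectors updates `y` in place using up to four threads. Whether the inner loop is actually vectorized depends on the compiler and its optimization flags, which is the gap that explicit vectorization extensions are meant to close.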