面向英特尔® MIC 架构进行应用的适用性分析
In this article an OpenMP* based implementation of the Ant Colony Optimization algorithm was analyzed for bottlenecks with Intel® VTune™ Amplifier XE 2016 together with improvements using hybrid MPI-OpenMP and Intel® Threading Building Blocks were introduced to achieve efficient scaling across a four-socket Intel® Xeon® processor E7-8890 v4 processor-based system.
Intel is bringing to market, in anticipation of general availability of the Intel® Xeon Phi™ Processor (codenamed Knights Landing), the Developer Access Program (DAP). DAP is an early access program for developers worldwide to purchase an Intel Xeon Phi Processor based system.
When confronted with nested loops, the granularity of the computations that are assigned to threads will directly affect performance. Loop transformations such as splitting and merging nested loops can make parallelization easier and more productive.
One key to attaining good parallel performance is choosing the right granularity for the application. Granularity is the amount of real work in the parallel task. If granularity is too fine, then performance can suffer from communication overhead.
避免线程之间发生堆冲突 (PDF 256KB)
Vectorization Advisor is like having a trusted friend look over your code and give you advice based on what he sees. As you’ll see in this article, user feedback on the tool has included, “there are significant speedups produced by following advisor output, I'm already sold on this tool!”
检测线程应用中的内存带宽饱和度 (PDF 231KB)