This is the second article in a series on High Performance Computing with the Intel Xeon Phi. The Intel Xeon Phi is Intel's first commercial product to incorporate the Many Integrated Core architecture. In this article I will present various frameworks for unleashing the power of multiple threads on the Xeon Phi. We will also look at the interesting properties, advantages, and disadvantages of each framework.
Intel® VTune™ Amplifier XE 2013
Intel® VTune™ Amplifier XE is an easy-to-use performance and thread profiler for C, C++, C#, Fortran, Java and MPI developers. No special recompiles are needed; just start profiling. Hotspots are highlighted in the source. A powerful timeline makes it easy to tune your application and scale performance on multicore processors.
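After years of hardware platform optimization, platform power consumption has fallen year by year. CPUs have gained new low-power states, and display power consumption has also dropped significantly. However, the power consumed by software running on the platform has become an increasingly visible problem. On the software side, the impact of multimedia applications on power consumption draws particular attention. In fact, studies show that an optimized multimedia application can play for more than twice as long as an unoptimized one. This white paper describes the factors to consider when designing and developing green multimedia applications, and how to analyze and optimize the power consumption of multimedia application software on Intel® platforms. It is intended for ISVs, OEMs, and other technical professionals.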
This is a set of labs we taught during past workshops, intended to cover more advanced concepts. They are written to be self-guided. The labs are available in both C/C++ and Fortran.
Before you attempt to run these labs, make sure your environment is properly set up.
Typical reductions in OpenMP* involve using an associative operator op to do local reductions, and then using a reduction clause to collect those local reductions. For example, the following code computes a dot product by computing local sums on each thread and then summing them.
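The original listing is not reproduced in this excerpt, so what follows is only a minimal sketch of such a dot product, not the article's own code. The function name `dot_product`, the use of `double` arrays, and the `int` length parameter are assumptions made for illustration.

```c
/* Hypothetical helper (not from the original article): dot product of
 * two arrays of length n using an OpenMP reduction.
 *
 * Each thread accumulates into its own private copy of `sum`, which the
 * runtime initializes to 0 for the + operator. When the loop finishes,
 * the reduction clause adds the per-thread partial sums into the shared
 * `sum`. */
double dot_product(const double *a, const double *b, int n)
{
    double sum = 0.0;
    int i;

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }

    return sum;
}
```

The pragma only takes effect when OpenMP is enabled at compile time (for example, `-fopenmp` with GCC); without it the loop simply runs serially and returns the same kind of result. Because floating-point addition is not strictly associative, the rounding of the result may vary slightly with the number of threads.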
1. Check prerequisites
- Each host and each Intel® Xeon Phi™ coprocessor should have a unique IP address across a cluster;
- ssh access between host(s) and Intel® Xeon Phi™ coprocessor(s) should be password-less;
- Update the Intel® Manycore Platform Software Stack (Intel® MPSS) to the current version;