Optimization? Of course, everyone has faced this task when developing any application of significant size that requires substantial computation. There are a great many ways to optimize code and, consequently, many ways to do it automatically through compiler options. This is where the problem arises: how do we choose what we actually need without getting lost?
Vectorization
Processing Arrays of Bits with Intel® Advanced Vector Extensions 2 (Intel® AVX2)
In only a few weeks you will get a chance to get your hands on the 4th Generation Intel® Core™ Processor Family, formerly code-named Haswell. This architecture comes with some very nice features, including Intel® Advanced Vector Extensions 2 (Intel® AVX2). Most notably, Intel®
Intel® Xeon Phi™ coprocessor Power Management Part 1: P-States, Reducing power consumption without impacting performance
Right up front, I am going to tell you that P-states are irrelevant, meaning they will not impact the performance of your HPC application. Nevertheless, they are important to your application in a more roundabout way. Since most of you belong to a group of untrusting and always questioning skeptics (i.e. engineers and scientists), I am going to go through the unnecessary exercise of justifying my claim.
Measuring performance in HPC
This is the first article in a series about High Performance Computing with the Intel Xeon Phi, Intel's first commercial product to incorporate the Many Integrated Core architecture. In this article I will present the basics of the Xeon Phi architecture, its programming models, and what we can do to measure the performance of micro-benchmarks in cycles.
Intel Xeon Phi Coprocessor April 2013 Developer Webinar Q&A Responses
Answers for the questions raised during the April session of our Introduction to High Performance Application Development for Intel® Xeon® & Intel® Xeon Phi™ processors class have been assembled. There were some duplicates and other questions we couldn't decipher, either because of the wording or because of implied context that was not spelled out. We tried to address the rest, which appear below:
Memory Allocation and First Touch
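Memory allocation is more expensive on the coprocessor than on the Intel Xeon processor, so it is wise to reuse already-allocated memory whenever possible. For example, if a function is called repeatedly (say, inside a loop) and uses an array as temporary storage, try allocating an array large enough for the maximum size needed on the first call, and reuse that array in subsequent calls: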
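typedef double real;             /* assumption: 'real' names the floating-point type in use */
static real *temp_array = NULL;  /* allocated on the first call, reused by later calls */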
