Right up front, I am going to tell you that P-states (the processor's voltage and frequency operating points) are irrelevant: they will not impact the performance of your HPC application. Nevertheless, they matter to your application in a more roundabout way. Since most of you belong to a group of untrusting and always-questioning skeptics (i.e., engineers and scientists), I am going to go through the unnecessary exercise of justifying my claim.
TITLE: “The Intel Xeon Phi coprocessor: What is it and why should I care?”
PART 2: “Getting even more parallelism”
Hi, my name is Taylor Kidd. You may know me from such notables as "The Beginning Intel® Xeon Phi™ Coprocessor Workshop" and "The Advanced Intel® Xeon Phi™ Coprocessor Workshop," where I mesmerized audiences with over 10 hours of highly technical information.
PART 0: “Introduction”
This two-day webinar series introduces you to the world of multicore and manycore computing with Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors. Expert technical teams at Intel discuss the development tools, programming models, vectorization, and execution models that will help you get the best out of your applications and platforms.
Technical Brief: 32-Core Testing Plan Contest
The purpose of this work is to develop and implement an effective algorithm for finding the automorphism groups of various algebraic and combinatorial objects in an n-dimensional vector space over a finite field. We use the First String Method (FSM) to compute the automorphism groups used in cryptography and coding theory.
This year, a simple matrix multiplication problem was posed to the students, and we set up an internal contest to obtain the fastest serial code. Many versions were submitted, and we ultimately obtained a 20x improvement over the most naïve implementation. The students learned a lot about compiler optimizations and, above all, about the effect of caches on the performance of the code.
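The contest submissions are not reproduced here, so the following C code is only a hypothetical sketch of two cache-oriented optimizations commonly applied to this problem: loop interchange (i-k-j order) and loop blocking (tiling). It assumes square n x n matrices of doubles stored in row-major order; the function names and the tile size `BS = 32` are illustrative choices, not taken from the contest, and no speedup figure is claimed for them.

```c
#include <assert.h>
#include <stddef.h>

/* Tile edge length for the blocked version: an assumed value that should be
   tuned to the cache sizes of the target machine. */
#define BS 32

static void zero_matrix(size_t n, double *C)
{
    for (size_t t = 0; t < n * n; t++)
        C[t] = 0.0;
}

/* Naive i-j-k order. With row-major storage the inner loop reads
   B[k*n + j] with a stride of n doubles, so for large n almost every
   access to B touches a different cache line. */
void matmul_ijk(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Loop interchange to i-k-j order: the inner loop now walks rows of B and C
   with unit stride, using every element of each cache line it loads and
   giving the compiler a simple loop to vectorize. */
void matmul_ikj(size_t n, const double *A, const double *B, double *C)
{
    zero_matrix(n, C);
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            const double a = A[i * n + k];
            for (size_t j = 0; j < n; j++)
                C[i * n + j] += a * B[k * n + j];
        }
}

/* Blocked (tiled) i-k-j order: the same arithmetic restricted to BS x BS
   tiles, so the tiles of B and C being reused stay in cache. The clamped
   upper bounds handle n that is not a multiple of BS. */
void matmul_blocked(size_t n, const double *A, const double *B, double *C)
{
    zero_matrix(n, C);
    for (size_t ii = 0; ii < n; ii += BS)
        for (size_t kk = 0; kk < n; kk += BS)
            for (size_t jj = 0; jj < n; jj += BS) {
                const size_t imax = ii + BS < n ? ii + BS : n;
                const size_t kmax = kk + BS < n ? kk + BS : n;
                const size_t jmax = jj + BS < n ? jj + BS : n;
                for (size_t i = ii; i < imax; i++)
                    for (size_t k = kk; k < kmax; k++) {
                        const double a = A[i * n + k];
                        for (size_t j = jj; j < jmax; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}

/* Returns 1 if every element of X and Y differs by at most tol, else 0. */
int matrices_equal(size_t n, const double *X, const double *Y, double tol)
{
    assert(X != NULL && Y != NULL);
    for (size_t t = 0; t < n * n; t++) {
        double d = X[t] - Y[t];
        if (d < 0.0)
            d = -d;
        if (d > tol)
            return 0;
    }
    return 1;
}
```

In all three versions each element of C accumulates its products in increasing k order, so they should return the same values; only the memory access pattern changes. The interchange alone turns the stride-n walk over B into a unit-stride walk, while tiling additionally limits the working set of the inner loops, which matters once a full row of B no longer fits in cache.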
We study the performance of a pointer-jumping-based exact inference algorithm for a special type of junction tree: a chain of cliques. Many traditional methods for exact inference perform poorly on this type of junction tree because they expose only limited parallelism. A pointer-jumping-based algorithm exposes independent operations that can run in parallel. Our performance analysis shows that systems with a large number of shared-memory processors are particularly well suited to exact inference in a chain of cliques.
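To make the pointer-jumping idea concrete, here is a minimal C sketch on a chain. It is not the exact inference algorithm from the study: as a hypothetical simplification, each node carries a single double and nodes are combined by addition, whereas the real algorithm combines clique potentials. The names `pointer_jump_suffix` and `NIL` are illustrative. Each round doubles the distance every pointer spans, so a chain of n nodes needs only ceil(log2 n) rounds, and all updates within a round are independent of one another.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Marks the tail of the chain (no successor). */
#define NIL ((size_t)-1)

/* Pointer jumping over an acyclic chain. Node i starts with val[i] and
   successor next[i]. Invariant: acc[i] holds the sum of the values from i
   up to (but excluding) ptr[i], or up to the tail if ptr[i] == NIL.
   On return out[i] = val[i] + val[next[i]] + ... + val[tail].
   Returns the number of rounds, or -1 if allocation fails. */
int pointer_jump_suffix(size_t n, const size_t *next, const double *val,
                        double *out)
{
    if (n == 0)
        return 0;
    for (size_t i = 0; i < n; i++)
        assert(next[i] == NIL || next[i] < n);

    size_t *ptr = malloc(n * sizeof *ptr), *ptr_new = malloc(n * sizeof *ptr_new);
    double *acc = malloc(n * sizeof *acc), *acc_new = malloc(n * sizeof *acc_new);
    if (!ptr || !ptr_new || !acc || !acc_new) {
        free(ptr); free(ptr_new); free(acc); free(acc_new);
        return -1;
    }
    memcpy(ptr, next, n * sizeof *ptr);
    memcpy(acc, val, n * sizeof *acc);

    int rounds = 0;
    for (;;) {
        int active = 0;
        for (size_t i = 0; i < n; i++)
            if (ptr[i] != NIL) { active = 1; break; }
        if (!active)
            break;

        /* One synchronous round: each iteration reads only ptr/acc and writes
           only its own slot of ptr_new/acc_new, so the iterations are
           independent. The pragma is ignored when compiled without OpenMP. */
        #pragma omp parallel for
        for (size_t i = 0; i < n; i++) {
            size_t s = ptr[i];
            if (s != NIL) {
                acc_new[i] = acc[i] + acc[s];  /* absorb successor's partial sum */
                ptr_new[i] = ptr[s];           /* jump over the successor */
            } else {
                acc_new[i] = acc[i];
                ptr_new[i] = NIL;
            }
        }
        size_t *tp = ptr; ptr = ptr_new; ptr_new = tp;   /* swap buffers */
        double *ta = acc; acc = acc_new; acc_new = ta;
        rounds++;
    }
    memcpy(out, acc, n * sizeof *out);
    free(ptr); free(ptr_new); free(acc); free(acc_new);
    return rounds;
}
```

For the chain 0, 1, 2, 3, 4 (each node pointing to the next) with values 1 through 5, the function should finish in three rounds and produce the suffix sums 15, 14, 12, 9, 5. The double buffering is what keeps the rounds synchronous: no iteration can observe a value that another iteration wrote in the same round. The chain must be acyclic; a cycle would keep the loop running forever.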