This blog contains additional content for the article "Advanced Vectorization" from Parallel Universe #12:
Intel recently unveiled the new Intel® Xeon Phi™ product – a coprocessor based on the Intel® Many Integrated Core architecture. Intel® Math Kernel Library (Intel® MKL) 11.0 introduces high-performance and comprehensive math functionality support for the Intel® Xeon Phi™ coprocessor. You can download the audio recording of the webinar and the presentation slides from the links below.
Intel® VTune™ Amplifier XE 2013
Intel® VTune™ Amplifier XE is an easy-to-use performance and thread profiler for C, C++, C#, Fortran, Java and MPI developers. No special recompiles are needed; just start profiling. Hotspots are highlighted on the source. A powerful timeline makes it easy to tune your application and scale performance on multicore processors.
The upcoming OpenMP 4.0 will be discussed at SC12, and I'm particularly excited about two of the planned additions: "SIMD extensions" and "targeting extensions." The first helps developers make sure that code they intend to be vectorized actually is vectorized efficiently. The second provides, for the first time, an industry-standard way to designate code and data to be targeted to an attached device.
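To make the two OpenMP 4.0 additions more concrete, here is a minimal sketch in C. It uses the directive syntax from the OpenMP 4.0 specification as it was later published, which may differ from the drafts discussed at SC12. The function names `saxpy_simd` and `saxpy_target` are hypothetical and chosen only for illustration; they do not come from the article.

```c
#include <assert.h>
#include <stddef.h>

/* SIMD extension: "omp simd" states the developer's intent that the loop
   be vectorized. The caller is assumed to pass non-overlapping arrays. */
void saxpy_simd(int n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    #pragma omp simd
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Targeting extension: "omp target" asks for the enclosed code to run on
   an attached device, and the map clauses say which data moves there and
   back. With no device available, the region runs on the host. */
void saxpy_target(int n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Both functions compute the same result; only the directives differ. A compiler without OpenMP support ignores the pragmas and runs the loops serially, so the sketch stays portable.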
Intel® Inspector XE has always provided suppression functionality, but with the introduction of the Inspector XE 2013 product, there are more powerful ways to control how your suppressions are matched to found issues and how your suppressions are stored and maintained.
Intel® Advisor XE, along with the other Intel® Parallel Studio XE tools, lays out a multi-step process to aid developers in transitioning their serial code to efficient and correct parallel code. This blog will focus on the first step of the process: how to determine where to add parallelism in an application.
Has this ever happened to you? You work tirelessly to add threads to your serial code, all your correctness tests are passing, and your application is zooming along almost twice as fast as the serial version on your two-core machine. Now your friend sees your results and would love to run your program on his machine, which is fully loaded with four cores that are all equipped with Intel® Hyper-Threading Technology (that’s 8 "logical" processors).