Intel MKL 11.0 adds general support and specific optimizations for the latest Intel® AVX2 instructions. This article lists the functions that are optimized for Intel AVX2.
Multithreaded Development
Intel® MKL Sparse BLAS Overview
Sparse BLAS routines are useful for implementing iterative methods that solve large sparse systems of equations or eigenvalue problems.
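To illustrate why a sparse matrix-vector product is the core building block of such iterative methods, here is a minimal pure-Python sketch of a conjugate gradient solver built on a CSR (compressed sparse row) product. It does not use Intel MKL; the function names `csr_matvec` and `conjugate_gradient` and the small test matrix are assumptions made for this illustration, and the solver assumes a symmetric positive definite matrix.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR format."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        # Nonzeros of row i live in values[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(values, col_idx, row_ptr, b, tol=1e-10, max_iter=100):
    """Solve A x = b, assuming A is symmetric positive definite (CSR)."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the initial guess x = 0
    p = list(r)              # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        # The only place the matrix is touched: one sparse product per step.
        ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Hypothetical 3x3 example: A = [[4, 1, 0], [1, 3, 1], [0, 1, 2]]
values = [4.0, 1.0, 1.0, 3.0, 1.0, 1.0, 2.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
b = csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
x = conjugate_gradient(values, col_idx, row_ptr, b)
```

Because the solver reaches the matrix only through `csr_matvec`, an optimized sparse product could in principle be swapped in without changing the rest of the iteration.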
A Gentle Introduction to Parallel Software webinar slides
Webinar slides - Dr. Tim Mattson, Principal Engineer at Intel's Microprocessor Technology Labs, leads a webinar focused on actual code and the parallel programming APIs available to software developers. Tim begins with an overview of the high-level issues involved in creating a parallel program and then moves on to the most commonly used parallel algorithms. He then discusses the major parallel programming APIs (OpenMP*, MPI, and Windows* threads), showing how they are used with different algorithms and on different platforms. After attending this webinar, developers should be conversant with the major concurrent APIs and algorithms and be well positioned to start incorporating these techniques into their applications.
Simplifying Parallelism Implementation with Intel Threading Building Blocks webinar slides
Webinar slides - Use the Intel® Threading Building Blocks (Intel® TBB) template library to introduce parallelism into applications. The use of lambda expressions, available in Intel® Parallel Composer, is discussed, along with the data-parallel and task-parallel models of parallel programming. Specific focus is placed on representing common parallel programming patterns, such as pipelines and concurrent queues, using Intel TBB templates. The newest enhancements to the Intel TBB library are also explored, including task-to-thread affinity and task cancellation support.
The Key to Scaling Applications for Multicore webinar slides
Webinar slides - Whether an application is serial, partially parallel, or fully parallel, it can benefit significantly from parallelism. New Intel® Parallel Studio tools give Windows* developers the keys to getting the most out of parallelism. Gain an in-depth understanding of when, where, and how much parallelism to use to achieve optimal results. Microsoft* Visual Studio C/C++ developers will learn how to identify and safely design applications that scale with increasing processor core counts. Recommended companion technical webinar: Identify and Address Threading Opportunities.
Superscalar programming 101 (Matrix Multiply) Part 5 of 5
In Part 4 we saw the effects of the QuickThread Parallel Tag Team Transpose method of matrix multiplication performed on a dual Xeon 5570 system with two sockets and two L3 caches, each shared by four cores (8 threads), and with each processor having four L2 and four L1 caches, each shared by one core (2 threads). We find:

Superscalar programming 101 (Matrix Multiply) Part 4 of 5
In the last installment (Part 3) we saw the effects of the QuickThread Parallel Tag Team method of matrix multiplication performed on two single-processor systems:

