General matrix-matrix multiplication (GEMM) is a fundamental operation in many scientific, engineering, and data applications, so there is constant demand to make it run faster. Optimized numerical libraries such as Intel® Math Kernel Library (Intel® MKL) typically offer parallel, high-performance GEMM implementations that exploit the concurrent threads supported by modern multi-core architectures. This strategy works well when multiplying large matrices because all cores are used efficiently.
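For readers unfamiliar with the operation, the following is a minimal reference sketch of what GEMM computes, using the standard BLAS convention C := alpha*A*B + beta*C. It is a plain, single-threaded illustration written for clarity, not the Intel MKL implementation; the function name `gemm` and the list-of-lists matrix layout are assumptions made for this example.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for matrices stored as lists of rows.

    Hypothetical reference routine for illustration only: a is m x k,
    b is k x n, and c is m x n. A new matrix is returned; c is not modified.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must be m x n")

    # Start from the scaled copy of c, then accumulate alpha * a * b.
    result = [[beta * c[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        for p in range(k):
            scaled = alpha * a[i][p]  # hoist the scaling out of the inner loop
            for j in range(n):
                result[i][j] += scaled * b[p][j]
    return result
```

Calling `gemm(1.0, a, b, 0.0, c)` reduces to an ordinary matrix product. The three nested loops perform m*n*k multiply-adds, which is why large matrices offer enough independent work to keep every core busy.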
The Intel® Parallel Studio XE 2015 Update 3 Cluster Edition for Linux* and Windows* combines all Intel® Parallel Studio XE tools and Intel® Cluster Tools into a single package. This multi-component software toolkit contains the core libraries and tools needed to efficiently develop, optimize, run, and distribute parallel applications for clusters with Intel processors. The package is for cluster users who develop on and build for IA-32 and Intel® 64 architectures on Linux* and Windows*, as well as customers running on the Intel® Xeon Phi™ coprocessor on Linux*. It contains:
Intel® MKL 11.3 Beta (released in April 2015) contains significant performance and scalability improvements for the direct sparse solver (a.k.a. Intel MKL PARDISO) on SMP systems. These improvements particularly benefit Intel Xeon Phi coprocessors and Intel Xeon processors with large core counts. As an example, the chart below shows a 1.7x to 2.5x speedup of Intel MKL 11.3 Beta over Intel MKL 11.2 when using PARDISO to solve various sparse matrices on an Intel Xeon Phi coprocessor with 61 cores.
Dynamic program slicing is a dynamic program analysis technique that, given a slicing criterion (a line number, a variable, and so on), finds all statements in the program that affect the criterion (backward slicing) or are affected by it (forward slicing) for a specific execution.
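As a rough illustration of the backward case, the sketch below computes a slice over a recorded execution trace. It is a simplification under stated assumptions: it follows data dependences only (control dependences are ignored), and the `Event` record and `backward_dynamic_slice` function are hypothetical names invented for this example, not part of any particular slicing tool.

```python
from typing import NamedTuple


class Event(NamedTuple):
    """One executed statement in the trace (hypothetical record type)."""
    line: int    # source line number of the statement
    defs: set    # variables written by this execution of the statement
    uses: set    # variables read by this execution of the statement


def backward_dynamic_slice(trace, criterion_index, variables):
    """Return the source lines that affected `variables` at trace position
    `criterion_index`, following data dependences only (an assumption made
    to keep the sketch short; control dependences are not tracked).
    """
    wanted = set(variables)   # variables whose defining statement we still seek
    slice_lines = set()
    # Walk the executed statements backwards from the criterion.
    for event in reversed(trace[:criterion_index + 1]):
        if event.defs & wanted:
            slice_lines.add(event.line)
            wanted -= event.defs   # these definitions are now explained
            wanted |= event.uses   # but their inputs must be explained next
    return slice_lines
```

For a trace of the statements `a = 1`, `b = 2`, `c = a + 1`, `d = b * 2`, `print(c)` on lines 1 to 5, slicing on `c` at the final statement keeps only lines 1 and 3, because the statements involving `b` and `d` never influenced `c` in that execution.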
A control-flow graph (CFG) is a fundamental structure used in computer science and engineering for describing and analyzing the structure of an algorithm or program. A dynamic control-flow graph (DCFG) is a specialized CFG that adds data from a specific execution of a program. We provide a tool for generating a DCFG based on the Pin binary-instrumentation package. We also provide an application-programmer interface (API) to access the DCFG data from within another Pin tool or a standalone program. More details follow.
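To make the idea concrete, the sketch below builds a toy DCFG from the sequence of basic blocks visited during one run, annotating each node and edge with how often it was executed. This is an illustrative assumption about what "data from a specific execution" might look like; it does not use Pin or the DCFG API mentioned above, and `build_dcfg` and the block names are hypothetical.

```python
from collections import Counter


def build_dcfg(block_trace):
    """Build a toy dynamic control-flow graph from one execution.

    `block_trace` is the ordered list of basic-block identifiers that were
    executed (a hypothetical input format). Returns two Counters:
    node visit counts, and (source, target) edge traversal counts.
    """
    node_counts = Counter(block_trace)
    # Each consecutive pair of blocks in the trace is one traversed edge.
    edge_counts = Counter(zip(block_trace, block_trace[1:]))
    return node_counts, edge_counts
```

For the trace `["entry", "loop", "loop", "loop", "exit"]`, the `loop` node is visited three times and the back edge `("loop", "loop")` is taken twice. A static CFG would show the same nodes and edges but could not supply those counts.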
< Overview >
This article shows how to enable and use Intel(R) Integrated Performance Primitives (IPP), Intel(R) Threading Building Blocks (TBB), and the Intel(R) C++ Compiler (ICC) on Linux (Ubuntu 14.04 LTS, 64-bit). We build and run one of the examples that ship with IPP, then apply TBB and ICC to it to observe the performance improvement from using Intel(R) System Studio features.