Multithread development

Introducing Batch GEMM Operations

General matrix-matrix multiplication (GEMM) is a fundamental operation in most scientific, engineering, and data applications, and there is a constant demand to make it run faster. Optimized numerical libraries such as Intel® Math Kernel Library (Intel® MKL) typically offer parallel, high-performance GEMM implementations that leverage the concurrent threads supported by modern multi-core architectures. This strategy works well when multiplying large matrices because all cores are used efficiently.
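For reference, GEMM computes C ← αAB + βC. The sketch below is plain Python and is not the Intel MKL interface; the function names `gemm` and `gemm_batch` are hypothetical and chosen only for illustration. It shows the operation itself and, as an assumption about what "batch" means in the title, how several small, independent multiplications can be grouped into a single call.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for matrices stored as lists of rows."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a) or len(c) != m or any(len(row) != n for row in c):
        raise ValueError("incompatible matrix shapes")
    return [
        [alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
         for j in range(n)]
        for i in range(m)
    ]


def gemm_batch(alpha, a_list, b_list, beta, c_list):
    """Apply gemm to each (a, b, c) triple; the multiplications are independent."""
    return [gemm(alpha, a, b, beta, c) for a, b, c in zip(a_list, b_list, c_list)]


a = [[1.0, 2.0], [3.0, 4.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]

# Two small multiplications submitted together: a * I and a * a.
results = gemm_batch(1.0, [a, a], [identity, a], 0.0, [zeros, zeros])
```

Because each triple is independent of the others, the loop inside `gemm_batch` is the natural place where an optimized library could spread the work across threads.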

  • Intel® Parallel Studio XE 2015 Update 3 Cluster Edition Readme

    The Intel® Parallel Studio XE 2015 Update 3 Cluster Edition for Linux* and Windows* combines all Intel® Parallel Studio XE and Intel® Cluster Tools into a single package. This multi-component software toolkit contains the core libraries and tools to efficiently develop, optimize, run, and distribute parallel applications for clusters with Intel processors. This package is for cluster users who develop on and build for IA-32 and Intel® 64 architectures on Linux* and Windows*, as well as customers running on the Intel® Xeon Phi™ coprocessor on Linux*. It contains:

  • Significant Scalability and Performance Improvement for Intel® MKL PARDISO on SMP Systems

    Intel® MKL 11.3 Beta (released in April 2015) contains significant performance and scalability improvements for the direct sparse solver (a.k.a. Intel MKL PARDISO) on SMP systems. These improvements particularly benefit Intel Xeon Phi coprocessors and Intel Xeon processors with large core counts. As an example, the chart below shows a 1.7x to 2.5x speedup of Intel MKL 11.3 Beta over Intel MKL 11.2 when using PARDISO to solve various sparse matrices on an Intel Xeon Phi coprocessor with 61 cores.

  • Dynamic Program Slicing with PinPlay

    Dynamic program slicing is a dynamic program analysis technique that, given a slicing criterion (a line number, a variable, ...), finds all statements in the program that affect the criterion (backward slice) or are affected by it (forward slice) for a specific execution.
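A minimal sketch of a backward dynamic slice, written in plain Python and not taken from PinPlay. It assumes a hypothetical trace format in which each executed statement is recorded as `(line, defined_vars, used_vars)`, and it follows data dependences only; control dependences, which a full slicer would also track, are ignored here.

```python
def backward_slice(trace, criterion_var):
    """Return the set of line numbers whose executed instances affect criterion_var.

    trace: list of (line, defined_vars, used_vars) in execution order.
    Only data dependences are followed; control dependences are ignored.
    """
    wanted = {criterion_var}
    in_slice = set()
    for line, defined, used in reversed(trace):
        hit = wanted & set(defined)
        if hit:
            in_slice.add(line)
            wanted -= hit          # this instance supplies the wanted values
            wanted |= set(used)    # so its inputs become wanted in turn
    return in_slice


# Hypothetical trace of one execution of:
#   1: a = 1
#   2: b = 2
#   3: c = a + 1
#   4: b = 5
#   5: d = b + c
trace = [
    (1, {"a"}, set()),
    (2, {"b"}, set()),
    (3, {"c"}, {"a"}),
    (4, {"b"}, set()),
    (5, {"d"}, {"b", "c"}),
]
slice_lines = backward_slice(trace, "d")
```

In this trace, line 2 is left out of the slice for `d` because its value of `b` is overwritten at line 4 before it is ever used, which is the kind of execution-specific precision that distinguishes a dynamic slice from a static one.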

  • Dynamic Control-flow Graph Generation with PinPlay

    A control-flow graph (CFG) is a fundamental structure used in computer science and engineering for describing and analyzing the structure of an algorithm or program. A dynamic control-flow graph (DCFG) is a specialized CFG that adds data from a specific execution of a program. We provide a tool for generating a DCFG based on the Pin binary-instrumentation package. We also provide an application-programmer interface (API) to access the DCFG data from within another Pin tool or a standalone program. More details follow.
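To make the distinction concrete, the sketch below builds a toy DCFG in plain Python. It does not use Pin or the DCFG API mentioned above; the function `build_dcfg` and the trace format (a list of basic-block labels in execution order) are assumptions made for illustration. The "dynamic" data added to the graph here is simply how often each block and each edge was executed.

```python
from collections import Counter


def build_dcfg(block_trace):
    """Count how often each basic block and each block-to-block edge was executed."""
    node_counts = Counter(block_trace)
    edge_counts = Counter(zip(block_trace, block_trace[1:]))
    return node_counts, edge_counts


# Hypothetical trace: loop body "B" runs three times between entry "A" and exit "C".
block_trace = ["A", "B", "B", "B", "C"]
nodes, edges = build_dcfg(block_trace)
```

The edge `("B", "B")` is the loop back edge; its count together with the count of `("B", "C")` gives the trip count observed in this particular run, information a static CFG alone does not carry.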

  • Optimizing Image resizing example of Intel(R) Integrated Performance Primitives (IPP) with Intel(R) Threading Building Blocks and Intel(R) C++ Compiler

    < Overview >

     In this article, we enable and use Intel(R) Integrated Performance Primitives (IPP), Intel(R) Threading Building Blocks (TBB), and the Intel(R) C++ Compiler (ICC) on Linux (Ubuntu 14.04 LTS, 64-bit). We build and run one of the examples that ships with IPP, then apply TBB and ICC to that example to observe the performance improvement from using Intel(R) System Studio features.
