Advanced

Improve Intel MKL Performance for Small Problems: The Use of MKL_DIRECT_CALL

One of the major new features in Intel MKL 11.2 is greatly improved performance for small problem sizes. In 11.2, this improvement focuses on the xGEMM functions (matrix multiplication). Out of the box, there is already a version-to-version improvement from Intel MKL 11.1 to Intel MKL 11.2. On top of that, Intel MKL introduces a new control that can give a further significant performance boost for small matrices. Users enable this control when compiling code that calls Intel MKL by specifying "-DMKL_DIRECT_CALL" or, for sequential Intel MKL, "-DMKL_DIRECT_CALL_SEQ".

  • Developers
  • Professors
  • Apple OS X*
  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Microsoft Windows* 8
  • Unix*
  • Server
  • C/C++
  • Fortran
  • Advanced
  • Beginner
  • Intermediate
  • Intel® Math Kernel Library
  • small matrix
  • performance
  • Optimization
  • Is __kmp_suspend_initialize_thread race free?

    Hi,

    I work with just another tool for data race detection. I repeatedly got messages about a data race between the pthread_mutex_init in __kmp_suspend_initialize_thread and the pthread_mutex_lock. After analysis of the situation, I think there is actually a possible race condition.

    The only synchronization for the init_mutex I could find was the

    if(th->th.th_suspend_init_count > __kmp_fork_count){...}

    surrounding it. I think the init_mutex should be guarded like:

  • What’s New in the Intel Compiler

    The list below summarizes new features in the Intel® C++ Compiler 15.0 and the Intel® Fortran Compiler 15.0. For more details about changes in the Intel compilers since the previous release, including a list of new options, please refer to the ‘What’s New’ section in the release notes (C++, Fortran).

  • Developers
  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Microsoft Windows* 8
  • C/C++
  • Fortran
  • Advanced
  • Beginner
  • Intermediate
  • Intel® C++ Compiler
  • Intel® Fortran Compiler
  • OpenMP*
  • Development Tools
  • Intel® Core™ Processors
  • Intel® Many Integrated Core Architecture
  • Optimization
  • Parallel Computing
  • Threading
  • Vectorization
  • Using Intel® MPI Library 5.0 with MPICH based applications

    Why is it needed?

    Different MPI implementations each have their own benefits and advantages. As a result, in a given cluster environment an HPC application may perform better with a different MPI implementation.

    Intel® MPI Library has the following benefits:

  • Developers
  • Partners
  • Professors
  • Students
  • Linux*
  • Server
  • Advanced
  • Beginner
  • Intermediate
  • Intel® Cluster Toolkit
  • Intel® Trace Analyzer and Collector
  • Intel® MPI Library
  • Intel® Cluster Studio
  • Intel® Cluster Studio XE
  • Intel® Cluster Ready
  • Message Passing Interface
  • Cluster Computing
  • Development Tools
  • Mac OS 10.9 - clang: error: unknown argument: '-no-intel-extensions'

    I executed the following commands:

    INTEL_OPENMP_LATEST_BUILD_LINK=https://www.openmprtl.org/sites/default/files/libomp_20131209_oss.tgz

    curl ${INTEL_OPENMP_LATEST_BUILD_LINK} -o libomp_oss_temp.tgz
    gunzip -c libomp_oss_temp.tgz | tar xopf -
    rm -rf libomp_oss_temp.tgz
    cd libomp_oss

  • OpenMP* WORKSHARE constructs now parallelize with Intel® Fortran Compiler 15.0

    The Intel® Fortran Compiler 15.0 now generates multi-threaded code for select instances of the OpenMP WORKSHARE and PARALLEL WORKSHARE constructs involving array assignments. Previously, these were implemented with the OpenMP SINGLE construct, meaning that only single-threaded code was generated.

    Multithreaded code is not always generated for the statements inside the block of an OMP WORKSHARE construct. Some statements parallelize; others do not and instead execute sequentially inside an OMP SINGLE construct to preserve the correct semantics of WORKSHARE.

  • Developers
  • Professors
  • Apple OS X*
  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Microsoft Windows* 8
  • Server
  • Fortran
  • Advanced
  • OpenMP*
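  • What new features does the new Intel® XDK bring you?

    Developers can now easily integrate third-party service APIs to monetize their apps, or integrate back-end services to create richer, more varied apps. The Intel® XDK lets developers conveniently add hundreds of open-source third-party Cordova* plugins to their apps, as well as various proprietary plugins for the Android*, iOS*, and Windows 8* platforms.


    New project

    An all-in-one workflow that takes your app from the first spark of an idea through packaging and publishing in one go:

    • An all-in-one solution

    • Multiple ways to start creating your app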

  • Developers
  • Intel AppUp® Developers
  • Partners
  • Professors
  • Students
  • Android*
  • Apple iOS*
  • Apple OS X*
  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Microsoft Windows* 8
  • Android*
  • HTML5
  • Internet of Things
  • UX
  • Windows*
  • HTML5
  • JavaScript*
  • Advanced
  • Beginner
  • Intermediate
  • Intel® XDK
  • html5
  • Development Tools
  • User Experience and Design