Improving performance

How to analyze Intel® Xeon Phi™ coprocessor applications using Intel® VTune™ Amplifier XE 2015

 

Introduction

 

Intel® VTune™ Amplifier XE 2015 includes new capabilities for analyzing Intel® Xeon Phi™ coprocessor applications. This article steps through such an analysis on an Intel® Xeon Phi™ coprocessor and outlines some of the new capabilities.
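As a hedged sketch of what such a run can look like from the command line (the analysis-type name knc-hotspots is an assumption; available types vary by VTune Amplifier version and target, and my_mic_app is a placeholder application name):

```shell
# Hypothetical example: collect hotspots on a Xeon Phi (KNC) target.
# Analysis-type names vary by VTune Amplifier version; check
# "amplxe-cl -help collect" for the list your installation supports.
amplxe-cl -collect knc-hotspots -- ./my_mic_app

# Summarize the collected result directory from the command line.
amplxe-cl -report summary -r r000hs
```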

 

Compiling and running the application

  • Useful links for Intel® VTune™ Amplifier for Systems

     

    Intel® VTune™ Amplifier for Systems is part of Intel® System Studio (http://software.intel.com/en-us/intel-system-studio), a suite of embedded development tools.

    Some useful articles and videos on using Intel® VTune™ Amplifier for Systems:

    Videos

    Remote collection

  • Analyzing Java performance on Android devices with Intel® VTune™ Amplifier 2014 for Systems

    Intel® VTune™ Amplifier 2014 for Systems supports analysis of Java functions, with access to the JIT-generated assembly, Java source, and Dex* for JIT-compiled functions on rooted Android* devices running an instrumented Java/Dalvik* virtual machine. Check back later to learn how to run a future version of VTune Amplifier for Systems that enables Java analysis on the ART* JVM.

    If the following problems occur:

  • How to analyze OpenMP* applications using Intel® VTune™ Amplifier XE 2015

     

    Introduction

     

    Intel® VTune™ Amplifier XE 2015 now includes extensive capabilities for analyzing OpenMP applications. This article will step through this analysis on an Intel® Xeon Phi™ coprocessor.

     

    Compiling and running the application

     

    The application we will be using is one of the samples included in VTune Amplifier. It is located in /opt/intel/vtune_amplifier_xe_2015/samples/en/C++/matrix_vtune_amp_xe.tgz. To build the application on Linux*:

  • How Intel® AVX2 Improves Performance on Server Applications

    The latest Intel® Xeon® processor E5 v3 family includes a feature called Intel® Advanced Vector Extensions 2 (Intel® AVX2), which can potentially improve application performance related to high performance computing, databases, and video processing. Here we will explain the context, and provide an example of how using Intel® AVX2 improved performance for a commonly known benchmark.

  • Improve Intel MKL Performance for Small Problems: The Use of MKL_DIRECT_CALL

    One of the big new features introduced in Intel MKL 11.2 is greatly improved performance for small problem sizes. In 11.2, this improvement focuses on the xGEMM functions (matrix multiplication). Out of the box there is already a version-to-version improvement (from Intel MKL 11.1 to Intel MKL 11.2), but on top of that, Intel MKL introduces a new control that can yield a further significant performance boost for small matrices. Users enable this control when building with Intel MKL by specifying "-DMKL_DIRECT_CALL" or "-DMKL_DIRECT_CALL_SEQ".
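As a minimal sketch, the control is a compile-time preprocessor define; the compiler invocation below is one typical form, assuming the Intel compiler and its -mkl convenience flag (myapp.c is a placeholder):

```shell
# Define MKL_DIRECT_CALL when compiling code that includes mkl.h,
# so small xGEMM calls can take the low-overhead direct path.
# Use -DMKL_DIRECT_CALL_SEQ instead with the sequential MKL libraries.
icc -DMKL_DIRECT_CALL -mkl myapp.c -o myapp
```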

  • Significant performance improvement of symmetric eigensolvers and SVD in Intel MKL 11.2

     

    Intel MKL 11.2 contains a number of optimizations for symmetric eigensolvers and SVD. These mostly apply to large matrices (N on the order of 4000-6000 and larger), but the speedups over the previous MKL 11.1 are significant: SVD is up to 6 times faster (or even more at large thread counts and matrix sizes), and the eigensolvers show similar several-fold gains.

    The related optimizations present in MKL 11.2 include:

  • Using Intel® MPI Library 5.0 with MPICH based applications

    Why is it needed?

    Different MPI implementations have their own specific benefits and advantages, so in a given cluster environment an HPC application may well perform better with a different MPI implementation.

     Intel® MPI Library has the following benefits:

  • Improving Performance with MPI-3 Non-Blocking Collectives

    The new MPI-3 non-blocking collectives offer potential improvements to application performance.  These gains can be significant for the right application.  But for some applications, you could end up lowering your performance by adding non-blocking collectives.  I'm going to discuss what the non-blocking collectives are and show a kernel which can benefit from using MPI_Iallreduce.
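As a sketch of the pattern (not the kernel from the article), the following shows MPI_Iallreduce overlapping a reduction with computation that does not depend on its result; do_independent_work is a hypothetical stand-in:

```c
#include <mpi.h>
#include <stdio.h>

/* Hypothetical stand-in for computation that does not
 * depend on the reduction result. */
static double do_independent_work(int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += (double)i * i;
    return s;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = rank + 1.0, global = 0.0;
    MPI_Request req;

    /* Start the reduction, then compute while it progresses. */
    MPI_Iallreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);
    double overlap = do_independent_work(1000000);

    MPI_Wait(&req, MPI_STATUS_IGNORE);   /* 'global' is valid only here */

    if (rank == 0)
        printf("sum=%g overlap=%g\n", global, overlap);
    MPI_Finalize();
    return 0;
}
```

The gain comes entirely from the work placed between MPI_Iallreduce and MPI_Wait; if there is no independent work to put there, the non-blocking form only adds overhead.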
