Improve Performance

Analyzing Java Performance on Android Devices with Intel® VTune™ Amplifier 2014 for Systems

Intel® VTune™ Amplifier 2014 for Systems supports analysis of Java functions. For JIT-compiled functions, it also gives access to the JIT-generated assembly, the Java source, and the Dex* code. This works on rooted Android* devices running an instrumented Java/Dalvik* virtual machine. Check back later to learn how to use a future version of VTune Amplifier for Systems to enable Java analysis on the ART* JVM.

If you encounter the following problems:

  • Developers
  • Android*
  • C/C++
  • Java*
  • Advanced
  • Intermediate
  • Intel® System Studio
  • Intel® VTune™ Amplifier
  • VTune Amplifier Java Dalvik Android
  • Development Tools
  • Mobility
  • Optimization

How to analyze OpenMP* applications using Intel® VTune™ Amplifier XE 2015

    Introduction

    Intel® VTune™ Amplifier XE 2015 now includes extensive capabilities for analyzing OpenMP applications. This article will step through this analysis on an Intel® Xeon Phi™ coprocessor.
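
    The teaser does not include the sample's source code. The following minimal C sketch is a hedged illustration of the kind of OpenMP loop such an analysis examines. The function `matmul_omp` is hypothetical and is not part of the VTune sample. It assumes n x n row-major matrices and GCC or Clang with `-fopenmp`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch (not the VTune sample): c = a * b for n x n
 * row-major matrices, with the outer loop split across OpenMP threads.
 * Build with -fopenmp (GCC/Clang) to enable threading; without it the
 * pragma is ignored and the same loop runs serially. */
void matmul_omp(int n, const double *a, const double *b, double *c)
{
    assert(n >= 0 && a != NULL && b != NULL && c != NULL);

    /* Iteration i writes only row i of c, so threads never write the
     * same output element and no locking is required. schedule(static)
     * hands each thread a contiguous, roughly equal block of rows. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;   /* declared inside the loop: private per thread */
            for (int k = 0; k < n; k++)
                sum += a[(size_t)i * n + k] * b[(size_t)k * n + j];
            c[(size_t)i * n + j] = sum;
        }
    }
}
```

    Here the static schedule fixes the work distribution up front. That distribution is the kind of thing an OpenMP-aware profiler can check for load imbalance or serial time.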

    Compiling and running the application

    The application we will be using is one of the samples included in VTune Amplifier. It is located in /opt/intel/vtune_amplifier_xe_2015/samples/en/C++/matrix_vtune_amp_xe.tgz. To build the application on Linux*:

  • Developers
  • Linux*
  • C/C++
  • Fortran
  • Intermediate
  • Intel® Parallel Studio XE
  • Intel® VTune™ Amplifier XE
  • Development Tools
  • Parallel Computing
  • Threading

How Intel® AVX2 Improves Performance on Server Applications

    The latest Intel® Xeon® processor E5 v3 family includes Intel® Advanced Vector Extensions 2 (Intel® AVX2). AVX2 can improve the performance of applications in high performance computing, databases, and video processing. Here we explain the context and show how using Intel® AVX2 improved performance on a well-known benchmark.
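
    The teaser does not show the benchmark code. The following hedged, minimal C sketch shows what explicit 256-bit vector code looks like. It computes `y = alpha*x + y` four doubles at a time. The function names are hypothetical, and this is not the article's Linpack code.

    The sketch makes these assumptions:
      • It is built with GCC or Clang on x86.
      • It also uses FMA, which on Intel Xeon processor E5 v3 (Haswell) processors ships alongside AVX2.
      • A run-time check falls back to scalar code on CPUs without these features.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1
#endif

/* Portable scalar reference: y[i] = alpha * x[i] + y[i]. */
static void daxpy_scalar(size_t n, double alpha, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = alpha * x[i] + y[i];
}

#ifdef HAVE_X86
/* 256-bit path: four doubles per iteration with one fused multiply-add.
 * The target attribute (GCC/Clang) enables AVX2 and FMA code generation
 * for this function only, so the file needs no global -mavx2 flag. */
__attribute__((target("avx2,fma")))
static void daxpy_avx2(size_t n, double alpha, const double *x, double *y)
{
    const __m256d va = _mm256_set1_pd(alpha);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m256d vx = _mm256_loadu_pd(x + i);   /* unaligned loads are fine */
        __m256d vy = _mm256_loadu_pd(y + i);
        _mm256_storeu_pd(y + i, _mm256_fmadd_pd(va, vx, vy));
    }
    daxpy_scalar(n - i, alpha, x + i, y + i);  /* leftover 0 to 3 elements */
}
#endif

/* Choose the vector path at run time so the program still runs on CPUs
 * without AVX2/FMA, and on non-x86 targets. */
void daxpy(size_t n, double alpha, const double *x, double *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
#ifdef HAVE_X86
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma")) {
        daxpy_avx2(n, alpha, x, y);
        return;
    }
#endif
    daxpy_scalar(n, alpha, x, y);
}
```

    An FMA rounds once instead of twice, so for some inputs the vector path may differ from the scalar loop in the last bit. In practice, compilers can often auto-vectorize the scalar loop when targeting AVX2 (for example, GCC with `-O3 -march=haswell`). The intrinsics simply make the vector width explicit.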

  • Developers
  • Partners
  • Students
  • Linux*
  • Server
  • Intermediate
  • Intel® C++ Compiler
  • AVX2
  • AVX
  • SSE
  • High Performance Linpack
  • LINPACK Benchmark
  • Linpack
  • Large Enterprise
  • Parallel Computing
  • Threading
  • Vectorization

Improve Intel MKL Performance for Small Problems: The Use of MKL_DIRECT_CALL

    One of the big new features in Intel MKL 11.2 is greatly improved performance for small problem sizes. In 11.2, this improvement focuses on the xGEMM (matrix multiplication) functions. Out of the box, Intel MKL 11.2 is already faster here than Intel MKL 11.1. On top of that, Intel MKL 11.2 introduces a new control that can further boost performance significantly for small matrices. Users enable this control at compile time, when building code that links with Intel MKL, by defining the preprocessor macro "-DMKL_DIRECT_CALL" or "-DMKL_DIRECT_CALL_SEQ". The latter is used with the sequential Intel MKL library.

  • Developers
  • Professors
  • Apple OS X*
  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Microsoft Windows* 8.x
  • Unix*
  • Server
  • C/C++
  • Fortran
  • Advanced
  • Beginner
  • Intermediate
  • Intel® Math Kernel Library
  • small matrix
  • performance
  • Optimization

Significant performance improvement of symmetric eigensolvers and SVD in Intel MKL 11.2

    Intel MKL 11.2 contains a number of optimizations for symmetric eigensolvers and SVD. These mostly target large matrices (N > 4000, 6000, and beyond). For such matrices, the speedups over the previous release, Intel MKL 11.1, are significant. SVD is up to 6 times faster, or even more at large thread counts and matrix sizes, and eigensolvers show similar speedups of several times.

    The related optimizations in Intel MKL 11.2 are:

  • Intel® Math Kernel Library
  • SVD performance in MKL

Using Intel® MPI Library 5.0 with MPICH-based applications

    Why is it needed?

    Each MPI implementation has its own benefits and advantages. In a particular cluster environment, an HPC application may therefore perform better with a different MPI implementation than the one it was built with.

    Intel® MPI Library has the following benefits:

  • Developers
  • Partners
  • Professors
  • Students
  • Linux*
  • Server
  • Advanced
  • Beginner
  • Intermediate
  • Intel® Cluster Toolkit
  • Intel® Trace Analyzer and Collector
  • Intel® MPI Library
  • Intel® Cluster Studio
  • Intel® Cluster Studio XE
  • Intel® Cluster Ready
  • Message Passing Interface
  • Cluster Computing
  • Development Tools

Improving Performance with MPI-3 Non-Blocking Collectives

    The new MPI-3 non-blocking collectives can improve application performance, and for the right application the gains can be significant. For some applications, however, adding non-blocking collectives can end up lowering performance. I'm going to discuss what the non-blocking collectives are and show a kernel that can benefit from using MPI_Iallreduce.

  • Developers
  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Microsoft Windows* 8.x
  • Server
  • Intermediate
  • Intel® Trace Analyzer and Collector
  • Intel® MPI Library
  • Message Passing Interface
  • mpi-3
  • non-blocking collectives
  • Cluster Computing
  • Optimization
  • Parallel Computing

Using the New Intel® Trace Analyzer Summary Page

    One of the new features in Intel® Trace Analyzer and Collector 9.0 is the new Intel® Trace Analyzer Summary Page. It gives a high-level overview of how much of your program’s time is spent in MPI versus outside of MPI (in user code).

  • Developers
  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Microsoft Windows* 8.x
  • Server
  • Intermediate
  • Intel® Trace Analyzer and Collector
  • Message Passing Interface
  • Summary Page
  • imbalance
  • Cluster MPI Application Performance
  • Cluster Computing
  • Development Tools