Common Best Known Methods for Parallel Performance, from Intel® Xeon® to Intel® Xeon Phi™ Processors

To improve the performance of applications and kernels, we are constantly searching for novel Best Known Methods (BKMs). But as our searches grow more esoteric, it is important to keep the basics in mind and to remember how many performance improvements rely on them. This article describes some common BKMs for improving parallel performance and shows their application across this spectrum of processor architectures. The advice collected here should help you speed up your code, whether it runs on an Intel® Xeon Phi™ coprocessor or an Intel Xeon processor.

  • Developers
  • Professors
  • Students
  • C/C++
  • Fortran
  • Intermediate
  • Intel® VTune™ Amplifier
  • Intel® Parallel Studio XE Composer Edition
  • Best Known Methods
  • parallel performance
  • Message Passing Interface
  • OpenMP*
  • Intel® Core™ Processors
  • Intel® Many Integrated Core Architecture
  • Optimization
  • Parallel Computing
  • Threading
  • Vectorization

Fun with Intel® Transactional Synchronization Extensions

    By now, many of you have heard of Intel® Transactional Synchronization Extensions (Intel® TSX). If you have not, I encourage you to check out the Intel TSX page before you read further. In a nutshell, Intel TSX provides transactional memory support in hardware, making life easier for developers who need to write synchronization code for concurrent and parallel applications.

Intel® Integrated Performance Primitives - Supported Versions

    The following Intel® IPP versions are currently supported:

    • Intel® IPP 9.0 for Windows*, Linux*, OS X*
    • Intel® IPP 8.2 for Windows*, Linux*, OS X*
    • Intel® IPP 8.1 for Windows*, Linux*, OS X*

    Refer to the System Requirements document for additional information about operating system support.

  • Developers
  • Android*
  • Apple OS X*
  • Linux*
  • Microsoft Windows* 10
  • Microsoft Windows* 8.x
  • Tizen*
  • Internet of Things
  • Windows*
  • .NET*
  • C#
  • C/C++
  • Beginner
  • Intel® Parallel Studio XE
  • Intel® System Studio
  • Intel® Integrated Performance Primitives
  • support
  • version
  • release
  • ipp support
  • Intel® Advanced Vector Extensions
  • OpenMP*
  • Development Tools

Introduction to the Intel® Xeon Phi™ Coprocessor

    This tutorial introduces the basic hardware and software architecture of the Intel Xeon Phi coprocessor, describes its general features, and provides a first look at the programming models that support High Performance Computing on hosts equipped with the coprocessor. These include offloading selected code to the coprocessor, Virtual Shared Memory, and parallel programming using OpenMP*, Intel Cilk Plus, and Intel Threading Building Blocks.
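
As a rough illustration of the OpenMP* model mentioned above, here is a minimal sketch of a parallel loop with a reduction. It is not taken from the tutorial; the function name `sum_of_squares` is hypothetical and chosen only for this example. The same source builds with or without OpenMP enabled, since a compiler that does not recognize the pragma simply runs the loop serially.

```c
#include <stddef.h>

/* Hypothetical helper for illustration: returns x[0]^2 + ... + x[n-1]^2.
 * With OpenMP enabled, loop iterations are divided among threads;
 * without it, the pragma is ignored and the loop runs serially. */
double sum_of_squares(const double *x, size_t n)
{
    double sum = 0.0;

    /* reduction(+:sum) gives each thread a private partial sum and
     * combines the partial sums when the loop finishes, so threads
     * never update the shared variable concurrently. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; ++i) {
        sum += x[i] * x[i];
    }
    return sum;
}
```

Calling `sum_of_squares` on the array {1.0, 2.0, 3.0, 4.0} with n = 4 returns 30.0. The signed loop index is a deliberate choice, because older OpenMP implementations require a signed integer loop variable.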

  • Developers
  • Intel® Cilk™ Plus
  • Intel® Threading Building Blocks
  • Intel® Xeon Phi™ Coprocessor
  • OpenMP*

Setting number_of_user_threads for Intel® Math Kernel Library FFTW3 wrappers

    Consider the case when you:

    • Create an FFTW3 plan and use the plan for sequential DFT computation on each thread in your parallel region
    • Use Intel Math Kernel Library (Intel MKL) FFTW3 wrappers
    • Want the best performance

    Intel MKL FFTW3 wrappers are thread safe by default. However, to get the best performance in this case you should set one additional Intel MKL variable, number_of_user_threads, as described below.

    In C:

    #include "fftw3.h"

  • Intermediate
  • Intel® Math Kernel Library
  • OpenMP*
  • FFT
  • FFTW
  • FFTW3
  • fourier transform
  • MKL FFTW3 wrappers
  • number_of_users_threads
  • Optimization
  • Parallel Computing
  • Threading

Slides from the Parallel Computing talk at FISL14

    The talk "Como domar uma fera de 1 TFlop que cabe na palma da sua mão" ("How to tame a 1 TFlop beast that fits in the palm of your hand") was presented on July 3, 2013, at FISL14 by Luciano Palma, Intel Community Manager for Servers and High Performance Computing.

    Besides introducing parallel programming concepts and discussing the importance of implementing parallelism in software, Luciano presented the Intel Xeon Phi coprocessor, its impressive technical characteristics (up to 61 cores delivering up to 2 TFlops in single precision), and the architecture of this advanced coprocessor.
