Getting started

Tuning the Intel MKL DFT functions performance on Intel® Xeon Phi™ coprocessors

Overview

Intel® Math Kernel Library (Intel® MKL) includes optimized discrete Fourier transform (DFT) functions for Intel® Xeon Phi™ coprocessors. These functions are carefully vectorized and threaded to take advantage of the hardware features. This article provides performance tuning tips for running the Intel MKL DFT functions on Intel Xeon Phi coprocessors. We will start with some simple example code.

Building the example code

  • Developers
  • Linux*
  • Server
  • C/C++
  • Fortran
  • Beginner
  • Intermediate
  • Intel® Math Kernel Library
  • MIC
  • Xeon Phi
  • DFT
  • FFT
  • performance
  • offload
  • MKL
  • Intel® Streaming SIMD Extensions
  • Intel® Many Integrated Core Architecture
  • Using Intel® C++ Compiler for Embedded System

         The Intel® C++ Compiler, also known as icc, is a high-performance compiler that lets you build and optimize C/C++ applications for Linux*-based operating systems. It provides full support for various embedded Linux* systems. With the compiler's features, you can easily start using icc for new project development or migrate an existing project from the GNU compiler.

  • Developers
  • Linux*
  • Yocto Project
  • C/C++
  • Beginner
  • Intel® C++ Compiler
  • Intel® System Studio
  • embedded c
  • embedded c programming
  • Intel System Studio
  • Development Tools
  • Embedded
  • Intel® System Studio - Multicore Programming with Intel® Cilk™ Plus

    Intel System Studio not only provides a variety of signal processing primitives via Intel® Integrated Performance Primitives (Intel® IPP) and Intel® Math Kernel Library (Intel® MKL), but also supports developing high-performance, low-latency custom code with the Intel C++ Compiler and Intel Cilk Plus. Because Intel Cilk Plus is built into the compiler, it can be used wherever an efficient threading runtime is needed to extract parallelism. This makes it possible to introduce multicore parallelism effectively without parallelizing each important algorithm individually, for example by employing a parallel pattern called pipeline. For custom code (that is, code not reused via a library), one can rely, in addition to auto-vectorization, on the extended Array Notation, including elemental functions (kernels), to vectorize explicitly at a higher level than ISA-specific intrinsic functions.
  • Developers
  • Students
  • Linux*
  • Yocto Project
  • C/C++
  • Advanced
  • Beginner
  • Intermediate
  • Intel® C++ Compiler
  • Intel® Cilk™ Plus
  • Intel® Integrated Performance Primitives
  • Intel® Math Kernel Library
  • Intel® System Studio
  • embedded c programming
  • Embedded
  • Parallel Computing
  • Power Efficiency
  • Threading
  • Vectorization
  • A Simple Example of Using Task APIs in Your Code

    Intel® VTune™ Amplifier XE 2013 introduces a set of Task APIs. These APIs let you mark the tasks and sub-tasks you are interested in so that they are highlighted in the VTune Amplifier XE results.

     

    Task APIs:

    void ITTAPI __itt_task_begin(const __itt_domain *domain, __itt_id taskid, __itt_id parentid, __itt_string_handle *name)

    void ITTAPI __itt_task_end(const __itt_domain *domain)

     

  • Developers
  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • C/C++
  • Intermediate
  • Intel® VTune™ Amplifier XE
  • VTune Amplifier XE Task API
  • Development Tools
  • Using Web Workers to Improve the Performance of Metro-Style HTML5-JavaScript* Apps

    This article describes how to use web workers in Metro-style HTML5 and JavaScript* apps. We explain what web workers are, what advantages they offer, and how to use them.
  • Developers
  • Intel AppUp® Developers
  • Professors
  • Students
  • Microsoft Windows* 8
  • Windows*
  • HTML5
  • JavaScript*
  • Advanced
  • Beginner
  • Intermediate
  • Metro
  • web workers
  • Microsoft Windows* 8 Style UI