OpenMP*

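Building and Optimizing the Hogbom Clean Benchmark for Xeon Phi™ Coprocessors

Overview

This article describes the steps for compiling, optimizing, and running the Hogbom Clean benchmark on the Xeon Phi™ coprocessor, and discusses the code changes made to improve the program's performance on that platform.

Introduction

Hogbom Clean is part of the benchmark suite for the Australian Square Kilometre Array Pathfinder (ASKAP) radio telescope project. The ASKAP benchmark suite is a multi-algorithm code package used to test performance across many platforms. The Hogbom Clean benchmark (tHogbomClean) implements the core processing of the Hogbom Clean deconvolution algorithm.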
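
For readers unfamiliar with the algorithm, the core loop of a Hogbom-style clean can be sketched as follows. This is a minimal illustration and not the tHogbomClean source: the `Image` type, the function names, and the parameters `gain`, `threshold`, and `maxIter` are all hypothetical, and the PSF is assumed to be normalized so that its central pixel equals 1. Each iteration finds the brightest residual pixel, adds a fraction of it to the model, and subtracts the correspondingly scaled PSF from the residual.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Square, row-major image. All names in this sketch are hypothetical.
struct Image {
    long n;                 // side length in pixels
    std::vector<float> px;  // n * n pixel values
    explicit Image(long side)
        : n(side), px(static_cast<std::size_t>(side * side), 0.0f) {}
    float& at(long r, long c) { return px[static_cast<std::size_t>(r * n + c)]; }
    float at(long r, long c) const { return px[static_cast<std::size_t>(r * n + c)]; }
};

// Locate the pixel with the largest absolute value.
void findPeak(const Image& img, long& peakRow, long& peakCol) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < img.px.size(); ++i)
        if (std::fabs(img.px[i]) > std::fabs(img.px[best])) best = i;
    peakRow = static_cast<long>(best) / img.n;
    peakCol = static_cast<long>(best) % img.n;
}

// Subtract scale * PSF from the residual, with the PSF centre placed on the peak.
// Each thread works on different rows, so no two threads write the same pixel.
void subtractPsf(Image& residual, const Image& psf,
                 long peakRow, long peakCol, float scale) {
    const long size = psf.n;
    const long half = size / 2;
    #pragma omp parallel for
    for (long r = 0; r < size; ++r) {
        const long rr = peakRow + r - half;
        if (rr < 0 || rr >= residual.n) continue;  // PSF row falls off the image
        for (long c = 0; c < size; ++c) {
            const long cc = peakCol + c - half;
            if (cc < 0 || cc >= residual.n) continue;
            residual.at(rr, cc) -= scale * psf.at(r, c);
        }
    }
}

// Run clean iterations until the peak drops below threshold or maxIter is hit.
// Returns the number of iterations performed.
int hogbomClean(Image& residual, const Image& psf, Image& model,
                float gain, float threshold, int maxIter) {
    assert(residual.n == model.n);
    int iter = 0;
    for (; iter < maxIter; ++iter) {
        long pr = 0, pc = 0;
        findPeak(residual, pr, pc);
        const float peak = residual.at(pr, pc);
        if (std::fabs(peak) < threshold) break;
        model.at(pr, pc) += gain * peak;                  // record the component
        subtractPsf(residual, psf, pr, pc, gain * peak);  // remove its response
    }
    return iter;
}
```

The peak search is left serial here for brevity. The `#pragma omp parallel for` line is ignored when OpenMP is not enabled, in which case the sketch simply runs on one thread.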

  • Developers
  • Server
  • C/C++
  • Intel® C++ Compiler
  • OpenMP*
  • Enterprise
  • stack overflow on 64bit server but not on 32bit notebook

    One of my recent tasks is to parallelize a fairly large program in Fortran 90. The subroutine I am targeting spends a great deal of time in a computationally intensive loop. I have tried the following:
    !$OMP PARALLEL DO DEFAULT(PRIVATE) SHARED (array2D_1, array2D_2, .., scalar1, scalar2,.., scalarN)
    <the calculations >
    !$OMP END PARALLEL DO

    A Parallel Stable Sort Using C++11 for TBB, Cilk Plus, and OpenMP

    This article describes a parallel merge sort code, and why it is more scalable than parallel quicksort or parallel samplesort. The code relies on the C++11 “move” semantics. It also points out a scalability trap to watch out for with C++. The attached code has implementations in Intel® Threading Building Blocks (Intel® TBB), Intel® Cilk™ Plus, and OpenMP*.
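
To illustrate the general idea, here is a minimal sketch of a stable merge sort that uses OpenMP tasks for the recursion and C++11 move semantics for the merge. It is not the code attached to the article: `mergeSortRec`, `parallelStableSort`, and the `cutoff` parameter are hypothetical names, the element type is assumed to be default-constructible and movable, and the merge step itself is left serial for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Sort data[0..n) stably, using scratch[0..n) as temporary storage.
// Hypothetical helper; ranges at or below `cutoff` are sorted serially.
template <typename T>
void mergeSortRec(T* data, T* scratch, std::size_t n, std::size_t cutoff) {
    assert(n == 0 || (data != nullptr && scratch != nullptr));
    if (n < 2 || n <= cutoff) {
        std::stable_sort(data, data + n);
        return;
    }
    const std::size_t mid = n / 2;

    // Sort the left half in a child task while this thread sorts the right half.
    // The pointers and sizes are firstprivate in the task by default.
    #pragma omp task
    mergeSortRec(data, scratch, mid, cutoff);
    mergeSortRec(data + mid, scratch + mid, n - mid, cutoff);
    #pragma omp taskwait

    // Merge by moving elements rather than copying them. std::merge keeps
    // equal elements from the left half ahead of those from the right half,
    // so the result stays stable.
    std::merge(std::make_move_iterator(data),
               std::make_move_iterator(data + mid),
               std::make_move_iterator(data + mid),
               std::make_move_iterator(data + n),
               scratch);
    std::move(scratch, scratch + n, data);
}

// Entry point: one thread starts the recursion, the team executes the tasks.
template <typename T>
void parallelStableSort(std::vector<T>& v, std::size_t cutoff = 1024) {
    std::vector<T> scratch(v.size());
    #pragma omp parallel
    #pragma omp single nowait
    mergeSortRec(v.data(), scratch.data(), v.size(), cutoff);
}
```

Moving matters most for element types that own heap storage, such as `std::string`, where a copy in every merge pass would allocate. When OpenMP is not enabled the pragmas are ignored and the sort runs serially.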

  • Developers
  • Professors
  • Students
  • C/C++
  • Intermediate
  • Intel® Cilk™ Plus
  • Intel® Threading Building Blocks
  • parallel
  • Merge Sort
  • Cilk Plus
  • tbb
  • openmp
  • OpenMP*
  • Parallel Computing
  • Recipe: Building and Optimizing the Hogbom Clean Benchmark for Intel® Xeon Phi™ Coprocessors

    Overview

    This article provides a recipe for compiling and running the Hogbom Clean benchmark for the Intel® Xeon Phi™ coprocessor and discusses the various optimizations applied to the code. 

  • Developers
  • Professors
  • Students
  • Linux*
  • Server
  • C/C++
  • Advanced
  • Intermediate
  • Intel® C++ Compiler
  • OpenMP*
  • Intel® Many Integrated Core Architecture
  • Optimization
  • The Chronicles of Phi - part 5 - Plesiochronous phasing barrier – tiled_HT3

    For the next optimization, I knew what I wanted to do; I just didn’t know what to call it. In looking for words that describe loosely synchronous behavior, I came across plesiochronous:

    In telecommunications, a plesiochronous system is one where different parts of the system are almost, but not quite, perfectly synchronized.

    The Chronicles of Phi - part 4 - Hyper-Thread Phalanx – tiled_HT2

    The prior part (3) of this blog showed the effects of the first-level implementation of the Hyper-Thread Phalanx. The change in programming yielded 9.7% improvement in performance for the small model, and little to no improvement in the large model. This left part 3 of this blog with the questions:

    What is non-optimal about this strategy?
    And: What can be improved?

    There are two things, one is obvious, and the other is not so obvious.

    Data alignment

    Building on Mac OS X 10.9

    I'm trying to build the openmprtl on Mac OS X 10.9, to be used with OpenMP/Clang project. Is this supposed to be possible? A new thing with 10.9 is that gcc is just a macro for clang, which maybe is confusing the build scripts.

    I try to build with:

    make compiler=clang

    And I get a build error in check-tools.pl, "Cannot parse GNU compiler version", because it ran gcc and got clang output (as gcc is just a macro for clang on 10.9). I was thinking that when you compile with "compiler=clang", check-tools.pl would not look for gcc at all.

    How do I know in which core my thread is running

    Hello guys.

    I'm trying to scale a for loop but I'm getting even worse results.

    My serial code runs in 30s but my openmp version completed in 200s.

    This is my pragma.

    int procs = omp_get_num_procs();
    #pragma omp parallel for num_threads(procs)\
    shared (c, u, v, w, k, j, i, nx, ny) \
    reduction(+: a, b, c, d, e, f, g, h, i)

    And these are my OpenMP exports:

    export OMP_NUM_THREADS=5
    export KMP_AFFINITY=verbose,scatter 

    And this is my verbose output, running on 1 node with 8 cores

    The Chronicles of Phi - part 3 Hyper-Thread Phalanx – tiled_HT1 continued

    The prior part (2) of this blog provided a header and a set of functions that can be used to determine the logical core and the logical Hyper-Thread number within the core. This determination is to be used in an optimization strategy called the Hyper-Thread Phalanx.
