Finite Differences on Heterogeneous Distributed Systems

Our building block is the finite-difference (FD) compute kernel typically used in RTM (reverse time migration) algorithms for seismic imaging. The computations performed by ISO-3DFD (isotropic 3-dimensional finite difference) stencils play a major role in the accurate imaging of complex subsurface structures in oil and gas surveys and exploration. Here we build on the ISO-3DFD discussed in [1] and [2] and illustrate a simple MPI-based implementation that lets a distributed ISO-3DFD compute kernel run on a hybrid hardware configuration of host Intel® Xeon® processors and attached Intel® Xeon Phi™ coprocessors. We also explore Intel® software tools that help analyze load balance to improve performance and scalability.
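As a hedged illustration of what one ISO-3DFD time step computes, the C sketch below applies an 8th-order (radius-4) isotropic Laplacian and a second-order time update, next = 2*curr - prev + vel * laplacian(curr). The grid layout, the coefficient set, and the names `iso3dfd_step`, `idx`, `RADIUS`, and `COEF` are assumptions for illustration, not the kernel from [1] or [2]; `vel` is assumed to already hold (v*dt/h)^2 at each point.

```c
#include <stddef.h>

#define RADIUS 4  /* half-length of the assumed 8th-order stencil */

/* Standard 8th-order central-difference weights for d2/dx2 with h = 1.
   They sum to zero across the 9-point stencil (assumed coefficient set). */
static const float COEF[RADIUS + 1] = {
    -205.0f / 72.0f, 8.0f / 5.0f, -1.0f / 5.0f, 8.0f / 315.0f, -1.0f / 560.0f
};

/* Linear index into an nx*ny*nz grid stored with x varying fastest. */
static size_t idx(int x, int y, int z, int nx, int ny) {
    return ((size_t)z * ny + y) * nx + x;
}

/* One time step of the isotropic acoustic wave equation.
   Only points at least RADIUS away from every face are updated;
   the halo region of `next` is left unchanged. */
static void iso3dfd_step(float *next, const float *curr, const float *prev,
                         const float *vel, int nx, int ny, int nz) {
    const size_t sx = 1, sy = (size_t)nx, sz = (size_t)nx * ny;
    /* Parallelize over z-planes; ignored when compiled without OpenMP. */
    #pragma omp parallel for
    for (int z = RADIUS; z < nz - RADIUS; z++)
        for (int y = RADIUS; y < ny - RADIUS; y++)
            for (int x = RADIUS; x < nx - RADIUS; x++) {
                size_t i = idx(x, y, z, nx, ny);
                /* Center weight appears once per axis, hence 3 * COEF[0]. */
                float lap = 3.0f * COEF[0] * curr[i];
                for (int r = 1; r <= RADIUS; r++)
                    lap += COEF[r] * (curr[i + r * sx] + curr[i - r * sx]
                                    + curr[i + r * sy] + curr[i - r * sy]
                                    + curr[i + r * sz] + curr[i - r * sz]);
                next[i] = 2.0f * curr[i] - prev[i] + vel[i] * lap;
            }
}
```

In an MPI version along the lines described above, each rank would typically own a slab of z-planes plus RADIUS halo planes on each side and exchange those halos with its neighbors before every step (for example with `MPI_Sendrecv`); on a mixed Xeon/Xeon Phi configuration the slab sizes can be weighted by each device's measured throughput to balance the load. This is a sketch of the general domain-decomposition idea, not the article's implementation.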
  • Developers
  • Linux*
  • Server
  • seismic
  • RTM
  • stencil
  • 3D finite difference
  • 3DFD
  • distributed
  • Cluster
  • Intel® Xeon® processors
  • Intel® Xeon Phi™ Coprocessors
  • Message Passing Interface
  • OpenMP*
  • Cluster Computing
  • Code Modernization
  • Intel® Many Integrated Core Architecture
  • Optimization
  • Parallel Computing
    The New Parallel Universe Magazine is Out: All About Vectorization

    Parallel Universe is Intel's quarterly magazine that explores inroads and innovations in software development. The new issue takes a deep dive into the subject of vectorization and what it can do for you. Our first feature article looks at the SIMD directives for explicit vector programming now available in OpenMP. The second article walks you through Vectorization Advisor, a new tool in the latest version of Intel® Advisor XE that can help answer your questions about vectorization.
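The OpenMP SIMD directive mentioned above asks the compiler to vectorize a loop explicitly rather than relying on its own dependence analysis. A minimal C sketch, assuming an OpenMP 4.0 compiler (the function name `dot_simd` is hypothetical): the `reduction(+:sum)` clause gives each SIMD lane a private partial sum that is combined after the loop.

```c
#include <stddef.h>

/* Dot product with an explicit OpenMP SIMD loop (OpenMP 4.0).
   Each SIMD lane accumulates its own partial sum; the partials are
   added together when the loop finishes. If OpenMP is not enabled,
   the pragma is ignored and the loop runs as ordinary scalar code. */
static float dot_simd(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    #pragma omp simd reduction(+:sum)
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

With GCC the directive can be honored without enabling OpenMP threading via `-fopenmp-simd` (the Intel compiler's counterpart is `-qopenmp-simd`). Because the reduction reorders the additions, results for general floating-point inputs may differ in the last bits from a strictly serial loop.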

    Team invalidation between consecutive parallel constructs

    We are running some experiments with the EPCC parallel benchmark on an Intel Xeon Phi 7120 coprocessor with 244 threads, compact affinity, the hierarchical barrier, KMP_LIBRARY=turnaround, and KMP_BLOCKTIME=infinite.
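EPCC's synchronization benchmark estimates barrier cost by timing a loop that executes a barrier against a reference loop that does not, inside the same kind of parallel region. The C sketch below reproduces that idea in a few lines; the names `time_region` and `barrier_overhead_us` are hypothetical, and it omits the delay work and repeated outer measurements of the real suite.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Time `reps` iterations inside one parallel region. When `barrier`
   is nonzero, every thread executes an explicit barrier per iteration.
   `barrier` is shared and read-only, so all threads take the same path. */
static double time_region(int reps, int barrier) {
    double t0 = now_seconds();
    #pragma omp parallel
    {
        for (int r = 0; r < reps; r++) {
            if (barrier) {
                #pragma omp barrier
            }
        }
    }
    return now_seconds() - t0;
}

/* EPCC-style estimate in microseconds per barrier:
   (test time - reference time) / reps. Noisy for small reps. */
static double barrier_overhead_us(int reps) {
    double ref  = time_region(reps, 0);
    double test = time_region(reps, 1);
    return (test - ref) / reps * 1e6;
}
```

The runtime settings from the post are applied through the environment rather than the code, for example `OMP_NUM_THREADS=244 KMP_AFFINITY=compact KMP_LIBRARY=turnaround KMP_BLOCKTIME=infinite`.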

    Using VTune, I see that most of the non-waiting time is consumed in __kmp_hierarchical_barrier_release, which makes sense to me. However, inside this function, most of the time is spent in:

    No Cost Options for Intel Integrated Performance Primitives Library (IPP), Support Yourself, Royalty-Free

    Intel® IPP is an extensive library which includes thousands of optimized functions covering frequently used fundamental algorithms including those for creating digital media, enterprise data, embedded, communications, and scientific/technical applications. Intel IPP includes routines for Image Processing, Computer Vision, Data Compression, Signal Processing and (with an optional add-on) Cryptography. Intel IPP is available for Linux*, OS X* and Windows*.

    OpenMP 4.0 target offload report

    Hi,

    I am trying to compare offload statistics for:

    1) Intel compiler-assisted offload vs. 2) the OpenMP 4.0 target construct

    My question: how can I get an OpenMP 4.0 offload report (which environment variable do I need to set)? With the Intel compiler offload directives I used OFFLOAD_REPORT=2 and it worked fine, but I am getting very strange statistics with OpenMP 4.0 offload (I am using an Intel Xeon Phi as the execution platform).

    Here is the code:

    // Start time
    gettimeofday(&start, NULL);
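The fragment above shows only the start of the timing. A hedged, self-contained version of the same pattern around an OpenMP 4.0 `target` region might look like this; the vector-add kernel and the names `timed_target_add` and `elapsed_seconds` are hypothetical. Note that wall-clock timing with `gettimeofday` lumps data transfer, device setup (which may be significant on the first offload), and computation together, so it cannot by itself separate those costs the way a per-offload report does.

```c
#include <stddef.h>
#include <sys/time.h>

/* Elapsed seconds between two gettimeofday() samples. */
static double elapsed_seconds(const struct timeval *start, const struct timeval *end) {
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_usec - start->tv_usec) * 1e-6;
}

/* c[i] = a[i] + b[i] inside an OpenMP 4.0 target region, timed on the host.
   Inputs are copied to the device (map to), the result is copied back
   (map from). Without an offload device, the region runs on the host;
   without OpenMP, the pragmas are ignored. The result is the same. */
static double timed_target_add(const float *a, const float *b, float *c, int n) {
    struct timeval start, end;
    gettimeofday(&start, NULL);   /* Start time */
    #pragma omp target map(to: a[0:n], b[0:n]) map(from: c[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
    gettimeofday(&end, NULL);     /* End time */
    return elapsed_seconds(&start, &end);
}
```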

    Run-time error

    Hello everyone,

    I am new to this community and, first of all, I would like to thank everyone in advance for the help.

    I use Intel Fortran for my programs, which have always been written sequentially. Now I am trying to parallelize a few DO loops, but I get a run-time error when I run a program with the OpenMP directives. This is my first time using OpenMP, so I apologize if my questions are basic.
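The post does not show the loop or the error message, so the following does not diagnose it. As a hedged sketch in C rather than Fortran (the function `sum_sq_diff` is hypothetical), it shows the usual shape of a correctly parallelized loop: a loop-local temporary is private to each thread and the accumulation uses a reduction. Sharing such a temporary across threads is a common source of races when a serial loop is first parallelized; in Fortran the same intent is written with `PRIVATE(...)` and `REDUCTION(+:...)` clauses on `!$omp parallel do`.

```c
#include <stddef.h>

/* Sum of squared differences between two arrays.
   tmp is declared inside the loop body, so each thread has its own copy
   (the C counterpart of a Fortran PRIVATE clause); the per-thread
   partial sums are combined by the reduction clause. */
static double sum_sq_diff(const double *x, const double *y, int n) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        double tmp = x[i] - y[i];
        sum += tmp * tmp;
    }
    return sum;
}
```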
