This article describes how to implement and optimize a three-dimensional isotropic kernel with finite differences to run on Intel® Xeon® processors and Intel® Xeon Phi™ processors.
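As a rough illustration of what such a kernel computes, here is a minimal C sketch of one time step of an isotropic 3D finite-difference stencil. It is not the article's code. It assumes an 8th-order-in-space stencil (half-length 4), a grid stored with x as the fastest-varying index, and a precomputed `vel` array holding (c·Δt/h)² per point. The names `iso3dfd_step`, `idx`, `coeff`, and `HALF_LENGTH` are hypothetical.

```c
#include <stddef.h>

#define HALF_LENGTH 4 /* stencil half-length: 8th order in space (assumed) */

/* Standard 8th-order central-difference weights for a second derivative
   on a unit-spaced grid; coeff[0] is the center weight. */
static const float coeff[HALF_LENGTH + 1] = {
    -205.0f / 72.0f, 8.0f / 5.0f, -1.0f / 5.0f, 8.0f / 315.0f, -1.0f / 560.0f
};

/* Linear index into an nx * ny * nz grid with x varying fastest. */
static inline size_t idx(size_t x, size_t y, size_t z, size_t nx, size_t ny) {
    return (z * ny + y) * nx + x;
}

/* Hypothetical helper: one time step of the isotropic acoustic wave
   equation, next = 2*cur - prev + vel * laplacian(cur).
   vel holds (c*dt/h)^2 per point. Each dimension must exceed
   2*HALF_LENGTH; points within HALF_LENGTH of a face (the halo)
   are not written. */
void iso3dfd_step(float *next, const float *cur, const float *prev,
                  const float *vel, size_t nx, size_t ny, size_t nz) {
    const size_t sy = nx;      /* stride between neighbors in y */
    const size_t sz = nx * ny; /* stride between neighbors in z */

    /* Thread-level parallelism over z-planes when built with OpenMP
       (e.g. -fopenmp or -qopenmp); the pragma is ignored otherwise. */
    #pragma omp parallel for
    for (size_t z = HALF_LENGTH; z < nz - HALF_LENGTH; z++)
        for (size_t y = HALF_LENGTH; y < ny - HALF_LENGTH; y++)
            /* Unit-stride innermost loop: the candidate for vectorization. */
            for (size_t x = HALF_LENGTH; x < nx - HALF_LENGTH; x++) {
                size_t i = idx(x, y, z, nx, ny);
                /* Center weight counted once per dimension (x, y, z). */
                float lap = 3.0f * coeff[0] * cur[i];
                for (size_t r = 1; r <= HALF_LENGTH; r++)
                    lap += coeff[r] * (cur[i + r] + cur[i - r] +
                                       cur[i + r * sy] + cur[i - r * sy] +
                                       cur[i + r * sz] + cur[i - r * sz]);
                next[i] = 2.0f * cur[i] - prev[i] + vel[i] * lap;
            }
}
```

Because the innermost x loop walks memory with unit stride, it is the natural target for compiler vectorization, while the outer z loop is the natural place to split work across threads. A production kernel would typically add cache blocking on top of this, which the sketch omits.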
Learn how to use Multi-Channel DRAM (MCDRAM) and synchronous dynamic random-access memory (SDRAM) efficiently.
Learn techniques for vectorizing code, adding thread-level parallelism, and optimizing memory usage.
This is an exercise in performance optimization on heterogeneous Intel® architecture systems based on multi-core processors and Intel® Many Integrated Core (MIC) architecture coprocessors.
An exercise in performance optimization on Intel® architecture, including Intel® Xeon Phi™ processors.
To use all available resources efficiently in task-concurrent applications on heterogeneous platforms, designers need to understand the memory architecture, the thread utilization on each platform, and the pipeline for offloading work to different platforms. To relieve designers of the burden of implementing this infrastructure, the Heterogeneous Streaming (hStreams) library...
See how the new Intel® Advanced Vector Extensions 512 (Intel® AVX-512) CD (Conflict Detection) and F (Foundation) subsets, available in the Intel® Xeon Phi™ processor and in future Intel® Xeon® processors, let the compiler automatically generate vector code with no changes to the source code.
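A histogram update is a commonly cited example of the kind of loop the CD subset targets: two lanes of one vector may hit the same bin, so without conflict detection a compiler usually has to leave the loop scalar. The minimal C sketch below (the function name `histogram` is hypothetical) is plain C with no intrinsics. With a compiler targeting AVX-512 (for example, the Intel compiler's `-xMIC-AVX512` or `-xCORE-AVX512` options), it may be vectorized using conflict-detection instructions. Whether it actually is depends on the compiler and version, so check the optimization report (for example, `-qopt-report` with the Intel compiler).

```c
#include <stddef.h>

/* Hypothetical example: count how many entries of bins[] fall into each
   bin. Indices may repeat inside one vector of iterations, which is the
   write conflict that AVX-512CD lets a compiler detect and resolve, so
   the loop body needs no source changes to become vectorizable. */
void histogram(int *restrict hist, const int *restrict bins, size_t n) {
    for (size_t i = 0; i < n; i++)
        hist[bins[i]] += 1;
}
```

The same source still compiles and runs correctly on targets without AVX-512; the instruction set only changes whether the compiler can vectorize it.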
In this tutorial, we demonstrate several ways to optimize an application to run on the Intel® Xeon Phi™ processor.
Cython* is a superset of Python* that adds support for calling C functions and declaring C types on variables and class attributes. Cython generates C extension modules, which the main Python program can load with the import statement.
Learn how to write an MPI program in Python* and take advantage of Intel® multicore architectures using OpenMP* threads and Intel® AVX-512 instructions.