Learn how to write an MPI program in Python* and take advantage of Intel® multicore architectures using OpenMP threads and Intel® AVX-512 instructions.
In the previous article, we discussed the performance and accuracy of Binarized Neural Networks (BNN). We also introduced a BNN coded from scratch in the Wolfram Language. The key component of this neural network is Matrix Multiplication.
In this tutorial, we demonstrate some possible ways to optimize an application to run on the Intel® Xeon Phi™ processor.
Cython* is a superset of Python* that additionally supports C functions and C types on variable and class attributes. Cython generates C extension modules, which can be used by the main Python program using the import statement.
Learn techniques for vectorizing code, adding thread-level parallelism, and enabling memory optimization.
Matrix multiplication (MM) of two matrices is one of the most fundamental operations in linear algebra. The algorithm for MM is very simple and can be implemented easily in any programming language. This paper shows that performance improves significantly when different optimization techniques are applied.
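As a rough illustration of how simple the baseline algorithm is, here is a minimal sketch of the textbook triple-loop product in plain Python. The function name `matmul` and the list-of-lists representation are assumptions chosen for this example; they are not taken from the paper, and the sketch applies none of the optimizations the paper discusses.

```python
def matmul(a, b):
    """Naive triple-loop product of an n x m matrix and an m x p matrix.

    Matrices are non-empty lists of equal-length rows (an assumption of this sketch).
    """
    n, m, p = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):          # each row of a
        for j in range(p):      # each column of b
            total = 0.0
            for k in range(m):  # dot product of row i and column j
                total += a[i][k] * b[k][j]
            c[i][j] = total
    return c


if __name__ == "__main__":
    print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The three nested loops perform n·m·p multiply-add operations, which is the unoptimized starting point that techniques such as vectorization and threading are meant to speed up.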
This paper examines software performance optimization for an implementation of a non-library version of DGEMM executing on the Intel® Xeon Phi™ processor (code-named Knights Landing, with acronym K
Exercise in performance optimization on Intel Architecture, including Intel® Xeon Phi™ processors.
Learn how to use MPI-3 shared memory on the Intel® Xeon Phi™ processor.
This document is designed to help users get started writing code and running MPI applications using the Intel® MPI Library on a development platform that includes the Intel® Xeon Phi™ processor.