How to install and enable Offload over Fabric, configure the hardware, and test the configuration.
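A quick functional test, once OoF is enabled, is to offload a trivial block and check where it ran. The sketch below assumes the Intel® C++ Compiler's Language Extensions for Offload (#pragma offload), the programming model OoF carries over the fabric; __MIC__ is the macro defined only when code is compiled for the Intel® Xeon Phi™ target.

#include <stdio.h>

int main(void) {
    int on_target = 0;
    /* With Offload over Fabric enabled, this block is serviced by the
       remote Intel Xeon Phi node; target selection comes from the OoF
       configuration rather than from the pragma itself. */
    #pragma offload target(mic) out(on_target)
    {
#ifdef __MIC__
        on_target = 1;   /* defined only in the target-side compilation */
#endif
    }
    printf("offload block ran on the %s\n", on_target ? "target" : "host");
    return 0;
}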
In the previous article, we discussed the performance and accuracy of binarized neural networks (BNNs) and introduced a BNN coded from scratch in the Wolfram Language. The key component of this neural network is matrix multiplication.
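The article builds its BNN in the Wolfram Language, but the trick that makes binarized matrix multiplication fast translates to any language: with weights and activations constrained to ±1 and bit-packed, a dot product collapses to XNOR plus population count. A hypothetical C sketch of that kernel (the packing scheme and names are ours, not the article's; __builtin_popcountll is a GCC/Clang/ICC builtin):

#include <stdint.h>
#include <stdio.h>

/* Dot product of two ±1 vectors of length n, bit-packed one element per
   bit (bit = 1 encodes +1). Matching signs set a bit in the XNOR, so
   dot = matches - mismatches = 2*popcount(xnor) - n. */
static int binary_dot(const uint64_t *a, const uint64_t *b, int nwords, int n) {
    int pop = 0;
    for (int w = 0; w < nwords; w++)
        pop += __builtin_popcountll(~(a[w] ^ b[w]));   /* XNOR + popcount */
    pop -= 64 * nwords - n;   /* padding bits (zero in both inputs) count
                                 as matches in the XNOR; remove them */
    return 2 * pop - n;
}

int main(void) {
    /* a = (+1,+1,-1,+1), b = (+1,+1,+1,+1): 3 matches, 1 mismatch -> dot = 2 */
    uint64_t a = 0xB, b = 0xF;
    printf("dot = %d\n", binary_dot(&a, &b, 1, 4));
    return 0;
}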
Learn techniques for vectorizing code, adding thread-level parallelism, and enabling memory optimization.
Apply the concepts of parallelism and distributed memory computing to your code to improve software performance. This paper expands on the concepts discussed in Part 1 to cover both vectorization (single instruction, multiple data, or SIMD) and shared memory parallelism (threading), as well as distributed memory computing.
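As a concrete illustration of the first two levels, hypothetical rather than taken from the paper, a single OpenMP construct can spread a loop across cores and vectorize each thread's chunk (compile with icc -qopenmp or gcc -fopenmp):

#include <omp.h>
#include <stdio.h>

#define N 1000000
static float a[N], b[N], c[N];

int main(void) {
    for (int i = 0; i < N; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    /* parallel for: thread-level parallelism across cores.
       simd: SIMD vectorization within each thread's iterations. */
    #pragma omp parallel for simd
    for (int i = 0; i < N; i++)
        c[i] = a[i] + 2.0f * b[i];

    printf("c[0] = %.1f, max threads = %d\n", c[0], omp_get_max_threads());
    return 0;
}

The third level, distributed memory, is typically added with MPI, as in the MPI sketches further down this page.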
Matrix multiplication (MM) is one of the most fundamental operations in linear algebra. The algorithm is simple and can easily be implemented in any programming language, yet this paper shows that its performance improves significantly when different optimization techniques are applied.
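For reference, here is a sketch of the naive triple loop and of one of the cheapest optimizations: interchanging the loops so the innermost loop is unit stride, which improves cache reuse and lets the compiler vectorize it. This illustrates the technique; it is not the paper's exact code.

/* C = A * B for n x n matrices in row-major order. */
void mm_naive(int n, const double *A, const double *B, double *C) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += A[i*n + k] * B[k*n + j];   /* B accessed with stride n */
            C[i*n + j] = sum;
        }
}

/* ikj order: the inner loop walks B and C contiguously (unit stride),
   which improves cache behavior and allows vectorization. */
void mm_interchanged(int n, const double *A, const double *B, double *C) {
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) C[i*n + j] = 0.0;
        for (int k = 0; k < n; k++) {
            double aik = A[i*n + k];
            for (int j = 0; j < n; j++)
                C[i*n + j] += aik * B[k*n + j];
        }
    }
}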
This document is designed to help users get started writing code and running MPI applications using the Intel® MPI Library on a development platform that includes the Intel® Xeon Phi™ processor.
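The canonical first program is below; with the Intel® MPI Library it is built with the mpiicc compiler wrapper and launched with mpirun, e.g. mpiicc hello.c -o hello && mpirun -n 4 ./hello (tool names as in Intel's documentation; your environment setup may differ):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}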
Contrast results from manually tuning the layout of financial data with results from using the data layout templates in the Intel® C++ Compiler.
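The underlying trade-off in both approaches is array-of-structures (AoS) versus structure-of-arrays (SoA): SoA keeps each field contiguous, so a loop over one field runs at unit stride and vectorizes cleanly. A hand-rolled C sketch of the two layouts (illustrating the concept only, not the compiler's template API):

#define N 1024

/* AoS: x, y, z interleaved -> strided access when touching only x. */
struct PointAoS { float x, y, z; };
static struct PointAoS aos[N];

/* SoA: each field contiguous -> unit-stride, vectorizer-friendly. */
struct PointsSoA { float x[N], y[N], z[N]; };
static struct PointsSoA soa;

void scale_x_aos(float s) {
    for (int i = 0; i < N; i++) aos[i].x *= s;   /* stride = sizeof(struct) */
}

void scale_x_soa(float s) {
    for (int i = 0; i < N; i++) soa.x[i] *= s;   /* unit stride */
}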
Learn how to use Offload over Fabric software for a server migration path.
Code sample included: Learn how to use the MPI-3 shared memory feature and its corresponding APIs on the Intel® Xeon Phi™ processor.
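The feature rests on two MPI-3 calls: MPI_Comm_split_type with MPI_COMM_TYPE_SHARED groups the ranks on one node, and MPI_Win_allocate_shared gives them a window they can read and write with plain loads and stores. A condensed sketch (error handling omitted; not the article's code sample itself):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    /* Group the ranks that can share physical memory (same node). */
    MPI_Comm shmcomm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &shmcomm);

    int rank, nranks;
    MPI_Comm_rank(shmcomm, &rank);
    MPI_Comm_size(shmcomm, &nranks);

    /* Each rank contributes one int to a node-local shared window. */
    int *mine;
    MPI_Win win;
    MPI_Win_allocate_shared(sizeof(int), sizeof(int), MPI_INFO_NULL,
                            shmcomm, &mine, &win);
    *mine = rank * 100;
    MPI_Barrier(shmcomm);

    /* Rank 0 reads rank 1's segment with a direct load, no MPI_Get. */
    if (rank == 0 && nranks > 1) {
        MPI_Aint size;
        int disp_unit;
        int *theirs;
        MPI_Win_shared_query(win, 1, &size, &disp_unit, &theirs);
        printf("rank 0 sees rank 1's value: %d\n", *theirs);
    }

    MPI_Win_free(&win);
    MPI_Comm_free(&shmcomm);
    MPI_Finalize();
    return 0;
}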
This article focuses on the steps to improve software performance with vectorization, illustrated with examples from full applications as well as simpler cases.
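A first, minimal example of such a step: possible pointer aliasing is one of the most common reasons a loop fails to auto-vectorize, and C's restrict qualifier removes it. With the Intel® compilers, a vectorization report (e.g. the -qopt-report option) can confirm that the loop was vectorized.

/* 'restrict' promises the compiler that x and y do not overlap, removing
   the aliasing assumption that would otherwise block auto-vectorization. */
void saxpy(int n, float a, const float *restrict x, float *restrict y) {
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}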