In-place matrix transposition, a standard operation in linear algebra, is bound by memory bandwidth. Its theoretical maximum performance equals the memory copy bandwidth. In practice, performance is usually lower because transposition accesses memory non-contiguously. The ratio of the transposition rate to the memory copy bandwidth therefore measures the efficiency of a transposition algorithm.
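As a rough illustration of the operation and the efficiency ratio described above, here is a minimal Python sketch. It assumes a square matrix stored row-major in a flat list, which the abstract does not specify. The function names `transpose_in_place` and `transposition_efficiency` are hypothetical. This naive version is not a tuned algorithm.

```python
def transpose_in_place(a, n):
    """Transpose an n x n matrix stored row-major in the flat list `a`.

    Assumption: the matrix is square; non-square in-place transposition
    needs a different (cycle-following) approach.
    """
    for i in range(n):
        for j in range(i + 1, n):
            # Swap element (i, j) with element (j, i). The (j, i) access
            # jumps through memory with stride n, which is the
            # non-contiguous access pattern that hurts performance.
            a[i * n + j], a[j * n + i] = a[j * n + i], a[i * n + j]
    return a


def transposition_efficiency(transposition_rate, copy_bandwidth):
    """Ratio of transposition rate to memory copy bandwidth (same units)."""
    return transposition_rate / copy_bandwidth


# Small 3 x 3 example, written row by row.
matrix = [1, 2, 3,
          4, 5, 6,
          7, 8, 9]
transpose_in_place(matrix, 3)
```

After the call, `matrix` holds the columns of the original as its rows. For the ratio, any measured rates would have to come from your own benchmark; a transposition rate of half the copy bandwidth, for example, corresponds to an efficiency of 0.5.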
In Part 8 we integrate the GUI with the back end. We examine the implications of mixing managed code with enclaves and how to keep that mix from undermining the security gained from Intel® SGX.
This article discusses common techniques for fine-tuning the performance of automatically vectorized loops in applications for Intel® Xeon Phi™ coprocessors. These techniques include strength reduction, regularizing the vectorization pattern, data alignment and aligned-data hints, and pointer disambiguation.
MILC software is a set of codes written by the MIMD Lattice Computation collaboration to study quantum chromodynamics. This article explains how to obtain the code and how to build and run the “ks_imp_rhmc” application on Intel® Xeon® Gold and Intel® Xeon Phi™ processors for better performance on a single node.
The latest version of MXNet includes built-in support for the Intel® Math Kernel Library (Intel® MKL) 2018. The latest version of Intel MKL includes optimizations for Intel® Advanced Vector Extensions 2 (Intel® AVX2) and AVX-512 instructions, which are supported on Intel® Xeon® and Intel® Xeon Phi™ processors. The Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) is a new open source library designed to accelerate deep learning (DL) applications on Intel® architecture. It includes functionality similar to Intel MKL, with additional optimizations for deep learning workloads.
Spark is the leading framework for distributed ML, so adding deep learning to it is important: Spark developers can then perform a wide range of data analysis tasks (including data wrangling, interactive queries, and stream processing) within a single framework. Three important features offered by BigDL are rich deep learning support, high single-node Xeon performance, and efficient scale-out that leverages the Spark architecture.
This article completes an analysis of a problem erroneously reported on the Intel® Developer Zone forum: “Vectorization failed because of unsigned integer?” It examines the problem in more detail, showing that the unsigned integer type does not prevent compiler vectorization, and describes a methodology to use when a modern C/C++ compiler fails to auto-vectorize for-loops.
The following is a quick guide to getting a PhysX* Destructible Mesh (DM) set up and working in an Unreal Engine* 4 (UE4*) project. This guide is primarily based on personal trial and error; other methods may exist that work better for your project. See the official documentation for tutorials on fracturing and troubleshooting if you would like to go into more depth with Destructible Mesh capabilities.