To efficiently utilize all available resources for the task concurrency application on heterogeneous platforms, designers need to understand the memory architecture, the thread utilization on each platform, the pipeline to offload the workload to different platforms. To relieve designers of the burden of implementing the necessary infrastructures, the Heterogeneous Streaming (hStreams) library provides a set of well-defined APIs to support a task-based parallelism model on heterogeneous platforms
The following archive contains *.c files for the code examples referenced in the several sections of the Intel® Integrated Performance Primitives (Intel® IPP) for Microcontrollers Reference Manual.
Common techniques for fine-tuning the performance of automatically vectorized loops in applications for Intel® Xeon Phi™ coprocessors are discussed. These techniques include strength reduction, regularizing the vectorization pattern, data alignment and aligned data hint, and pointer disambiguation.
In-place matrix transposition, a standard operation in linear algebra, is a memory bandwidth-bound operation. The theoretical maximum performance of transposition is the memory copy bandwidth. However, due to non-contiguous memory access in the transposition operation, practical performance is usually lower. The ratio of the transposition rate to the memory copy bandwidth is a measure of the transposition algorithm efficiency.
This article discusses why using a texture rather than an image can improve OpenGL rendering performance. It is accompanied by a simple C++ application that alternates between using a texture and using an image. The purpose of this application is to show the effect on rendering performance (milliseconds per frame) when using the two techniques.
Follow Pawel L. to learn about Intel's graphic driver support for the emerging Vulkan* graphics API. He'll be providing several tutorials along with Github source code.