While image convolution is not as effective with the new Read-Write images functionality, any image processing technique that needs to be done in place may benefit from Read-Write images. One process that can use them effectively is image composition. In OpenCL 1.2 and earlier, images were qualified with the “__read_only” and “__write_only” qualifiers. In OpenCL 2.0, images can...
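To make "in place" concrete, here is a minimal host-side C++ sketch of the per-pixel work an in-place composition pass performs: each pixel of the destination is read, blended, and written back to the same buffer. OpenCL C 2.0 adds a `read_write` access qualifier for images, which is what allows a single kernel argument to be used this way; the kernel itself is not shown. The blend is the Porter-Duff "over" operator, chosen here only as one example of composition, and the `Pixel` type and `composite_over_in_place` function are hypothetical names for this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit RGBA pixel used only for this sketch.
struct Pixel { std::uint8_t r, g, b, a; };

// Composite `src` "over" `dst`, writing the result back into `dst`.
// Each destination pixel is read and then overwritten in the same buffer,
// the in-place pattern that a read_write image enables in an OpenCL 2.0 kernel.
void composite_over_in_place(const std::vector<Pixel>& src, std::vector<Pixel>& dst) {
    for (std::size_t i = 0; i < dst.size() && i < src.size(); ++i) {
        const unsigned sa = src[i].a;    // source alpha, 0..255
        const unsigned inv = 255u - sa;  // weight kept for the existing pixel
        // Integer "over" with rounding: (s*sa + d*(255-sa) + 127) / 255
        auto blend = [&](std::uint8_t s, std::uint8_t d) {
            return static_cast<std::uint8_t>((s * sa + d * inv + 127u) / 255u);
        };
        Pixel& d = dst[i];
        d = Pixel{blend(src[i].r, d.r), blend(src[i].g, d.g), blend(src[i].b, d.b),
                  static_cast<std::uint8_t>(sa + (d.a * inv + 127u) / 255u)};
    }
}
```

Integer arithmetic keeps the sketch exact and portable; a GPU kernel would more typically work on normalized floats returned by the image read functions.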
Financial services customers need to improve the performance of financial algorithms for models such as Monte Carlo, Black-Scholes, and others. SIMD programming can speed up these workloads. In this paper, we apply two data layout optimization approaches to a Black-Scholes workload for European options valuation from the open source QuantLib library.
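As an illustration of what a SIMD-friendly data layout looks like for this workload, the following is a minimal sketch, assuming a hypothetical `OptionsSoA` structure-of-arrays container; it is not QuantLib's API and not necessarily either of the paper's two approaches. Each input field lives in its own contiguous array, so the pricing loop reads unit-stride data that a compiler can load into vector registers, whereas an array-of-structures layout (one struct per option) turns every field access into a strided load. The formula is the standard Black-Scholes price of a European call.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays layout: one contiguous array per field.
struct OptionsSoA {
    std::vector<double> spot, strike, rate, vol, expiry;  // inputs
    std::vector<double> call;                             // output prices
};

// Standard normal CDF expressed through the complementary error function.
inline double norm_cdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

// Black-Scholes price of a European call for every option in the batch.
void price_calls(OptionsSoA& o) {
    const std::size_t n = o.spot.size();
    o.call.resize(n);
    for (std::size_t i = 0; i < n; ++i) {  // unit-stride accesses on every array
        const double sqrtT = std::sqrt(o.expiry[i]);
        const double d1 = (std::log(o.spot[i] / o.strike[i]) +
                           (o.rate[i] + 0.5 * o.vol[i] * o.vol[i]) * o.expiry[i]) /
                          (o.vol[i] * sqrtT);
        const double d2 = d1 - o.vol[i] * sqrtT;
        o.call[i] = o.spot[i] * norm_cdf(d1) -
                    o.strike[i] * std::exp(-o.rate[i] * o.expiry[i]) * norm_cdf(d2);
    }
}
```

For an at-the-money option with spot 100, strike 100, rate 5%, volatility 20%, and one year to expiry, the sketch gives a call price of about 10.45.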
In this paper, we walk through a 3D animation algorithm example and describe some techniques and methodologies that may benefit your next vectorization effort. We also integrate the algorithm with SIMD Data Layout Templates (SDLT), a feature of the Intel® C++ Compiler, to improve data layout and SIMD efficiency. Includes a code sample.
This article provides an introduction to autonomous navigation and its use in augmented reality applications, with a focus on agents that move and navigate. Autonomous agents are entities that act independently using artificial intelligence, which defines the operational parameters and rules by which the agent must abide. The agent responds dynamically in real time to its environment, so even a...
This article shows how to use LibRealSense and OpenCV to stream RGB and depth data. By the end, you will have a code base that serves as a starting point for building your own LibRealSense/OpenCV applications.
How to optimize Caffe* for Intel® architecture, train deep network models, and deploy the networks.
This article completes the analysis of a problem mistakenly reported on the Intel® Developer Zone forum as “Vectorization failed because of unsigned integer?” It takes a closer look, showing that the unsigned integer does not affect compiler vectorization, and describes the methodology to use when a modern C/C++ compiler fails to auto-vectorize for-loops.
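A minimal sketch of the kind of loop in question: a saxpy-style loop whose counter and trip count are both `unsigned`. The function name `saxpy` is used here only for illustration. Rather than guessing whether the index type is the obstacle, the practical step is to ask the compiler for its vectorization report; with GCC, `-O3 -fopt-info-vec-optimized` (or `-fopt-info-vec-missed` for failures) prints which loops were vectorized, and Clang offers `-Rpass=loop-vectorize` and `-Rpass-missed=loop-vectorize` for the same purpose. In a simple form like this one, the loop is countable and the unsigned counter is not, by itself, a reason for vectorization to fail; possible aliasing between `x` and `y` is a more typical thing for the report to flag.

```cpp
// saxpy-style loop indexed by an unsigned counter.
// Check vectorization from the compiler's report, for example:
//   g++ -O3 -fopt-info-vec-optimized -c saxpy.cpp
//   clang++ -O3 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -c saxpy.cpp
void saxpy(unsigned n, float a, const float* x, float* y) {
    for (unsigned i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];  // x and y may alias: the compiler may add a runtime check
}
```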
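Multicore processors are now ubiquitous in PCs and core counts keep growing, so software engineers must adapt. By learning how to handle potential performance bottlenecks and concurrency issues, engineers can future-proof their code so that it seamlessly takes advantage of the additional cores added to consumer systems.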
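One common way to write code that scales with the number of cores is to size the worker pool from the hardware at run time instead of hard-coding it. The following is a minimal C++ sketch of that idea applied to a reduction; the `parallel_sum` function is a hypothetical example, not taken from the article. It also sidesteps two typical concurrency problems: each thread writes only its own result slot, so no lock is needed, and each slot is written once, so false sharing on the result array stays negligible.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sum `data` with one worker per hardware thread, so the same binary spreads
// the work over more cores as core counts grow.
long long parallel_sum(const std::vector<int>& data) {
    std::size_t workers = std::thread::hardware_concurrency();
    if (workers == 0) workers = 1;  // 0 means the count is unknown
    workers = std::min(workers, std::max<std::size_t>(data.size(), 1));

    // One result slot per worker: each thread writes only its own slot, once,
    // so no lock is needed and false sharing on `partial` is negligible.
    std::vector<long long> partial(workers, 0);
    std::vector<std::thread> pool;
    const std::size_t chunk = (data.size() + workers - 1) / workers;

    for (std::size_t w = 0; w < workers; ++w) {
        pool.emplace_back([&data, &partial, chunk, w] {
            const std::size_t begin = std::min(data.size(), w * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            partial[w] = std::accumulate(data.begin() + static_cast<std::ptrdiff_t>(begin),
                                         data.begin() + static_cast<std::ptrdiff_t>(end), 0LL);
        });
    }
    for (auto& t : pool) t.join();  // wait for every worker before combining
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Accumulating into `long long` avoids overflowing `int` for large inputs; on some older toolchains the build also needs `-pthread`.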
Apply the concepts of parallelism and distributed memory computing to your code to improve software performance. This paper expands on the concepts discussed in Part 1 to consider parallelism, covering both vectorization (single instruction, multiple data, or SIMD) and shared memory parallelism (threading), as well as distributed memory computing.
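The two shared-memory levels named here, vectorization and threading, can be combined on a single loop. The following is a minimal sketch using OpenMP, a swapped-in choice for illustration since the paper's own tooling is not stated here; the `dot` function is hypothetical, and the distributed memory level (typically handled with a message-passing library such as MPI) is not shown. The `parallel for simd` construct, available since OpenMP 4.0, splits iterations across threads and asks each thread to vectorize its chunk, while `reduction(+:sum)` gives each thread and SIMD lane a private partial sum that is combined at the end.

```cpp
#include <cstddef>
#include <vector>

// Dot product using both threading and SIMD through OpenMP.
// Build with -fopenmp (GCC/Clang); without it the pragma is ignored and the
// loop runs serially, giving the same result up to floating-point rounding.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double sum = 0.0;
    const std::size_t len = a.size() < b.size() ? a.size() : b.size();
    const long n = static_cast<long>(len);
    #pragma omp parallel for simd reduction(+:sum)
    for (long i = 0; i < n; ++i)
        sum += a[i] * b[i];  // each thread/lane accumulates a private partial sum
    return sum;
}
```

Because the reduction changes the order of additions, results on general floating-point data can differ slightly from a serial loop; this is the usual trade-off accepted for parallel reductions.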