Download Code Package: 20160519-cpuid_topo.tar.gz
Unbounded single-producer/single-consumer queue. It uses an internal, non-reducible cache of nodes. The dequeue operation is always wait-free; the enqueue operation is wait-free in the common case. Neither atomic RMW operations nor heavy memory fences are used.
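The description above can be illustrated with a minimal C++ sketch. It is not the code from the article: the class name `SpscQueue`, its member names, and the requirement that `T` be default-constructible are assumptions made for this example, and cache-line padding between producer and consumer fields is omitted for brevity. It uses only atomic loads and stores with acquire/release ordering, and it recycles consumed nodes through a cache that is released only when the queue is destroyed.

```cpp
#include <atomic>
#include <utility>

// Sketch of an unbounded SPSC queue with a node cache (hypothetical names).
// Exactly one thread may call enqueue() and exactly one may call dequeue().
template <typename T>
class SpscQueue {
    struct Node {
        std::atomic<Node*> next{nullptr};
        T value{};  // assumption: T is default-constructible
    };

public:
    SpscQueue() {
        Node* stub = new Node;  // dummy node, so the list is never empty
        consumed_.store(stub, std::memory_order_relaxed);
        last_ = first_ = consumed_copy_ = stub;
    }
    SpscQueue(const SpscQueue&) = delete;
    SpscQueue& operator=(const SpscQueue&) = delete;

    // Must run only after both threads have stopped using the queue.
    ~SpscQueue() {
        Node* n = first_;
        while (n != nullptr) {
            Node* next = n->next.load(std::memory_order_relaxed);
            delete n;
            n = next;
        }
    }

    // Producer side: wait-free whenever a cached node is available;
    // otherwise it falls back to operator new.
    void enqueue(T v) {
        Node* n = alloc_node();
        n->next.store(nullptr, std::memory_order_relaxed);
        n->value = std::move(v);
        last_->next.store(n, std::memory_order_release);  // publish the node
        last_ = n;
    }

    // Consumer side: always wait-free. Returns false if the queue is empty.
    bool dequeue(T& out) {
        Node* stub = consumed_.load(std::memory_order_relaxed);
        Node* next = stub->next.load(std::memory_order_acquire);
        if (next == nullptr) return false;
        out = std::move(next->value);
        consumed_.store(next, std::memory_order_release);  // old stub is now reusable
        return true;
    }

private:
    // Reuse a node the consumer has already passed, if there is one.
    Node* alloc_node() {
        if (first_ == consumed_copy_)
            consumed_copy_ = consumed_.load(std::memory_order_acquire);
        if (first_ != consumed_copy_) {
            Node* n = first_;
            first_ = n->next.load(std::memory_order_relaxed);
            return n;
        }
        return new Node;
    }

    std::atomic<Node*> consumed_;  // written by consumer, read by producer
    Node* last_;                   // producer only: most recently enqueued node
    Node* first_;                  // producer only: oldest node in the cache
    Node* consumed_copy_;          // producer only: cached snapshot of consumed_
};
```

In this sketch the producer refreshes its snapshot of the consumer position only when its local cache looks empty, so the shared variable is read rarely. Nodes are never returned to the allocator before destruction, which is one way to read "non-reducible".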
Intel® Cilk™ Plus is an extension to the C and C++ languages to support data and task parallelism. It provides three new keywords to i
This article describes how to implement and optimize a three-dimensional isotropic kernel with finite differences to run on the Intel® Xeon® Processor and Intel® Xeon Phi™.
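As a rough illustration of what such a kernel computes, here is an unoptimized C++ sketch of one time step of an isotropic wave-equation update. It uses a second-order, 7-point stencil for simplicity; the stencil order used in the article may differ, and the names `idx` and `iso3dfd_step` and the meaning of `vel` (assumed here to hold the precomputed factor v²·dt²/dx² per cell) are assumptions for this example only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: flatten (x, y, z) into a 1D index, with x fastest.
inline std::size_t idx(std::size_t x, std::size_t y, std::size_t z,
                       std::size_t nx, std::size_t ny) {
    return (z * ny + y) * nx + x;
}

// One time step: next = 2*cur - prev + vel * laplacian(cur).
// Boundary cells (the outermost layer of the grid) are left untouched.
void iso3dfd_step(std::vector<float>& next,
                  const std::vector<float>& cur,
                  const std::vector<float>& prev,
                  const std::vector<float>& vel,
                  std::size_t nx, std::size_t ny, std::size_t nz) {
    if (nx < 3 || ny < 3 || nz < 3) return;  // no interior cells
    const std::size_t sy = nx;       // stride between neighbors in y
    const std::size_t sz = nx * ny;  // stride between neighbors in z
    for (std::size_t z = 1; z + 1 < nz; ++z) {
        for (std::size_t y = 1; y + 1 < ny; ++y) {
            for (std::size_t x = 1; x + 1 < nx; ++x) {  // unit stride in x
                const std::size_t i = idx(x, y, z, nx, ny);
                const float lap = -6.0f * cur[i]
                                + cur[i - 1]  + cur[i + 1]
                                + cur[i - sy] + cur[i + sy]
                                + cur[i - sz] + cur[i + sz];
                next[i] = 2.0f * cur[i] - prev[i] + vel[i] * lap;
            }
        }
    }
}
```

The innermost loop runs over the unit-stride dimension, which is the usual starting point for vectorization; techniques such as cache blocking and threading would be layered on top of a baseline like this.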
Caffe is a deep learning framework developed by the Berkeley Vision and Learning Center (BVLC) and one of the most popular community frameworks for image recognition. Caffe is often used as a benchmark together with AlexNet*, a neural network topology for image recognition, and ImageNet*, a database of labeled images.
This series of two articles discusses how data and memory layout affect performance and suggests specific steps to improve software performance. The basic steps shown in these articles can yield significant performance gains. Both articles are written at an intermediate level. It is assumed the reader desires to optimize software performance using common C, C++ and Fortran* programming...
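One widely used example of a data layout change is converting an array of structures (AoS) into a structure of arrays (SoA). Whether the articles cover this exact transformation is an assumption here; the C++ sketch below, with hypothetical type and function names, only shows the general idea that a loop touching one field reads contiguous memory in the SoA form.

```cpp
#include <cstddef>
#include <vector>

// Array of structures: fields of one particle are adjacent in memory.
struct ParticleAoS {
    float x, y, z, mass;
};

// Structure of arrays: all values of one field are adjacent in memory.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
};

// Convert AoS to SoA (hypothetical helper for this example).
ParticlesSoA to_soa(const std::vector<ParticleAoS>& in) {
    ParticlesSoA out;
    out.x.reserve(in.size());
    out.y.reserve(in.size());
    out.z.reserve(in.size());
    out.mass.reserve(in.size());
    for (const ParticleAoS& p : in) {
        out.x.push_back(p.x);
        out.y.push_back(p.y);
        out.z.push_back(p.z);
        out.mass.push_back(p.mass);
    }
    return out;
}

// Strided access: each iteration skips over y, z, and mass.
float sum_x_aos(const std::vector<ParticleAoS>& p) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < p.size(); ++i) sum += p[i].x;
    return sum;
}

// Unit-stride access: every loaded cache line contains only x values.
float sum_x_soa(const ParticlesSoA& p) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < p.x.size(); ++i) sum += p.x[i];
    return sum;
}
```

Both functions compute the same result; the difference is how much of each cache line fetched from memory is actually used, and how easily the compiler can vectorize the loop.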
How to efficiently use Multi-Channel DRAM (MCDRAM) and synchronous dynamic random-access memory.
Learn techniques for vectorizing code, adding thread-level parallelism, and enabling memory optimization.
This is an exercise in performance optimization on heterogeneous Intel architecture systems based on multi-core processors and manycore (MIC) coprocessors.
Exercise in performance optimization on Intel Architecture, including Intel® Xeon Phi™ processors.