Article

Intel® 64 Architecture Processor Topology Enumeration

Download Code Package: 20160519-cpuid_topo.tar.gz
Last updated 05/07/2019 - 20:39
Article

Single-Producer/Single-Consumer Queue

An unbounded single-producer/single-consumer queue that uses an internal, non-reducible cache of nodes. The dequeue operation is always wait-free, and the enqueue operation is wait-free in the common case. Neither atomic RMW operations nor heavy memory fences are used.
Author Dmitry Vyukov Last updated 12/12/2018 - 18:00
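
To make the summary above concrete, here is a minimal sketch of an unbounded single-producer/single-consumer queue built on a linked list. It is not the article's code: the class name `SpscQueue` and its members are hypothetical, and the sketch leaves out the node cache the summary mentions, so every enqueue allocates a node and is only as wait-free as the allocator. What it does illustrate is the synchronization claim: the only shared accesses are plain atomic loads and stores with acquire/release ordering, with no read-modify-write operations and no full fences.

```cpp
#include <atomic>
#include <cassert>

// Illustrative sketch only (hypothetical names, no node cache).
// Exactly one thread may call enqueue() and exactly one may call dequeue().
template <typename T>
class SpscQueue {
    struct Node {
        std::atomic<Node*> next{nullptr};
        T value{};
    };

    Node* head_;  // touched only by the consumer; always a stub node
    Node* tail_;  // touched only by the producer; last node in the list

public:
    SpscQueue() : head_(new Node), tail_(head_) {}

    ~SpscQueue() {
        // Assumes both threads have finished using the queue.
        while (head_ != nullptr) {
            Node* next = head_->next.load(std::memory_order_relaxed);
            delete head_;
            head_ = next;
        }
    }

    SpscQueue(const SpscQueue&) = delete;
    SpscQueue& operator=(const SpscQueue&) = delete;

    // Producer side: fill the node, then publish it with a release store.
    void enqueue(const T& v) {
        Node* n = new Node;
        n->value = v;
        tail_->next.store(n, std::memory_order_release);
        tail_ = n;
    }

    // Consumer side: an acquire load pairs with the producer's release
    // store, so the node's value is visible once the link is observed.
    bool dequeue(T& out) {
        Node* next = head_->next.load(std::memory_order_acquire);
        if (next == nullptr) {
            return false;  // queue is empty
        }
        out = next->value;
        delete head_;      // the old stub is no longer reachable by the producer
        head_ = next;      // the dequeued node becomes the new stub
        return true;
    }
};
```

The producer would call `enqueue(v)` from one thread while the consumer polls `dequeue(out)` from another. A node cache like the one the summary describes would recycle consumed nodes back to the producer instead of calling `new` and `delete` each time.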
Article

Improving Averaging Filter Performance Using Intel® Cilk™ Plus

Intel® Cilk™ Plus is an extension to the C and C++ languages to support data and task parallelism. It provides three new keywords to i...
Author Anoop M. (Intel) Last updated 12/12/2018 - 18:00
Blog posts

Go Parallel 2

Parallel programming with the Go language (golang). The blog shows examples of parallel divide-and-conquer decomposition and parallel pipelines.
Author Dmitry Vyukov Last updated 04/07/2019 - 10:35
Article

Eight Optimizations for 3-Dimensional Finite Difference (3DFD) Code with an Isotropic (ISO)

This article describes how to implement and optimize a three-dimensional isotropic kernel with finite differences to run on the Intel® Xeon® Processor and Intel® Xeon Phi™.
Author Cédric ANDREOLLI (Intel) Last updated 06/07/2019 - 16:40
Blog posts

Introduction to OpenMP* on YouTube*

Tim Mattson (Intel) has authored an extensive series of excellent videos as an introduction to OpenMP*.
Author Mike P. (Intel) Last updated 04/07/2019 - 19:51
Article

Caffe* Training on Multi-node Distributed-memory Systems Based on Intel® Xeon® Processor E5 Family

Caffe is a deep learning framework developed by the Berkeley Vision and Learning Center (BVLC) and one of the most popular community frameworks for image recognition. Caffe is often used as a benchmark together with AlexNet*, a neural network topology for image recognition, and ImageNet*, a database of labeled images.
Author Gennady F. (Blackbelt) Last updated 05/07/2019 - 14:54
Article

Putting Your Data and Code in Order: Optimization and Memory – Part 1

This series of two articles discusses how data and memory layout affect performance and suggests specific steps to improve software performance. The basic steps shown in these two articles can yield significant performance gains. The articles are written at an intermediate level. It is assumed the reader desires to optimize software performance using common C, C++ and Fortran* programming...
Author David M. Last updated 12/12/2018 - 18:00
Article

Code Sample: Allocate Memory Efficiently on an Intel® Xeon Phi™ Processor

How to efficiently use Multi-Channel DRAM (MCDRAM) and synchronous dynamic random-access memory.
Author Mike P. (Intel) Last updated 06/07/2019 - 16:40
Article

Improve Application Performance on an Intel® Xeon Phi™ Processor

Learn techniques for vectorizing code, adding thread-level parallelism, and enabling memory optimization.
Author Nguyen, Loc Q (Intel) Last updated 14/06/2019 - 11:50