Article

Intel® 64 Architecture Processor Topology Enumeration

Download Code Package: 20160519-cpuid_topo.tar.gz
Last updated on 05/07/2019 - 20:39
Article

Single-Producer/Single-Consumer Queue

An unbounded single-producer/single-consumer queue that uses an internal, non-reducible cache of nodes. The dequeue operation is always wait-free; the enqueue operation is wait-free in the common case. No atomic RMW operations or heavy memory fences are used.
Created by Dmitry Vyukov. Last updated on 12/12/2018 - 18:00
Article

Improving Averaging Filter Performance Using Intel® Cilk™ Plus

Intel® Cilk™ Plus is an extension to the C and C++ languages to support data and task parallelism.  It provides three new keywords to i

Created by Anoop M. (Intel). Last updated on 12/12/2018 - 18:00
Blog post

Go Parallel 2

Parallel programming with the Go language (golang). The blog shows examples of parallel divide-and-conquer decomposition and parallel pipelines.
Created by Dmitry Vyukov. Last updated on 04/07/2019 - 10:35
Blog post

Introduction to OpenMP* on YouTube*

Tim Mattson (Intel) has authored an extensive series of excellent videos as an introduction to OpenMP*.

Created by Mike P. (Intel). Last updated on 04/07/2019 - 19:51
Article

Putting Your Data and Code in Order: Optimization and Memory – Part 1

This series of two articles discusses how data and memory layout affect performance and suggests specific steps to improve software performance. The basic steps shown in these two articles can yield significant performance gains. These two articles are designed at an intermediate level. It is assumed the reader desires to optimize software performance using common C, C++ and Fortran* programming...
Created by David M. Last updated on 12/12/2018 - 18:00
Article

Putting Your Data and Code in Order: Optimization and Memory, Part 1 (Russian)

This series of two articles discusses how data and memory layout affect performance and suggests specific steps to improve software performance. The basic steps shown in these two articles can yield significant performance gains. These two articles are designed at an intermediate level. It is assumed the reader desires to optimize software performance using common C, C++ and Fortran* programming...
Last updated on 12/12/2018 - 18:00
Blog post

Debug Intel® Transactional Synchronization Extensions

If printf or fprintf functions cause transaction aborts, use Intel® Processor Trace as a work-around.
Created by Roman Dementiev (Intel). Last updated on 04/07/2019 - 17:00
Article

Improve Vectorization Performance with Intel® AVX-512

See how the new Intel® Advanced Vector Extensions 512 subsets AVX-512CD and AVX-512F (available in the Intel® Xeon Phi processor and in future Intel Xeon processors) let the compiler automatically generate vector code with no changes to the source code.
Created by Alberto V. (Intel). Last updated on 08/07/2019 - 19:26
Blog post

Resetting the lowest n set bits

A couple of years ago, the Bit Manipulation Instruction Set 1 (BMI1) introduced the BLSR instruction, which resets the lowest set bit.

Created by Thomas Willhalm (Intel). Last updated on 12/12/2018 - 18:00
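
As a rough illustration of the operation this entry describes, the effect of BLSR on a non-negative integer can be emulated in software with `x & (x - 1)`. The sketch below is not the method from the blog post, which this listing does not show; it is a naive loop that applies the same step n times, and the function names are hypothetical.

```python
def reset_lowest_set_bit(x: int) -> int:
    """Software equivalent of BLSR for non-negative x: clear the lowest set bit."""
    # Subtracting 1 flips the lowest set bit and every bit below it;
    # ANDing with the original value therefore clears just that bit.
    return x & (x - 1)


def reset_lowest_n_set_bits(x: int, n: int) -> int:
    """Naive approach (an assumption, not the blog's technique): apply the step n times."""
    for _ in range(n):
        if x == 0:
            # No set bits remain, so further steps would change nothing.
            break
        x = reset_lowest_set_bit(x)
    return x


if __name__ == "__main__":
    print(bin(reset_lowest_set_bit(0b10110)))        # 0b10100
    print(bin(reset_lowest_n_set_bits(0b10110, 2)))  # 0b10000
```

The loop costs one step per cleared bit, which is the baseline any faster technique would be measured against.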