Article

OpenMP* and the Intel® IPP Library

How to configure OpenMP in the Intel IPP library to maximize multi-threaded performance of the Intel IPP primitives.
Last updated on 07/31/2019 - 14:30
Article

Improving Averaging Filter Performance Using Intel® Cilk™ Plus

Intel® Cilk™ Plus is an extension to the C and C++ languages that supports data and task parallelism. It provides three new keywords to i

Authored by Anoop M. (Intel) Last updated on 12/12/2018 - 18:00
Article

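Harbin Institute of Technology Computer Networks Lab 1: Multithreaded Server Programming

Approach: on Linux, include the header with #include <pthread.h>

When compiling, add the -lpthread flag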
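
The following is a minimal sketch of the pthread pattern this entry points to, not the lab's actual server. The names Task, handle_client, and run_workers are hypothetical, and the "request" is a stand-in computation rather than real socket I/O. It shows one thread per work item being created with pthread_create and collected with pthread_join.

```cpp
#include <cassert>
#include <pthread.h>
#include <vector>

// Hypothetical per-thread work item; a real server would hold a client socket here.
struct Task {
    int id;      // input: which simulated client this is
    int result;  // output: written only by the thread that owns this task
};

// Thread entry point with the signature pthread_create expects.
static void* handle_client(void* arg) {
    Task* task = static_cast<Task*>(arg);
    task->result = task->id * 2;  // stand-in for serving a request
    return nullptr;
}

// Starts one thread per task, waits for all of them, and returns the sum
// of the results. Returns -1 if any thread could not be created.
int run_workers(int count) {
    assert(count >= 0);
    std::vector<Task> tasks(count);
    std::vector<pthread_t> threads(count);
    int started = 0;
    for (int i = 0; i < count; ++i) {
        tasks[i] = Task{i, 0};
        if (pthread_create(&threads[i], nullptr, handle_client, &tasks[i]) != 0) {
            break;  // stop launching; still join the threads already running
        }
        ++started;
    }
    for (int i = 0; i < started; ++i) {
        pthread_join(threads[i], nullptr);
    }
    if (started != count) {
        return -1;
    }
    int sum = 0;
    for (const Task& t : tasks) {
        sum += t.result;
    }
    return sum;
}
```

Assuming the code is saved as server_sketch.cpp (a hypothetical file name), it can be built together with a main function using g++ server_sketch.cpp -lpthread. Each thread writes only to its own Task, so no locking is needed in this sketch.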

Last updated on 07/05/2019 - 14:10
Article

Vectorizing Loops with Calls to User-Defined External Functions

Introduction

Authored by Anoop M. (Intel) Last updated on 12/12/2018 - 18:00
Blog post

The switch() statement isn't really evil, right?

In my current position, I work to optimize and parallelize codes that deal with genomic data, e.g., DNA, RNA, proteins, etc.

Authored by Clay B. (Blackbelt) Last updated on 07/04/2019 - 10:46
Article

Fast Gathering-based SpMxV for Linear Feature Extraction

This algorithm can be used to improve sparse matrix-vector and matrix-matrix multiplication in any numerical computation. As we know, there are lots of applications involving semi-sparse matrix computation in High Performance Computing. Additionally, in popular perceptual computing low-level engines, especially speech and facial recognition, semi-sparse matrices are found to be very common....
Last updated on 12/12/2018 - 18:00
Article

Peel the Onion (Optimization Techniques)

This paper is a more formal response to an Intel® Developer Zone forum posting. See: (https://software.intel.com/en-us/forums/intel-moderncode-for-parallel-architectures/topic/590710).
Authored by jimdempseyatthecove (Blackbelt) Last updated on 12/12/2018 - 18:00
Article

Fast Computation of Huffman Codes

The generation of Huffman codes is used in many applications, among them the DEFLATE compression algorithm. The classical way to compute these codes uses a heap data structure. This approach is fairly efficient, but traditional software implementations contain lots of branches that are data-dependent and thus hard for general-purpose CPU hardware to predict. On modern processors with deep...
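
For context, the classical heap-based approach mentioned above can be sketched as follows. This is an illustration of the textbook method only, not the faster technique the article describes. The function name huffman_code_lengths is hypothetical, and the sketch does not enforce the 15-bit code length limit that DEFLATE imposes.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Returns the Huffman code length in bits for each symbol, given its frequency.
// Symbols with zero frequency receive length 0 (no code assigned).
std::vector<int> huffman_code_lengths(const std::vector<std::uint64_t>& freqs) {
    const int n = static_cast<int>(freqs.size());
    std::vector<int> parent(n, -1);  // parent[v] = parent node index, -1 for a root
    std::vector<int> lengths(n, 0);

    using Item = std::pair<std::uint64_t, int>;  // (weight, node index)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;  // min-heap
    for (int i = 0; i < n; ++i) {
        if (freqs[i] > 0) heap.push({freqs[i], i});
    }

    // A lone symbol still needs one bit so that it can be encoded at all.
    if (heap.size() == 1) {
        lengths[heap.top().second] = 1;
        return lengths;
    }

    // Repeatedly merge the two lightest nodes into a new internal node.
    while (heap.size() > 1) {
        Item a = heap.top(); heap.pop();
        Item b = heap.top(); heap.pop();
        const int node = static_cast<int>(parent.size());
        parent.push_back(-1);
        parent[a.second] = node;
        parent[b.second] = node;
        heap.push({a.first + b.first, node});
    }
    assert(heap.size() <= 1);  // one root remains, or none for empty input

    // A leaf's code length is its depth: the number of hops up to the root.
    for (int i = 0; i < n; ++i) {
        if (freqs[i] == 0) continue;
        int depth = 0;
        for (int v = i; parent[v] != -1; v = parent[v]) ++depth;
        lengths[i] = depth;
    }
    return lengths;
}
```

Every heap pop and push involves comparisons whose outcome depends on the input frequencies, which is the kind of data-dependent branching the summary refers to.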
Authored by James Guilford (Intel) Last updated on 07/09/2019 - 16:09
Blog post

Reduce Boilerplate Code in Parallelized Loops with C++11 Lambda Expressions

Parallelize loops with Intel® Threading Building Blocks using Intel® C++ Compiler for lambda expressions.
Authored by gaston-hillar (Blackbelt) Last updated on 12/12/2018 - 18:00
Article

Peeling the Onion (Without the Tears): Optimization Techniques

This article is a formal response to a post on the Intel® Developer Zone forum. See: (https://software.intel.com/en-us/forums/intel-moderncode-for-parallel-architectures/topic/590710).
Last updated on 12/12/2018 - 18:00