Article

OpenMP* WORKSHARE Now Parallelizes with Intel® Fortran Compiler 15.0

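Intel® Fortran Compiler 15.0 can now generate multi-threaded code for certain instances of the OpenMP WORKSHARE and PARALLEL WORKSHARE constructs that contain array assignments. Previously, these constructs were implemented using the OpenMP SINGLE construct, which meant that only single-threaded code was generated.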
Authored by Kenneth Craft (Intel) Last updated on 07/03/2019 - 20:00
Article

How to Analyze Linux* Applications with Intel® Thread Profiler for Windows*

Instructions on how to use the "Enabling collector for Linux*" to collect Intel® Thread Profiler data on a Linux* system and view the results with the Intel® Thread Profiler for Windows*, part of the Intel® VTune Performance Analyzer for Windows*.
Authored by Eric W Moore (Intel) Last updated on 05/25/2018 - 15:30
Article

Maximize TensorFlow* Performance on CPU: Considerations and Recommendations for Inference Workloads

This article describes performance considerations for CPU inference using Intel® Optimization for TensorFlow*.
Authored by Nathan Greeneltch (Intel) Last updated on 08/09/2019 - 02:02
Article

Loop Modifications to Enhance Data-Parallel Performance

When confronted with nested loops, the granularity of the computations that are assigned to threads will directly affect performance. Loop transformations such as splitting and merging nested loops can make parallelization easier and more productive.
Authored by admin Last updated on 07/05/2019 - 14:48
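
As an illustration of the loop merging mentioned in the summary above, the following is a minimal sketch, not code from the article. It collapses a two-level nested loop into a single index space so that the work can be divided evenly among workers even when the outer loop has only a few iterations. The names `cell_value`, `nested_sum_collapsed`, and `nested_sum_serial` are hypothetical. In CPython, threads do not speed up CPU-bound work like this, so the sketch only demonstrates the partitioning idea.

```python
from concurrent.futures import ThreadPoolExecutor


def cell_value(i, j):
    # Hypothetical per-element work, standing in for a real loop body.
    return i * j + 1


def nested_sum_serial(n_outer, n_inner):
    # Reference version: the original two-level nested loop.
    return sum(cell_value(i, j) for i in range(n_outer) for j in range(n_inner))


def nested_sum_collapsed(n_outer, n_inner, workers=4):
    # Merge the two loops into one index space of n_outer * n_inner items,
    # then give each worker one contiguous block of that space.
    total = n_outer * n_inner
    chunk = max(1, (total + workers - 1) // workers)  # ceiling division

    def work(start):
        end = min(start + chunk, total)
        partial = 0
        for k in range(start, end):
            i, j = divmod(k, n_inner)  # recover the original loop indices
            partial += cell_value(i, j)
        return partial

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(work, range(0, total, chunk)))
```

With `n_outer = 2` and four workers, parallelizing only the outer loop would keep at most two workers busy, while the collapsed version gives every worker a share.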
Article

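Run-to-Run (R2R) Reproducibility of Floating-Point Calculations for Applications on Intel® Xeon Phi™ Coprocessors (and Intel® Xeon® Processors)

Problem

If I rerun the same program on the same input data on the same processor, will I get the same results?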

Authored by Last updated on 03/21/2019 - 12:08
Article

Granularity and Parallel Performance

One key to attaining good parallel performance is choosing the right granularity for the application. Granularity is the amount of real work in the parallel task. If granularity is too fine, then performance can suffer from communication overhead.
Authored by admin Last updated on 07/05/2019 - 19:53
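
To make the granularity trade-off in the summary above concrete, here is a minimal sketch under stated assumptions; it is not taken from the article. The hypothetical helpers `make_chunks` and `sum_squares_chunked` split a list into tasks of a chosen size and report how many tasks were created. A chunk size of 1 produces one task per element, so scheduling overhead dominates the real work, while a larger chunk size produces a handful of tasks with meaningful work in each.

```python
from concurrent.futures import ThreadPoolExecutor


def make_chunks(n_items, chunk_size):
    # Half-open (start, end) index ranges that cover 0..n_items.
    return [(s, min(s + chunk_size, n_items)) for s in range(0, n_items, chunk_size)]


def sum_squares_chunked(values, chunk_size, workers=4):
    # Returns (result, number_of_tasks) so the task count is visible.
    def work(bounds):
        lo, hi = bounds
        return sum(v * v for v in values[lo:hi])

    chunks = make_chunks(len(values), chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(work, chunks)), len(chunks)
```

For a list of 1000 values, `chunk_size=1` creates 1000 tasks and `chunk_size=250` creates 4, with the same numerical result in both cases.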
Article

Putting Your Data and Code in Order: Data and Layout - Part 2

Apply the concepts of parallelism and distributed memory computing to your code to improve software performance. This paper expands on concepts discussed in Part 1, to consider parallelism, both vectorization (single instruction multiple data SIMD) as well as shared memory parallelism (threading), and distributed memory computing.
Authored by David M. Last updated on 07/06/2019 - 16:40
Article

Avoiding Heap Contention Among Threads

Avoiding Heap Contention Among Threads (PDF 256KB)

Abstract

Authored by admin Last updated on 07/05/2019 - 19:59
Article

Putting Your Data and Code in Order: Data and Layout, Part 2

This pair of articles on performance and memory covers basic concepts to provide guidance to developers seeking to improve software performance. This paper expands on concepts discussed in Part 1 to consider parallelism, covering both vectorization (single instruction, multiple data, or SIMD) and shared memory parallelism (threading), as well as distributed memory computing.
Authored by David M. Last updated on 07/06/2019 - 16:40
Article

Fast Panorama Stitching

 

Download paper as PDF

Authored by Last updated on 05/30/2018 - 07:40