Article

在英特尔® 集成众核 (英特尔® MIC) 架构上使用 OpenMP* 的最佳设计方案

本文是对“面向 Linux* 的英特尔® Composer XE”文档的补充。 本文对在为英特尔集成众核 (Intel MIC) 架构编写卸载和本地程序时使用 C/C++ 和 Fortran 的 OpenMP* 扩展的最佳方法进行了概括。
Authored by Last updated on 03/21/2019 - 12:00
Article

OpenMP* WORKSHARE 现在可与英特尔® Fortran 编译器 15.0 并行

英特尔® Fortran 编译器 15.0 现可为包含阵列分配的 OpenMP WORKSHARE 和 PARALLEL WORKSHARE 结构的指定实例生成多线程代码。  很显然,它们是使用 OpenMP SINGLE 结构进行部署,这表示仅可生成单线程代码。

 

Authored by Kenneth Craft (Intel) Last updated on 07/03/2019 - 20:00
Article

应用蚁群优化算法 (ACO) 实施交通网络扩展

In this article an OpenMP* based implementation of the Ant Colony Optimization algorithm was analyzed for bottlenecks with Intel® VTune™ Amplifier XE 2016 together with improvements using hybrid MPI-OpenMP and Intel® Threading Building Blocks were introduced to achieve efficient scaling across a four-socket Intel® Xeon® processor E7-8890 v4 processor-based system.
Authored by Sunny G. (Intel) Last updated on 07/05/2019 - 19:13
Article

最大限度提升 CPU 上的 TensorFlow* 性能:推理工作负载的注意事项和建议

本文将介绍使用面向 TensorFlow 的英特尔® 优化* 进行 CPU 推理的性能注意事项
Authored by Nathan Greeneltch (Intel) Last updated on 08/09/2019 - 02:02
Article

面向英特尔® 架构优化的 Caffe*:使用现代代码技巧

This paper demonstrates a special version of Caffe* — a deep learning framework originally developed by the Berkeley Vision and Learning Center (BVLC) — that is optimized for Intel® architecture.
Authored by Last updated on 07/06/2019 - 16:40
Article

面向英特尔® 至强融核™ 处理器(代号“Knights Landing”)的开发人员访问计划

Intel is bringing to market, in anticipation of general availability of the Intel® Xeon Phi™ Processor (codenamed Knights Landing), the Developer Access Program (DAP). DAP is an early access program for developers worldwide to purchase an Intel Xeon Phi Processor based system.
Authored by Mike P. (Intel) Last updated on 03/21/2019 - 12:00
Article

循环修改增强数据并行性能

When confronted with nested loops, the granularity of the computations that are assigned to threads will directly affect performance. Loop transformations such as splitting and merging nested loops can make parallelization easier and more productive.
Authored by admin Last updated on 07/05/2019 - 14:48
Article

粒度与并行性能

One key to attaining good parallel performance is choosing the right granularity for the application. Granularity is the amount of real work in the parallel task. If granularity is too fine, then performance can suffer from communication overhead.
Authored by admin Last updated on 07/05/2019 - 19:53
Article

了解面向三维同性有限差分 (3DFD) 波动方程代码的 NUMA

本文将介绍一些技巧,帮助软件开发人员识别并修复使用最新英特尔软件开发工具时遇到的与 NUMA 相关的应用性能问题。

Authored by Sunny G. (Intel) Last updated on 07/05/2019 - 20:13
Article

解读Intel编译器的offload报告

英特尔编译器在对代码进行编译优化的过程中用户可以通过使用”-opt-report-phase=phase”选项让编译器输出某些特定优化阶段的相关信息。针对至强融核™ 协处理器提供的offload编译模式英特尔编译器提供了”offload”关键字。

Authored by Duan, Xiaoping (Intel) Last updated on 06/07/2017 - 10:36