Article

Compiling and Optimizing the Hogbom Clean Benchmark on the Xeon Phi™ Processor

Summary

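This article describes the steps and methods for compiling, optimizing, and running the Hogbom Clean benchmark on the Xeon Phi™ processor, and discusses code changes that give the program further performance gains on the Xeon Phi™ processor.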

Last updated on 06/07/2019 - 16:30
Article

OpenMP* WORKSHARE Now Parallelizes with Intel® Fortran Compiler 15.0

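Intel® Fortran Compiler 15.0 can now generate multithreaded code for certain instances of the OpenMP WORKSHARE and PARALLEL WORKSHARE constructs that contain array assignments. Previously, these were implemented using the OpenMP SINGLE construct, which meant that only single-threaded code was generated.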

Created by Kenneth Craft (Intel). Last updated on 03/07/2019 - 20:00
Article

Caffe* Training on Multi-node Distributed-memory Systems Based on the Intel® Xeon™ Processor E5 Product Family

Caffe is a deep learning framework developed by the Berkeley Vision and Learning Center (BVLC) and one of the most popular community frameworks for image recognition. Caffe is often used as a benchmark together with AlexNet*, a neural network topology for image recognition, and ImageNet*, a database of labeled images.
Created by Gennady F. (Blackbelt). Last updated on 05/07/2019 - 14:55
Article

Multithreaded Code Optimization in PARSEC* 3.0: BlackScholes

The Black-Scholes benchmark is one of the 13 benchmarks in PARSEC. It prices options using the Black-Scholes partial differential equation (PDE). The Black-Scholes equation is a differential equation that describes how, under a certain set of assumptions, the value of an option changes as the price of the underlying asset changes. Based on this formula, one can compute the...
Created by Artem G. (Intel). Last updated on 04/07/2019 - 21:42
Article

Putting Your Data and Code in Order: Data and Layout, Part 2

This pair of articles on performance and memory covers basic concepts to guide developers seeking to improve software performance. This paper expands on concepts discussed in Part 1 to consider parallelism: vectorization (single instruction, multiple data, or SIMD), shared-memory parallelism (threading), and distributed-memory computing.
Created by David M. Last updated on 06/07/2019 - 16:40
Article

Peeling the Onion (Without Crying): Optimization Techniques

This article is a formalized response to a post on the Intel® Developer Zone forum. See: (https://software.intel.com/en-us/forums/intel-moderncode-for-parallel-architectures/topic/590710).
Last updated on 12/12/2018 - 18:00
Article

Use Synchronization Routines Provided by the Threading API Rather Than Hand-Coded Synchronization

Application programmers sometimes write hand-coded synchronization routines rather than using constructs provided by a threading API in order to reduce synchronization overhead or provide different functionality than existing constructs offer.
Created by administrar. Last updated on 05/07/2019 - 20:03
Article

Putting Your Data and Code in Order: Data and Layout - Part 2

Apply the concepts of parallelism and distributed memory computing to your code to improve software performance. This paper expands on concepts discussed in Part 1 to consider parallelism: vectorization (single instruction, multiple data, or SIMD), shared-memory parallelism (threading), and distributed-memory computing.
Created by David M. Last updated on 06/07/2019 - 16:40
Article

Use Tasks (Not Threads)

Tasks are a lightweight alternative to threads that provide faster startup and shutdown times, better load balancing, efficient use of available resources, and a higher level of abstraction.
Created by administrar. Last updated on 05/07/2019 - 09:51
Article

Caffe* Optimized for Intel® Architecture: Applying Modern Code Techniques

This paper demonstrates a special version of Caffe* — a deep learning framework originally developed by the Berkeley Vision and Learning Center (BVLC) — that is optimized for Intel® architecture.
Last updated on 06/07/2019 - 16:40