Intel® Developer Zone:
Performance

Featured

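Hot off the press! Intel® Xeon Phi™ Coprocessor High Performance Programming
Learn the essentials of programming for this new architecture and new product.
New! Intel® System Studio
Intel® System Studio is a comprehensive, integrated software development tool suite that shortens time to market, strengthens system reliability, and improves power efficiency and performance.
New! In case you missed it, you can still catch the replay of the two-day live webinar
An introduction to high-performance application development for Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors.
Structured Parallel Programming
Authors Michael McCool, Arch D. Robison, and James Reinders take an approach based on structured patterns that makes the subject accessible to every software developer.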

Go parallel with the help of Intel's innovative resources and deliver the best possible application performance to your customers.

Development Resources

Development Tools

Intel® Parallel Studio

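Intel® Parallel Studio brings simplified, end-to-end parallelism to Microsoft Visual Studio* C/C++ developers and provides advanced tools that help them optimize client applications for multicore and manycore.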

Intel® Software Development Products

Explore all the tools that help you optimize for Intel architecture. Selected tools are available for a free 45-day evaluation.

Tools Knowledge Base

Find guides and support information for Intel tools.

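Optimizing Storage Solutions with the Intel® Intelligent Storage Acceleration Library
By tianhui s. Posted: 12/11/2014 0
As more and more devices connect to the cloud and the internet, data is being generated by a wide variety of sources, including smartphones, tablets, and Internet of Things devices, and storage requirements climb every year. The combination of the Intel® Xeon® processor family and the Intel® Intelligent Storage Acceleration Library (Intel® ISA-L) gives developers the tools to process data securely and quickly, and even to reduce storage space requirements. In China, Intel worked with Qihoo 360 Technology Co. Ltd. to integrate Intel ISA-L into its storage solution, which improved performance tenfold and cut storage space requirements by two-thirds. Read the case study. Intel® ISA-L helps accelerate and optimize Intel® architecture (IA) based storage on any device, from small office NAS devices to enterprise storage systems. This library...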
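How to Use Intel® Inspector for Systems
By tianhui s. Posted: 12/11/2014 0
Background: Intel® System Studio is a new embedded software tool suite that includes Intel® Inspector for Systems. This article describes how to run Inspector for Systems on an embedded platform. Overview: We use Yocto Project* version 1.2 as the example. The platform supports many Intel board support packages (BSPs) and also supports development through the emulators they provide, so it can be used without running physical embedded hardware. The steps below show how to set up an application and then run an Intel® Inspector for Systems collection on it through the Yocto Project* emulator (runqemu). The following...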
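Intel® Xeon Phi™ Coprocessor (Code Name "Knights Landing"): Application Readiness
By tianhui s. Posted: 12/11/2014 0
To get applications ready for future Intel® Xeon® processors and the Intel® Xeon Phi™ coprocessor (code name Knights Landing), developers will mainly want to improve two aspects of their workloads: vectorization/code generation and thread parallelism. This article focuses on vectorization/code generation and also points to some useful thread-parallelism tools and resources. 1) Vectorization: Intel® Advanced Vector Extensions 512 (Intel® AVX-512) will first be implemented on the processor and coprocessor, and will later be available on Intel Xeon processors introduced after Knights Landing. For more details on Intel AVX-512, see: https://software.intel.com/en-us/bl...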
Subscribe to Intel Developer Zone Articles
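Accelerating Animation Rendering on IA
By BRUCE C. (Intel) Posted: 2012/06/12 0
Attached is a case study on accelerating animation rendering on IA, for your reference. Attachment: Accelerating Animation Rendering on IA
Improving Application Analytics on IA
By BRUCE C. (Intel) Posted: 2012/06/12 0
Attached is a case study on improving application analytics on IA, for your reference. Attachment: Faster Analytics for a Competitive Edge
Rebuilding the Enterprise Backbone Architecture on IA
By BRUCE C. (Intel) Posted: 2012/06/12 0
Attached is a case study on rebuilding the enterprise backbone architecture on IA, for your reference. Attachment: Rebuilding the Enterprise Backbone Architecture
Virtualization Case Study
By BRUCE C. (Intel) Posted: 2012/06/12 0
Attached is a case study on how virtualization improves management flexibility, for your reference. Attachment: Virtualization Improves Management Flexibility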
Subscribe to Intel® Developer Zone Blogs
Haswell TSX using RTM (beginner student)
By tshan k. 3
Hello, I am just getting introduced to Haswell's TSX infrastructure using RTM. I downloaded the rtm.h header file online and tried to write a simple counter. Unfortunately, every time I compile and run the program, the _xbegin function does not execute the transaction inside. I would greatly appreciate your help. Thanks.

#include <stdio.h>
#include <stdlib.h>
#include "rtm.h"

void main(){
    int N=5;
    int i;
    int status;
    int counter = 0;
    status = _xbegin();
    if (status == _XBEGIN_STARTED) {
        for (i=0; i<N ; i++) {
            counter++;
            printf("counter value: %d\n", counter);
        }
        _xend();
    }
    else
        printf("did not work\n");
}
Using thread_local on C++ throws error
By Rihab A. 5
I have been trying to convert a C++ MPI code to OpenMP. There are a large number of static member variables (mostly dynamic lists of class objects), and I am trying to use 'thread_local' to make sure there are no conflicts. But the file does not compile and throws the error: "error: expected a ";"". I was using ICC 14. When I tried the ICC 15 beta, the particular file where I used thread_local compiled, but compilation of the whole application failed at another point: "undefined reference to '__cxa_thread_atexit'". I would greatly appreciate help in solving this issue.
Poor threading performance on Intel Xeon E5-2680 v2
By Pascal 10
Hello, I am running a visualization program (visualizing a large dataset) where I can use either MPI or pthreads. When I run it on my desktop, which has an Intel i7-2600K (4 cores, 8 threads), I get better performance using pthreads (I'm using a lot of threads, e.g. 32) than using MPI, which is normal (I guess). But when I run the same code on one node of a cluster, which has an Intel Xeon E5-2680 v2 (10 cores, 20 threads), the performance I get using pthreads is worse than with MPI: about 70s using MPI compared to 180s using pthreads. Even worse, performance on the Intel Xeon E5-2680 v2 is lower than on the Intel i7-2600K: around 100s on the 2600K but 180s on the E5-2680 (same number of threads on both). I checked using the top command, and all the cores are active when I run the program. So my question is: why is that happening? Is there some other way I should be compiling the code on the E5-2680? Are there variables I should set, like KMP_AFFIN...
HTM/STM and Scheduling
By Simone A. 1
Hi, I have a question about Hardware and Software Transactional Memory. Given the types of versioning (eager and lazy) and conflict detection (optimistic and pessimistic), let's say that 2 or more threads are performing a transaction that writes/reads the same memory location. Could the scheduling of the threads affect the ability to detect a conflict? Which combination of versioning and conflict detection would be better to always catch the conflicts? I hope my question is clear. Thanks. Best Regards, Simone
Locking CPU cache lines for a thread ( L1)
By Younis A. 14
Hi I'm working on securing access to L1 cache by locking it line by line. Is there any way to do it? For example, two threads accessing the L1 and L1 lines are locked for a certain time to each thread accessed them. Regards, Younis
Responsive OpenMP Threads in Hybrid Parallel Environment
By Don K. 1
I have a Fortran code that runs both MPI and OpenMP. I have done some profiling of the code on an 8-core Windows laptop, varying the number of MPI tasks vs. OpenMP threads, and have some understanding of where performance bottlenecks for each parallel method might surface. The problem I am having is when I port over to a Linux cluster with several 8-core nodes. Specifically, my OpenMP thread parallelism performance is very poor. Running 8 MPI tasks per node is significantly faster than 8 OpenMP threads per node (1 MPI task), but even 2 OMP threads + 4 MPI tasks was running very slowly, more so than I could solely attribute to a thread starvation issue. I saw a few related posts in this area and am hoping for further insight and recommendations on this issue. What I have tried so far ... 1. setenv OMP_WAIT_POLICY active ## seems to make sense 2. setenv KMP_BLOCKTIME 1 ## this is counter to what I have read but when I set this to a large number (2500...
Optimizing cilk with ternary conditional
By Fabio G. 3
What is the best way to optimize the loop cilk_for(i=0;i<n;i++){ x[i]=x[i]<0?0:x[i]; } or something like that? Thanks, Fabio
Subscribe to Forums
