Intel® Developer Zone:
Performance

Featured

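Just released! Intel® Xeon Phi™ Coprocessor High Performance Programming
Learn the essentials of programming for this new architecture and new product.
New! Intel® System Studio
Intel® System Studio is a comprehensive, integrated software development tool suite that shortens time to market, strengthens system reliability, and improves power efficiency and performance.
New! In case you missed it, you can still watch the replay of the two-day live webinar
An introduction to high-performance application development for Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors.
Structured Parallel Programming
Authors Michael McCool, Arch D. Robison, and James Reinders take an approach based on structured patterns that makes the subject accessible to every software developer.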

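Use Intel's innovation resources to implement parallel programming and deliver outstanding application performance to your customers.

Development Resources

Development Tools

Intel® Parallel Studio

Intel® Parallel Studio brings simplified end-to-end parallelism to Microsoft Visual Studio* C/C++ developers, along with advanced tools that help them optimize client applications for multicore and manycore.

Intel® Software Development Products

Explore all the tools that help you optimize for Intel architecture. Selected tools offer a free 45-day evaluation period.

Tools Knowledge Base

Find guides and support information for Intel tools.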

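Developing for Terascale-on-a-Chip: Part 1 of 3
Author: kenstrandberg | Posted: 05/10/2011 | 0
The Terascale-on-a-Chip Computing Environment. "Developing for Terascale-on-a-Chip" is a three-part series that examines the future direction of programming for multicore parallel computers operating at terascale (a trillion floating-point operations and beyond). This is the first article. For years, Intel and the entire processor industry have been heralding the arrival of the parallel processing era, and the call keeps growing louder. Intel currently produces quad-core processors. According to Moore's Law, processors with 128 cores will appear in 10 years, rising to 256 and 512 cores after 12 and 14 years respectively, and by 2023 the core count will pass one thousand. Of course, this is only a theoretical projection. The reality is that Intel has already built an 80-core experimental processor named Polaris, and last year, using a parallel application...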
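The core-count projection in the teaser above can be checked with a few lines of arithmetic. The sketch below assumes one doubling every two years, which is inferred from the quoted figures (128, 256, and 512 cores at 10, 12, and 14 years) and is not stated in the article; `projected_cores` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the article): projects a core count that
// doubles once every `doubling_period_years` years.
// Assumption: a two-year doubling period, inferred from the teaser's figures.
std::uint64_t projected_cores(std::uint64_t start_cores, int years,
                              int doubling_period_years = 2) {
    assert(years >= 0 && doubling_period_years > 0);
    // Only completed doubling periods count.
    const int doublings = years / doubling_period_years;
    assert(doublings < 64);  // keep the shift well defined
    return start_cores << doublings;
}
```

Starting from 4 cores, this gives 128 cores after 10 years, 256 after 12, 512 after 14, and 1,024 after 16, which agrees with the teaser's statement that the count eventually passes one thousand.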
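Threading Models for High-Performance Computing: Pthreads or OpenMP?
Author: binstock | Posted: 05/10/2011 | 0
By Andrew Binstock. Introduction. The UNIX operating system has supported threads for many years, which is one of the main reasons UNIX has been so prominent on server systems. Over the past few years, Linux* has promoted its strong server performance, achieved through improved kernel support for threads. For example, the recently released 2.6 kernel added a new scheduler that greatly speeds up thread switching on Linux systems. The previous kernel version (2.4; the Linux kernel uses even numbers for release versions and odd numbers for development versions) was likewise distinguished by major improvements in threading capability. These advances have helped put Linux on servers and into sites that support high-performance computing (HPC). In addition, Linu...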
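FAQ: Intel® Multi-Core Processor Architecture
Author: admin | Posted: 05/10/2011 | 0
FAQ: Intel® Multi-Core Processor Architecture. Essential Concepts. The Move to Multi-Core Architecture Explained. How to Benefit from Multi-Core Architecture. Challenges of Multithreaded Programming. How Intel Can Help. Additional Resources. Why doesn't this FAQ answer my question? This FAQ highlights a subset of the multi-core information published on Intel.com. It is not a comprehensive guide and does not cover all of the multi-core material Intel has published. To learn how to find answers elsewhere, see the next article. To suggest updates to this FAQ, contact support, or post software-related multi-core questions directly to the Threading on Intel Parallel Architectures discussion forum. Posting a question to the Intel® Developer Zone forums not only lets the ...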
Intel® Threading Challenge 2011: Official Rules
Author: Jeff Kataoka (Intel) | Posted: 05/06/2011 | 0
Threading Challenge 2011, Phase 1 (Amended on May 6, 2011). Participation: The Threading Challenge 2011 contest series will be implemented in two phases. Phase 1, with two levels of participation, begins on April 18, 2011, and ends on June 27, 2011 (Contest Date Amended). Phase 2 will be launched in ...
Subscribe to Intel Developer Zone articles
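The Practical Home Multimedia Theater System I Put Together for a Colleague -- 3. Choosing the Audio System
Author: Lang Lang (Intel) | Posted: 2008/06/15 | 13
3. Choosing the audio system: (The first two posts seem to have had plenty of readers but few replies, which dented my enthusiasm a little, heh. I hope everyone will jump in and bump the thread.) This post will probably be the most controversial, because the choice I recommend is somewhat unorthodox. I have been tinkering with audio gear since I started university. At first I built my own: I made Class A, Class B, and other amplifiers, and I built my own speaker cabinets too, scavenging empty enclosures and drivers and winding my own crossovers. Later I started hunting for second-hand audio gear. Back then, "second-hand" mostly meant old, discarded units and speakers from overseas, which by today's standards could fairly be called electronic waste. You had to know what you were doing to pick through them, and you needed some luck. With luck you would find a unit whose only fault was a noisy potentiometer or selector switch, which a spray of potentiometer lubricant would fix. This old gear was the best choice to take home and modify, haha. There were old speaker cabinets too, but they were all fairly small, and at that time domestic brands such as Feilo were still ...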
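The Practical Home Multimedia Theater System I Put Together for a Colleague -- 2. PC Configuration
Author: Lang Lang (Intel) | Posted: 2008/06/10 | 20
PC configuration: For the PC, the choice was basically DIY. I originally wanted to build an HTPC in a Thermaltake Bach case, with an Intel P35 motherboard, a Dataland (迪兰恒进) 3450 HDMI graphics card, and an optional TerraTec AUREON 5.1 Sky sound card. But my colleague thought this kind of case was too big and not attractive enough, and that assembling everything himself would be too much trouble. So I recommended a barebone system, since he likes that kind of compact case. Once the case type was settled, the rest was straightforward. Shuttle (浩鑫) is a manufacturer that has consistently stuck with barebone systems and keeps bringing out new models. It recently released the XPC SG33G5M Deluxe, a barebone designed specifically for HTPC use. I looked it up online and it really is a good product with every feature you would want, but the price is also quite high: at more than 4,500, you could buy a low-to-mid-range laptop. ...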
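The Practical Home Multimedia Theater System I Put Together for a Colleague -- 1. Choosing the TV
Author: Lang Lang (Intel) | Posted: 2008/06/10 | 14
Last week I was in Beijing on business. A Beijing colleague who is renovating his new home mentioned that he wanted to set up a home theater system, one that is practical and simple to install and use. Because he had heard that I set up a computer-centered home theater system at home three years ago, the job of configuring the system for his new home fell to me. As a former solution consultant, I know that understanding the situation and the customer's preferences is the first step. First, I learned that his living room is about 4 meters wide, which determines what size of TV is suitable. Second, I asked about his preferences for the PC's appearance. After we talked, it turned out that he does not like a standard HTPC. I had originally recommended a case such as the Thermaltake Bach, but he did not find it attractive and wanted the whole system to be smaller (except for the TV, of course). A barebone system therefore seemed to fit his requirements best. As for audio, my colleague's ...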
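Multi-Core Teaching Forum -- Wuhan University
Author: Yang, JianFeng (杨剑锋) | Posted: 2008/05/06 | 1
To broaden the coverage of Intel multi-core technology courses in Chinese universities, and to raise both the level of teaching and the multi-core awareness and multi-core platform programming skills of undergraduate and graduate students in related majors, Intel held teacher training and a teaching seminar for the 67 universities about to join the Intel Multi-Core University Program. summary-multi-core-training-forum.ppt The Multi-Core Teaching Forum, hosted by Intel and organized by Wuhan University, was held April 10-17 at the National Engineering Teaching Base for Fundamental Electrical and Electronic Courses in Wuhan University's School of Electronic Information, and concluded successfully on the 17th. The forum ran for 7 days, with 8 hours of classes during the day and open lab sessions in the evenings. A total of 46 teachers from 22 universities across the country, including Xi'an University of Technology, Xiamen University, and Zhengzhou University, attended, and most of the teachers...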
Subscribe to Intel® Developer Zone blogs
Locking CPU cache lines for a thread (L1)
Author: Younis A. | 14
Hi I'm working on securing access to L1 cache by locking it line by line. Is there any way to do it? For example, two threads accessing the L1 and L1 lines are locked for a certain time to each thread accessed them. Regards, Younis
Responsive OpenMP Threads in Hybrid Parallel Environment
Author: Don K. | 1
I have a Fortran code that runs both MPI and OpenMP. I have done some profiling of the code on an 8-core Windows laptop, varying the number of MPI tasks vs. OpenMP threads, and have some understanding of where performance bottlenecks for each parallel method might surface. The problem I am having is when I port over to a Linux cluster with several 8-core nodes. Specifically, my OpenMP thread parallelism performance is very poor. Running 8 MPI tasks per node is significantly faster than 8 OpenMP threads per node (1 MPI task), but even 2 OMP threads + 4 MPI tasks was running very slowly, more so than I could attribute solely to a thread starvation issue. I saw a few related posts in this area and am hoping for further insight and recommendations on this issue. What I have tried so far ... 1. setenv OMP_WAIT_POLICY active ## seems to make sense 2. setenv KMP_BLOCKTIME 1 ## this is counter to what I have read but when I set this to a large number (2500...
Optimizing cilk with ternary conditional
Author: Fabio G. | 3
What is the best way to optimize the loop cilk_for(i=0;i<n;i++){ x[i]=x[i]<0?0:x[i]; } or something like that? Thanks, Fabio
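One possible direction for the question above, offered as a sketch rather than a measured result: writing the clamp with `std::max` expresses the operation without an explicit branch, which compilers can often turn into a vector max instruction. The example uses a plain `for` loop in standard C++ so that it builds without Cilk support. The helper name `clamp_negatives_to_zero` and the `float` element type are assumptions, since the post does not give the type of `x`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: replaces every negative entry with zero, in place.
// Assumption: the elements are float; the post does not say.
void clamp_negatives_to_zero(std::vector<float>& x) {
    const std::size_t n = x.size();
    for (std::size_t i = 0; i < n; ++i) {
        // Same result as x[i] < 0 ? 0 : x[i], written without a branch.
        x[i] = std::max(x[i], 0.0f);
    }
}
```

Each iteration touches only its own element, so the same loop body could be placed inside the `cilk_for` from the question. Whether that helps depends on `n` and on memory bandwidth, so it is worth measuring both variants.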
Optimizing reduce_by_key implementation using TBB
Author: Shruti R. | 0
Hello everyone, I'm quite new to TBB and have been trying to optimize a reduce_by_key implementation using TBB constructs. However, the serial STL code always outperforms the TBB code! It would be helpful to get an idea of how reduce_by_key can be improved using tbb::parallel_scan. Any help at the earliest would be much appreciated. Thanks.
reading a shared variable
Author: VIKRANT G. | 4
Hello everyone, I am relatively new to parallel programming and have the following question: is reading a shared variable (one that is not modified by any thread) without using locks good practice? Thanks in advance for the help.
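As a sketch of the situation the question describes: in C++, concurrent reads of an object that no thread modifies do not constitute a data race, so no lock is needed as long as the data is fully initialized before the reader threads start. The helper name `sum_from_each_thread` and the sample data are hypothetical.

```cpp
#include <cassert>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: every thread reads the same vector without a lock.
// This is safe only because `shared` is fully initialized before the threads
// are created and nothing writes to it while they run.
std::vector<long> sum_from_each_thread(const std::vector<int>& shared,
                                       int num_threads) {
    assert(num_threads > 0);
    std::vector<long> results(num_threads, 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        // Each thread writes only to its own slot in `results`.
        workers.emplace_back([&shared, &results, t] {
            results[t] = std::accumulate(shared.begin(), shared.end(), 0L);
        });
    }
    for (auto& w : workers) {
        w.join();  // join before reading the results
    }
    return results;
}
```

Each thread writes only to its own slot in `results`, so the only data shared between threads is read-only. If any thread wrote to `shared` while others were reading it, synchronization such as a mutex or atomics would be required.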
Weird Openmp bug
Author: Cheng C. | 1
Dear all, I want to combine OpenMP and RSA_public_encrypt and RSA_private_decrypt routines. However, I was confused by a weird bug for a few days.    In the attached program, if I generated 2 threads for parallel encryption and decryption, everything works well. If I generated 3 or more threads, the RSA_public_encrypt routine works fine. All strings are successfully encrypted (encrypt_len=256). However, the RSA_private_decrypt routine went wrong, that is, only one thread works properly, all the other threads failed to decrypt some of the strings (decrypt_len=-1, rsa_eay_private_decrypt padding check failed). If there are 1000 strings and 4 threads, the total number of string failed to decrypt went around 710 (some times as low as around 200). So as expected, if I use 4 threads for parallel RSA_public_encrypt and one thread for RSA_private_decrypt, nothing went wrong.   It would be great if you could give some ideas. Thanks very much.    #include <openssl/rsa.h> #include <...
performance loss
Author: Bo W. | 8
Hi, some interesting performance loss happened with my measurements. I have a system with two sockets, each socket is a E5-2680 processor. Each processor has 8 cores and with hyper-threading. The hyper-threading was ignored.  On this system, I started a program 16 times at the same time and each time pinned the program to different cores. At first, i set all cores to 2.7GHz and saw : Program 0 Runtime 7.7s Program 8 Runtime 7.63s And then, i set  cores on the second socket  to 1.2GHz and saw: Program 0 Runtime 12.18s Program 8 Runtime 15.73s The program 8 ran slower. It is clear, because core 8 had lower frequency. But why was program 0 also slower? Its frequency wasn't touched.   Regards, Bo
Subscribe to forums
