Intel® Developer Zone:
Performance

Highlights

Just published! Intel® Xeon Phi™ Coprocessor High Performance Programming 
Learn the essentials of programming for this new architecture and new products. New!
Intel® System Studio
Intel® System Studio is a comprehensive, integrated software development tool suite that can accelerate time to market, strengthen system reliability, and boost power efficiency and performance. New!
In case you missed it - 2-day Live Webinar Playback
Introduction to High Performance Application Development for Intel® Xeon & Intel® Xeon Phi™ Coprocessors.
Structured Parallel Programming
Authors Michael McCool, Arch D. Robison, and James Reinders use an approach based on structured patterns, which should make the subject accessible to every software developer.

Deliver your best application performance for your customers through parallel programming with the help of Intel’s innovative resources.

Development Resources


Development Tools

 

Intel® Parallel Studio XE

Bringing simplified, end-to-end parallelism to Microsoft Visual Studio* C/C++ developers, Intel® Parallel Studio XE provides advanced tools to optimize client applications for multi-core and manycore.

Intel® Software Development Products

Explore all the tools that help you optimize for Intel architecture. Select tools are available for a free 30-day evaluation period.

Tools Knowledge Base

Find guides and support information for Intel tools.

Resource Guide for People Investigating the Intel® Xeon Phi™ Coprocessor
By Taylor Kidd (Intel), posted 03/25/2014 (0)
This article identifies resources for anyone investigating the value to their organization of the Intel® Xeon Phi™ coprocessor, which is based on the Intel® Many Integrated Core (Intel® MIC) architecture. It is one of three such guides, each for people in one of the following specific roles: Adm...
Resource Guide for Intel® Xeon Phi™ Coprocessor Administrators
By Taylor Kidd (Intel), posted 03/25/2014 (0)
This article makes recommendations for how an administrator can get up to speed quickly on the Intel® Many Integrated Core (Intel® MIC) Architecture. This article is 1 of 3: For the Administrator, for the Developer, and for the Investigator. Someone who will administer and support a set of machi...
Resource Guide for Intel® Xeon Phi™ Coprocessor Developers
By Taylor Kidd (Intel), posted 03/25/2014 (3)
This article makes recommendations for how a developer can get up to speed quickly on the Intel® Many Integrated Core (Intel® MIC) Architecture. This is one of three articles: For the Administrator, for the Developer, and for the Investigator. Who is a Developer? Someone who will be programming ...
Flow Graph Designer
By Michael Voss (Intel), posted 03/07/2014 (0)
This download is available under the What If Pre-Release License Agreement. Flow Graph Designer is a visualization tool that supports the analysis and...

Transactional Memory Support: the speculative_spin_rw_mutex (Community Preview Feature)
By Christopher Huson (Intel), posted 03/07/2014 (0)
In a previous post I discussed the Intel® Transactional Synchronization Extensions (Intel® TSX) technology released in the new generation of processors.  I described the Intel® Threading Building Blocks (Intel® TBB) implementation of the HLE interface (speculative_spin_mutex).  Now we can talk ab...
Intel® Xeon Phi™ coprocessor Power Management Turbo Part 3: How can I design my program to make use of turbo?
By Taylor Kidd (Intel), posted 02/20/2014 (1)
Previous blogs on power management and a host of other power management resources can be found in “List of Useful Power and Power Management Articles, Blogs and References” at http://software.intel.com/en-us/articles/list-of-useful-power-and-power-management-articles-blogs-and-references. See [L...
Why has CPU frequency ceased to grow?
By victoria-zhislina (Intel), posted 02/19/2014 (0)
All of you probably recall the rapid rate of CPU frequency advancement at the end of the last century and beginning of this one.  Tens of megahertz rapidly transformed into hundreds, and then hundreds of megahertz quickly became a full gigahertz, then a gigahertz and a bit, finally two gigs and ...
Intel® Xeon Phi™ coprocessor Power Management Configuration: Using the micsmc command-line Interface
By Taylor Kidd (Intel), posted 01/31/2014 (0)
Previous blogs on power management and a host of other power management resources can be found in “List of Useful Power and Power Management Articles, Blogs and References” at http://software.intel.com/en-us/articles/list-of-useful-power-and-power-management-articles-blogs-and-references. INTRO...

performance loss
By Bo W. (8)
Hi,
I observed an interesting performance loss in my measurements. I have a system with two sockets, each holding an E5-2680 processor. Each processor has 8 cores with hyper-threading; I ignored the hyper-threads.
On this system, I started 16 instances of a program at the same time and pinned each instance to a different core. At first, I set all cores to 2.7GHz and saw:
Program 0 Runtime 7.7s
Program 8 Runtime 7.63s
Then I set the cores on the second socket to 1.2GHz and saw:
Program 0 Runtime 12.18s
Program 8 Runtime 15.73s
Program 8 ran slower, which is expected because core 8 had a lower frequency. But why was program 0 also slower? Its frequency wasn't touched.
Regards, Bo
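The measurement setup described in this post (pin a run to one core, then time it) can be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes Linux, where Python exposes os.sched_setaffinity, and the names timed_on_cpu and busy_loop are hypothetical stand-ins, not the poster's program. It does not change core frequencies and does not by itself explain the slowdown the poster asks about.

```python
import os
import time


def timed_on_cpu(cpu, work):
    """Run work() with this process pinned to one CPU; return elapsed seconds."""
    # os.sched_setaffinity only exists on Linux; elsewhere the run is unpinned.
    can_pin = hasattr(os, "sched_setaffinity")
    previous = os.sched_getaffinity(0) if can_pin else None
    if can_pin:
        os.sched_setaffinity(0, {cpu})
    try:
        start = time.perf_counter()
        work()
        return time.perf_counter() - start
    finally:
        # Restore the original CPU mask so later code is not affected.
        if can_pin:
            os.sched_setaffinity(0, previous)


def busy_loop(n=200_000):
    """Hypothetical CPU-bound workload standing in for the measured program."""
    total = 0
    for i in range(n):
        total += i * i
    return total
```

To compare cores, one could call timed_on_cpu(0, busy_loop) and timed_on_cpu(8, busy_loop), assuming both CPU numbers are in the process's allowed set, and repeat each measurement several times before drawing conclusions.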
multicore simulation software
By shoog a. (10)
Hi,
I have a scheduling code and I want to run it on different numbers of cores (1, 2, 3, 5, 10) to see how increasing the core count affects the program. Is there any tool or simulator that can help me do this?
Note: I have an Intel(R) Core(TM) i7 running Windows 8.
Thank you
Threads migrate across all available OS procs
By vijaymohan K. (1)
Hi,
Recently I have been getting the warning shown below when I run my Fortran code. (I am running the code in Ubuntu 12.04 with Parallel Studio XE 2013 Update 4 intel64, hosted on Windows 7 using Virtual Machine Player.)
OMP: Warning #122: Threads may migrate across all available OS procs (granularity setting too coarse).
I added the following to my .bashrc file:
PATH=$PATH:/home/vijay/intel/vtune_amplifier_xe_2013/bin64:/home/vijay/intel/inspector_xe_2013/bin64:/home/vijay/bin/gmsh-2.5.0-Linux/bin:.
source /home/vijay/intel/bin/compilervars.sh intel64
export MALLOC_TRIM_THRESHOLD_=-1
export MALLOC_MMAP_MAX_=0
export NCPUS=8
export OMP_NUM_THREADS=2
export MP_BIND=yes
export KMP_STACKSIZE=16m
export OMP_DYNAMIC=.TRUE.
export WSMP_NUM_THREADS=2
export KMP_AFFINITY=granularity=core,compact,1,0
Can anyone help me solve this problem?
Efficiently parallelizing the code in Fortran
By prodigyaj@gmail.com (13)
Hi,
I am trying to parallelize a certain section of my code, which is written in Fortran. The code snippet looks like this:
do i=1,array(K)
   j = K
   ... if conditions on K ...
   ... writes and reads on j ...
   ... do lot of things ...
   K = K+1
end do
So I tried to parallelize it using the code below, which was obviously not as it should have been:
!$OMP PARALLEL DO PRIVATE(j)
do i=1,50
   j = K
   ... if conditions on K ...
   ... writes and reads on j ...
   ... do lot of things ...
   K = K+1
end do
!$OMP END PARALLEL DO
The obvious mistake is that all the threads race on the same K. What would be the best way to ensure that every thread is assigned an incremental K while the threads still run in parallel?
Thanks, Ajay
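One common way to remove this kind of race is to derive each iteration's value from the loop index instead of incrementing a shared counter. The sketch below illustrates the idea in Python rather than Fortran, and rests on an assumption the post does not confirm: K advances by exactly one per iteration and nothing else in the loop body modifies it. The function process is a hypothetical stand-in for the elided loop body.

```python
from concurrent.futures import ThreadPoolExecutor


def process(j):
    """Hypothetical stand-in for the elided loop body that works on j."""
    return j * j


def run_serial(k0, iterations):
    """Original pattern: a counter carried from one iteration to the next."""
    results = []
    k = k0
    for _ in range(iterations):
        results.append(process(k))
        k += 1
    return results


def run_parallel(k0, iterations, workers=4):
    """Iteration i (0-based) uses k0 + i, so no iteration reads a shared counter."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, matching the serial loop.
        return list(pool.map(lambda i: process(k0 + i), range(iterations)))
```

Under the same assumption, the Fortran loop could set j from the starting value of K and the loop index i inside the parallel region, and update K once after the loop finishes. If K changes conditionally inside the body, this rewrite does not apply as written.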
Shared memory on Xeon
By Madhav A. (1)
Hi,
Here is an observation I have. Can you help me explain it?
Setup 1: A process updates shared memory allocated on the local node (0), writing to it constantly from a core (3) on package (0) attached to that node. Another process reads it constantly from a core (1) on the same package (0) and attached node (0). The read time I measure is around 70 clock cycles.
Setup 2: A process running on a core (2) on package (1) updates shared memory allocated on the remote node (0), writing to it constantly. Another process reads it from a core (1) on package (0), local to the shared memory node (0). In this case the reader reads it in about 3 cycles (within statistical error).
Why does the reader incur a smaller penalty reading this shared memory location when a process on the remote node is updating it than when a process on another core of the local package is updating it?
Thanks, Madhav.
TSX PEBS profiling is not accurate enough
By le g. (0)
Hi everyone,
I have recently been working on a project that takes advantage of the Intel TSX extension (RTM). Overall, the program works well, except that there are occasional unexpected aborts due to memory conflicts. This result is reported both by the returned EAX register and by PEBS profiling.
At first, we suspected that the conflicts were caused by our own programming mistakes. We have studied the Intel manual again and again but could not find any clue. However, conflicts still happen even when we run only a single thread, so we now suspect that PEBS for TSX is somehow not accurate enough.
Does anyone know the details? If possible, how can I report this issue to the Intel engineers?
The target CPU in our project is the Intel Core i7 4770S.
Thanks all!
TSX conflicts with RDRAND
By le g. (0)
Hi all,
I found that the rdrand instruction always causes RTM to abort, which is not documented in the manual. Has anyone experienced the same situation? Thanks all.
An M/M/n queuing model simulation
By aminer100
Hello,
An M/M/n queuing model simulation with Object Pascal and my Thread Pool Engine - version 1.02
You can download it from: http://pages.videotron.com/aminer/
Read more below...
Author: Amine Moulay Ramdane
Description: It's harder and sometimes impossible to get analytical results about waiting times and queue length for general interarrival and service distributions, so it's important to be able to estimate these quantities by observing the results of simulation. It's very easy in Object Pascal to simulate a sequence of arrival times with a given interarrival distribution. Look at the examples MM1.pas (M/M/1 queuing model) and MMn.pas (M/M/n, where n is the number of servers) inside the zip file:
---------------------------
InterArrivals:=TExponentialDistribution.Create(420623,1.0/3.0);
ServiceTimes:=TExponentialDistribution.Create(220623,1.0/4.0);
currtime:=0.0;
for i:=1 to simnumber do
begin
  obj:=TJob.create;
  obj.simnumber:=simnumber;
  obj.number:=i;
  currtime:=currtime+InterArrivals...
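The idea of estimating waiting times by simulation can be illustrated with a short, independent sketch in Python. It is not the author's Object Pascal code or Thread Pool Engine. It assumes that the second argument of TExponentialDistribution.Create in the excerpt is the distribution mean, which would give an arrival rate of 3 and a service rate of 4; the function name simulate_mmn and its parameters are hypothetical. Jobs are served first come, first served by whichever server becomes free earliest.

```python
import heapq
import random


def simulate_mmn(arrival_rate, service_rate, servers, jobs, seed=1):
    """Estimate the mean time a job waits in queue in an M/M/n FCFS system."""
    rng = random.Random(seed)
    free_at = [0.0] * servers  # min-heap of times at which each server is free
    now = 0.0
    total_wait = 0.0
    for _ in range(jobs):
        # Exponential interarrival times give a Poisson arrival process.
        now += rng.expovariate(arrival_rate)
        # The job takes the server that becomes available first.
        earliest = heapq.heappop(free_at)
        start = max(now, earliest)
        total_wait += start - now
        heapq.heappush(free_at, start + rng.expovariate(service_rate))
    return total_wait / jobs


def mm1_mean_wait(arrival_rate, service_rate):
    """Analytical mean queueing delay for M/M/1: lambda / (mu * (mu - lambda))."""
    return arrival_rate / (service_rate * (service_rate - arrival_rate))
```

For the single-server case with these assumed rates, the analytical mean wait in queue is 3 / (4 * (4 - 3)) = 0.75, so simulate_mmn(3.0, 4.0, 1, 200_000) should land near that value, while adding a second server should reduce the estimate considerably.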
