Intel® Developer Zone:


Just published! Intel® Xeon Phi™ Coprocessor High Performance Programming 
Learn the essentials of programming for this new architecture and these new products.
Intel® System Studio
Intel® System Studio is a comprehensive, integrated software development tool suite that can accelerate time to market, strengthen system reliability, and boost power efficiency and performance.
In case you missed it - 2-day Live Webinar Playback
Introduction to High Performance Application Development for Intel® Xeon & Intel® Xeon Phi™ Coprocessors.
Structured Parallel Programming
Authors Michael McCool, Arch D. Robison, and James Reinders use an approach based on structured patterns that should make the subject accessible to every software developer.

Deliver your best application performance for your customers through parallel programming with the help of Intel’s innovative resources.

Development Resources

Development Tools


Intel® Parallel Studio XE ›

Bringing simplified, end-to-end parallelism to Microsoft Visual Studio* C/C++ developers, Intel® Parallel Studio XE provides advanced tools to optimize client applications for multicore and many-core architectures.

Intel® Software Development Products

Explore all tools that help you optimize for Intel architecture. Select tools are available for a free 30-day evaluation period.

Tools Knowledge Base

Find guides and support information for Intel tools.

DB2® 9.5 pureXML™* Performance Trends on the Next Generation Quad-Core Intel® Xeon® Processor
By Rekha Raghu (Intel) | Posted 02/27/2009 | 0 comments
Download PDF Download DB2® 9.5 pureXML™* Performance Trends on the Next Generation Quad-Core Intel® Xeon® Processor [PDF 224KB] Abstract This paper will showcase the scalability and performance of IBM* DB2 9.5 on the newest Quad-Core Intel® Xeon® processor, compared to IBM DB2 9 on Dual-Core In...
Intel® Threading Tools and OpenMP*
By Clay Breshears (Intel) | Posted 02/25/2009 | 2 comments
Introduction Find where parallelism can be implemented effectively within a serial application. Explicit threading methods, such as Windows* threads or POSIX* threads, use library calls to create, manage, and synchronize threads. Use of explicit threads requires an almost complete restructuring of...
New Books Added to Recommend Reading List
By aaron-tersteeg (Intel) | Posted 02/24/2009 | 0 comments
Principles of Parallel Programming, by Calvin Lin and Larry Snyder. The book emphasizes the principles underlying parallel computation, explains the various phenomena, and clarifies why these phenomena represent opportunities or barriers to successful parallel programming. Ideal for an advanced ...
March 3 @ 8:00am PST - listener questions on Parallel Programming Talk
By aaron-tersteeg (Intel) | Posted 02/24/2009 | 0 comments
Clay and Aaron will be reading and answering listener questions on the March 3rd Parallel Programming Talk at 8:00AM PST. Send in your questions to
Subscribe to Intel Developer Zone Articles
Microsoft TechEd 2009 Kicks Off Tomorrow
Posted 05/10/2009 | 8 comments
Microsoft TechEd 2009 will kick off tomorrow morning at the Los Angeles Convention Center, and if you're unable to attend, you can follow the announcements online at Microsoft TechEd Online. You'll be able to watch the keynote address beginning at 10:00am PST on Microsoft TechEd Online, and I'll als...
Living in the future thanks to the Sponsors of Tomorrow
By aaron-tersteeg (Intel) | Posted 05/06/2009 | 3 comments
I like to poke a little fun at my peer Josh Bancroft. When we're meeting new people, I tend to introduce him as "living in the future", but it's true, and not limited to just Josh. Josh and all of the community managers and advisors in the Intel Software Network live in the future. And it's only fitt...
Parallel Programming Talk - Listener Question: Radix Sort Solution
By aaron-tersteeg (Intel) | Posted 05/06/2009 | 0 comments
Welcome to Episode 29 of Parallel Programming Talk, broadcast on May 5, 2009, hosted by Aaron Tersteeg and Dr. Clay Breshears. The first show of every month is the listener question show. On this episode we discussed Radix Sort, the first problem from Threading Challenge 2009. Download the MP3 of th...
Another Sorts of Sorts
By Dmitry Vyukov | Posted 05/06/2009 | 6 comments
Asaf Shelly posted an interesting blog entry regarding the first problem (radix sort) of the Intel Threading Challenge 2009: All Sorts of Sorts. There is also an active discussion going on in the comments. Since I had mentioned some aspects of my submission, I decided to post my write-up here (I've checked up with Cont...
Subscribe to Intel Developer Zone Blogs
Poor threading performance on Intel Xeon E5-2680 v2
By Pascal | 10 replies
Hello, I am running a visualization program (visualizing a large dataset) where I can use either MPI or pthreads. When I run it on my desktop, which has an Intel i7-2600K (4 cores, 8 threads), I get better performance using pthreads (I'm using a lot of threads, e.g. 32) than using MPI, which is normal (I guess). But when I run the same code on one node of a cluster, which has an Intel Xeon E5-2680 v2 (10 cores, 20 threads), the performance I get using pthreads is worse than with MPI: about 70s using MPI compared to 180s using pthreads. Even worse, the performance on the Xeon E5-2680 v2 is lower than that of the i7-2600K: around 100s on the 2600K but 180s on the E5-2680 (same number of threads on both). I checked using the top command and all the cores are active when I run the program. So my question is: why is that happening? Is there some other way I should be compiling the code on the E5-2680? Are there some variables I should set, like KMP_AFFIN...
HTM/STM and Scheduling
By Simone A. | 1 reply
Hi, I have a question about Hardware and Software Transactional Memory. Given the types of versioning (eager and lazy) and conflict detection (optimistic and pessimistic), suppose two or more threads are performing transactions that write/read the same memory location. Could the scheduling of the threads affect the ability to detect a conflict? Which combination of versioning and conflict detection would be best to always catch the conflicts? Hope my question is clear. Thanks. Best Regards, Simone
Locking CPU cache lines for a thread (L1)
By Younis A. | 14 replies
Hi, I'm working on securing access to the L1 cache by locking it line by line. Is there any way to do it? For example, when two threads access the L1, each line is locked for a certain time to the thread that accessed it. Regards, Younis
Responsive OpenMP Threads in Hybrid Parallel Environment
By Don K. | 1 reply
I have a Fortran code that runs both MPI and OpenMP. I have done some profiling of the code on an 8-core Windows laptop, varying the number of MPI tasks vs. OpenMP threads, and have some understanding of where some performance bottlenecks for each parallel method might surface. The problem I am having is when I port over to a Linux cluster with several 8-core nodes. Specifically, my OpenMP thread parallelism performance is very poor. Running 8 MPI tasks per node is significantly faster than 8 OpenMP threads per node (1 MPI task), but even 2 OpenMP threads + 4 MPI tasks was running very slowly, more so than I could solely attribute to a thread starvation issue. I saw a few related posts in this area and am hoping for further insight and recommendations on this issue. What I have tried so far ... 1. setenv OMP_WAIT_POLICY active ## seems to make sense 2. setenv KMP_BLOCKTIME 1 ## this is counter to what I have read, but when I set this to a large number (2500...
Optimizing cilk with ternary conditional
By Fabio G. | 3 replies
What is the best way to optimize the loop cilk_for(i=0; i<n; i++) { x[i] = x[i]<0 ? 0 : x[i]; } or something like that? Thanks, Fabio
Optimizing reduce_by_key implementation using TBB
By Shruti R. | 0 replies
Hello everyone, I'm quite new to TBB and have been trying to optimize a reduce_by_key implementation using TBB constructs. However, the serial STL code is always outperforming the TBB code! It would be helpful to get an idea of how reduce_by_key can be improved using tbb::parallel_scan. Any help at the earliest would be much appreciated. Thanks.
reading a shared variable
Hello everyone, I am relatively new to parallel programming and have the following doubt: is reading a shared variable (that is not modified by any thread) without using locks a good practice? Thanks for the help in advance.
Subscribe to Forums