Intel® Developer Zone:


Just published! Intel® Xeon Phi™ Coprocessor High Performance Programming 
Learn the essentials of programming for this new architecture and new products.
Intel® System Studio
Intel® System Studio is a comprehensive, integrated software development tool suite that can accelerate time to market, strengthen system reliability, and boost power efficiency and performance.
In case you missed it - 2-day Live Webinar Playback
Introduction to High Performance Application Development for Intel® Xeon & Intel® Xeon Phi™ Coprocessors.
Structured Parallel Programming
Authors Michael McCool, Arch D. Robison, and James Reinders use an approach based on structured patterns, which should make the subject accessible to every software developer.

Deliver your best application performance for your customers through parallel programming with the help of Intel’s innovative resources.

Development Resources

Development Tools


Intel® Parallel Studio XE

Bringing simplified, end-to-end parallelism to Microsoft Visual Studio* C/C++ developers, Intel® Parallel Studio XE provides advanced tools to optimize client applications for multi-core and manycore.

Intel® Software Development Products

Explore all the tools that help you optimize for Intel architecture. Select tools are available for a free 30-day evaluation period.

Tools Knowledge Base

Find guides and support information for Intel tools.

Easy SIMD through Wrappers
By admin Posted 03/27/2015 0
SIMD operations are widely used for 3D graphics applications. This tutorial provides new insights into SIMD by comparing SIMD lanes and CPU threads, and steps you through the process of creating a simple, straightforward SIMD implementation in your own code.
Abaqus/Standard Performance Case Study on Intel® Xeon® E5-2600 v3 Product Family
By Khang Nguyen (Intel) Posted 03/27/2015 2
Background: The whole point of simulation is to model the behavior of a design and potential changes against various conditions to determine whether we are getting an expected response; and simulation in software is far cheaper than building hardware and performing a physical simulation and modif...
Avoid frequency drop in GPU cores when executing applications in Heterogeneous mode
By Anoop Madhusoodhanan Prabha (Intel) Posted 03/23/2015 0
Introduction: Intel® C++ Compiler 15.0 provides a feature that enables offloading general-purpose compute kernels to processor graphics. This feature makes the processor graphics silicon area available for general-purpose computing. The key idea is to utilize the compute power of both CPU cores and GP...
Intel Cluster Ready FAQ: Software vendors (ISVs)
By Werner Krotz-vogel (Intel) Posted 03/23/2015 0
Q: Why should we join the Intel Cluster Ready program? A: By offering registered Intel Cluster Ready applications, you can provide the confidence that applications will run as they should, right away, on certified clusters. Participating in the program will help you increase application adoption, e...
Web Resources about Intel® Transactional Synchronization Extensions
By Roman Dementiev (Intel) Posted on 07/28/14 3
In this blog I list useful technical resources related to Intel® Transactional Synchronization Extensions (Intel TSX). I will try to keep the list up-to-date as new material becomes available (subscribe to this page below to get update notifica...
Additional AVX-512 instructions
By James Reinders (Intel) Posted on 07/17/14 1
Additional Intel® Advanced Vector Extensions 512 (Intel® AVX-512): The Intel® Architecture Instruction Set Extensions Programming Reference includes the definition of additional Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions. As I discussed in my first blog about Intel® AVX-...
Using Intel® TSX with VTune(TM) Amplifier XE 2015 Beta to measure transaction time & abort in your code?
By Peter Wang (Intel) Posted on 07/12/14 2
When developing multithreaded applications, the user should protect the critical (sensitive) code areas called by threads, so that threads access shared memory without data conflicts. Most of the time, the user might use critical_section, mutex, semaphore, atomic, events, or other “locks” to protect crit...
Compete And Win A Prize With The New Intel® CnC!
By Frank Schlimbach (Intel) Posted on 07/10/14 0
A new version of Intel® Concurrent Collections for C++ (CnC) has been released. We are celebrating its coming out to open source with a programming contest, which will have its showdown at the 6th annual CnC workshop. The organizers call on individuals and small teams to compete for a significant...
Nested OMP on Xeon Phi using OMP4
By james B. 3
Xeon Phi has 60 cores and 4 threads per core. I am writing an experiment that will have 1 master thread on each core, and each of these will spawn 4 slave threads. Looking at the manual, it seems that I want to set the envars:
    MIC_OMP_NESTED=TRUE
    MIC_OMP_PROC_BIND="spread, close"
    MIC_OMP_NUM_THREADS=60
Is this correct? I've tested this and it doesn't die... Is there a way I can get the runtime to spit out affinity debug info about where it is actually placing things so I can be certain? Cheers, James
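The two-level layout described in this post can be sketched in C++ as an outer team of "master" threads that each open an inner team. This is a minimal illustration, not the poster's code: the function name `report_nested_placement` is hypothetical, and the `_OPENMP` guards only exist so that the file still builds (and runs serially) when OpenMP is switched off. The sketch prints OpenMP thread numbers, not the hardware cores they land on. For actual core bindings, the Intel OpenMP runtime documents a `verbose` modifier for `KMP_AFFINITY`, and OpenMP 4.0 defines `OMP_DISPLAY_ENV`; whether either reports what is needed on a particular coprocessor software stack is something to confirm in the runtime documentation.

```cpp
#include <cassert>
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: opens an outer team of "master" threads, each of which
// opens an inner team, and prints every (outer, inner) thread-number pair.
// Returns how many pairs actually ran; the runtime may grant fewer threads
// than requested, and a build without OpenMP runs exactly one pair.
int report_nested_placement(int outer_threads, int inner_threads) {
    assert(outer_threads > 0 && inner_threads > 0);
    int reports = 0;
#ifdef _OPENMP
    omp_set_max_active_levels(2);  // permit two nested active parallel levels
#endif
    #pragma omp parallel num_threads(outer_threads) reduction(+:reports)
    {
        int outer_id = 0;
#ifdef _OPENMP
        outer_id = omp_get_thread_num();  // number within the outer team
#endif
        #pragma omp parallel num_threads(inner_threads) reduction(+:reports)
        {
            int inner_id = 0;
#ifdef _OPENMP
            inner_id = omp_get_thread_num();  // number within the inner team
#endif
            #pragma omp critical
            std::printf("outer %d inner %d\n", outer_id, inner_id);
            reports += 1;
        }
    }
    return reports;
}
```

Built with the compiler's OpenMP option (for example `-fopenmp` with GCC), a call such as `report_nested_placement(60, 4)` would mirror the layout in the post; the returned count shows whether the runtime really created all the requested threads.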
Slowdown with OpenMP
By Matt S. 11
I'm getting some pretty unusual results from using OpenMP on a fractional differential equations code written in Fortran. No matter where I use OpenMP in the code, whether it be on an initialization loop or on a computational loop, I get a slowdown across the entire code. I can put OpenMP in one loop and it will slow down an unrelated one (timed separately)! The code is a bit unusual, as it initializes arrays starting at 0 (and some even negative). For example,
    real*8 :: gx(0:Nx)
    real*8 :: AxLh(1-Nx:Nx-1), AxRh(1-Nx:Nx-1), AxL0(1-Nx:Nx-1), AxR0(1-Nx:Nx-1)
where Nx is, let's say, 512. Would that possibly have anything to do with the ubiquitous slowdown with OpenMP? Also, any ideas on reducing "pow" overhead in the following snippet would be greatly appreciated:
    do k = 1, 5
      hgck = foo_c(k)
      hgpk = foo_p(k)
      do j = 1, 100
        vx = vx + hgck * ux(x, t, foo(j) + hgpk)
      end do
    end do
where ux is a function defined by
    function ux(x,t,xi) impl...
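For readers following the loop nest in this post, here is a rough C++ transliteration with an OpenMP sum reduction on `vx`. It is a sketch under stated assumptions: the post's definition of `ux` is truncated, so `ux_stub` below is a made-up placeholder, and `accumulate_vx` is a hypothetical wrapper name. It only illustrates how the accumulation can be expressed as a reduction; it does not address the "pow" cost or explain the reported slowdown. With only 5 x 100 iterations, thread start-up and synchronization can easily outweigh the work, which may be one factor worth measuring.

```cpp
#include <cassert>
#include <vector>

// Hypothetical placeholder for the post's ux(x, t, xi); the real function
// body is cut off in the post, so this formula is invented for illustration.
double ux_stub(double x, double t, double xi) {
    return x * xi + t;
}

// Hypothetical wrapper around the loop nest from the post. Every iteration
// adds into vx, so the sum is declared as an OpenMP reduction to avoid a
// data race. Built without OpenMP, the pragma is ignored and the loops run
// serially.
double accumulate_vx(double x, double t,
                     const std::vector<double>& foo_c,
                     const std::vector<double>& foo_p,
                     const std::vector<double>& foo) {
    assert(foo_c.size() == foo_p.size());
    const int nk = static_cast<int>(foo_c.size());
    const int nj = static_cast<int>(foo.size());
    double vx = 0.0;
    #pragma omp parallel for collapse(2) reduction(+:vx)
    for (int k = 0; k < nk; ++k) {
        for (int j = 0; j < nj; ++j) {
            vx += foo_c[k] * ux_stub(x, t, foo[j] + foo_p[k]);
        }
    }
    return vx;
}
```

Timing this routine with and without the OpenMP option, on inputs of realistic size, is a simple way to check whether the parallel region pays for itself.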
web crawling through "Intel Xeon Phi Coprocessors"
By Sunil K. 1
I am new to this forum. I want to implement parallel crawling on "Intel Xeon Phi Coprocessors" for my project. Before buying equipment, installing software, and starting to learn about this platform, I want to know whether it is possible to connect to the network and fetch web URLs in parallel using this technology. (I don't want to create a cluster of CPUs to do this; I want to do it using a single card.)
Intel MPI for Phi tuning tips?
By Ronald W Green (Intel) 3
Does setting I_MPI_MIC=enable change other MPI environment variables, particularly any that would tune MPI for the MIC system architecture? As a side question, has anyone written a Tuning and Tweaking guide for IMPI for Phi? For example, what I_MPI variables could one use to help tune an app targeting 480 ranks across 8 Phis? Thanks, Ron
Lock-free Java, or better scaling on multi-core systems
By William L. 0
Everyone these days has to address multi-core issues, or vertical scaling, at least on the server side of things. And there does not seem to be a general approach, so we end up re-architecting our applications every time we add cores. At the same time, the availability of many-core processors seems to be constrained by the lack of a reasonable software technology to make good use of them. Actors seem like a good approach and allow you to write fast, lock-free code. But large actor-based systems are not robust. Most actor implementations require applications to implement a state machine per actor for determining what messages are to be processed, and maintaining a large number of interacting state machines is well beyond the abilities of most developers. This is very sad, as the throughput of actor-based applications typically scales with the number of cores. I've worked on this problem for a number of years now and have developed a simple variation on actors which support non-blockin...
igzip for VS10 C++?
By David L. 6
I was searching for a zlib-compatible compressor but faster, and came across the paper describing igzip, "High Performance DEFLATE Compression on Intel Architecture Processors". igzip looks like exactly (!) what I am looking for: compatible with zlib, but faster. However, the downloadable source was for Linux, and I need it for a VS10 C++ project. I have successfully (I think) compiled and assembled the desired modules (common, crc, crc_utils, hufftables, hufftables_c.cpp, igzip0c_body, igzip0c_finish, init_stream) into a .lib. But when I attempt to link the library into my project, I get error LNK2019: unresolved external symbol fast_lz (and init_stream) from where they are called. I also have a "C" lz4 compression library linked into the project, and it works fine. I have spent 3 days playing with it, looking for the clue that will unlock the symbols, but no luck so far. I get no other warnings and/or errors during the compiling/assembling of the library or project. Any help (esp...
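One frequent cause of an unresolved-external error when C++ code calls routines built from C or assembly sources is a linkage mismatch: the C++ caller looks for a mangled name while the library exports a plain one. Whether that is what happened in this post cannot be told from the text, so the sketch below is only an illustration of the `extern "C"` mechanism. The names `demo_init_stream` and `demo_fast_lz` and their signatures are hypothetical stand-ins; the real igzip prototypes are not shown in the post.

```cpp
#include <cassert>

// Hypothetical prototypes, declared with C linkage so the C++ compiler
// refers to the unmangled symbol names a C or assembly object would export.
extern "C" {
    int demo_init_stream(int* state);
    int demo_fast_lz(const int* state);
}

// Stand-in definitions so the sketch is self-contained. In a real project
// these symbols would come from the separately built library instead.
extern "C" int demo_init_stream(int* state) {
    assert(state != nullptr);
    *state = 0;  // pretend to reset the stream state
    return 0;
}

extern "C" int demo_fast_lz(const int* state) {
    assert(state != nullptr);
    return *state + 1;  // placeholder result, not real compression
}

// C++ caller that reaches both routines through their C-linkage names.
int call_through_c_linkage() {
    int state = -1;
    demo_init_stream(&state);
    return demo_fast_lz(&state);
}
```

Comparing the symbol names the library actually exports with the names the linker reports as unresolved is usually the quickest way to confirm or rule out this kind of mismatch.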
OpenCL vs Intel Cilk Plus Issues, Differences and Capabilities
By Yaknan G. 0
I am curious about the differences between OpenCL and Intel Cilk Plus. Both are parallel programming paradigms that are receiving wide recognition, but technically speaking, is one better than the other, or are they simply different? Also, what yardstick should I use when choosing between the two for an embarrassingly parallel problem? Please, I need answers. Thanks! Yaknan
Thread complexion (Multi-threading)
By Masood Ali M. 4
Hello everyone, The other day I was trying to create a thread that could capture the working of an already existing (working) thread and copy it. I am also looking at setting thread priorities so that such threads can capture the working of threads at the same priority level, and at dynamically increasing the thread capacity to handle similar kinds of work. I would appreciate it if anybody could help. Thanks. -Ali