Intel® Developer Zone:


Find out how Intel processor technology can improve your software.

Intel® Microservers
Intel® Xeon
Intel® Xeon Phi™ Coprocessor
Intel® Cache Acceleration Software


Technical Articles



Gain skills to work with our tools and guides.



Get answers and solutions for your development questions.

Many Integrated Core Forum

Performance Forum


Deliver top application performance while minimizing development, tuning and testing time and effort.

A Concise Guide to Parallel Programming Tools for Intel® Xeon® Processors
Pick the right programming models and tools to boost your application’s performance.

Download Intel® OpenCL SDK
The first open, royalty-free standard for general-purpose parallel programming.

Parallel Studio XE 2013 is here
Powerful tools to make the most of clusters and supercomputers.

Intel® Compiler Options for Intel® SSE and Intel® AVX
Learn about the three main types of processor-specific optimizations.

Intel offers you a wide variety of capabilities to optimize your applications for performance, power, security and availability. Click on each of the buttons below to see what resources are available to you!

Intel offers a range of servers, microservers, and coprocessors for cloud, technical computing, and enterprise needs. Here are some resources to compare their features from a hardware perspective.

On this page you will find information on the latest products launched by Intel, presented in a more software-centric perspective: the architecture and features, key software enabling insights, and how products are being used or configured for best performance.

Other useful resources:

Solution Optimization at Hyperscale with Intel and Red Hat
By DANIEL F. (Intel), posted 03/25/2014 (0)
Red Hat and Intel collaborate to ensure binary code compatibility and optimization, delivering greater agility and lower TCO to customers. That includes optimizing platforms for virtualization and secure cloud computing. For example, we’ve enabled Intel® Virtualization Technolo...
Intel® Trace Analyzer and Collector 9.0 Beta Update 1 Readme
By Gergana Slavova (Intel), posted 03/24/2014 (0)
The Intel® Trace Analyzer and Collector is a low-overhead scalable event-tracing library with graphical analysis that reduces the time it takes an application developer to enable maximum performance of cluster applications. This Beta package is for users who develop on and build for Intel® 64 arc...
Intel® MPI Library 5.0 Beta Update 1 Readme
By Gergana Slavova (Intel), posted 03/24/2014 (0)
The Intel® MPI Library is a high-performance interconnect-independent multi-fabric library implementation of the industry-standard Message Passing Interface, v3.0 (MPI-3.0) specification. This Beta package is for MPI users who develop on and build for Intel® 64 architectures on Linux* and Windows...
Intel® Math Kernel Library Parallel Direct Sparse Solver for Clusters
By Alexander Kalinkin (Intel), posted 03/22/2014 (0)
The Intel® Math Kernel Library Parallel Direct Sparse Solver for Clusters (CPARDISO) is a powerful tool set for solving systems of linear equations with sparse matrices of millions of rows/columns. CPARDISO provides an advanced implementation of modern algorithms and could be considered as ...


Intel® Xeon Phi™ coprocessor Power Management Turbo Part 3: How can I design my program to make use of turbo?
By Taylor Kidd (Intel), posted 02/20/2014 (1)
Previous blogs on power management and a host of other power management resources can be found in, “List of Useful Power and Power Management Articles, Blogs and References” at See [L...
Intel® Xeon® Processor E7 V2 Family New Reliability Features
By Khang Nguyen (Intel), posted 02/18/2014 (0)
NOTE: Over the first half of 2014 we will be adding more detail on how applications can be augmented to be more "Recovery Aware". Please subscribe to this page (button at the bottom) to receive notification when we have this updated information posted.
1) Introduction
In today’s world...
Detecting CPU-bound Applications in Server Systems
By loc-nguyen (Intel), posted 02/14/2014 (0)
Applications in data centers process huge workloads every day. Many of them are CPU intensive, disk I/O intensive, network I/O intensive or a combination thereof. Maintaining a data center is challenging because the amount of work being run and data being processed is getting larger, which may r...
- Run your own server with Mesh Server Installer
By ylian-saint-hilaire (Intel), posted 02/13/2014 (0)
It’s been a long time coming but today is finally the day we released the first simple Mesh Server Installer allowing anyone to launch their own version of The new installer is available on the Meshcentral information page along with documentation. It’s pretty amazing software ...


Will AVX-512 replace the need for dedicated GPUs?
By Christopher H. (13)
I do not expect it to replace high-end graphics cards, and it will likely be less power-efficient than a dedicated GPU (integrated or discrete). As far as I can tell, performance-wise it will easily put a CPU on par with a mid-range GPU, which is far above what the majority of people need. A 3 GHz, 4-core Skylake will have 768 GFLOPS (3 GHz * 4 cores * 2 x 16 FMA, with each FMA counted as two flops). The GPU takes up enough die space to allow for 8-core chips, which would double the max flops. Intel already has the OpenGL and DirectX software renderers from Larrabee. The only thing really lacking is memory bandwidth, although DDR4 and Crystalwell should help with this.
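The 768 GFLOPS figure in the post above can be reproduced with a few lines of arithmetic. This is only a sketch that takes the poster's numbers at face value: two 512-bit FMA units per core, 16 single-precision lanes per unit, and each fused multiply-add counted as two floating-point operations. The function name `peak_gflops` and its parameters are illustrative, not taken from any Intel tool or specification.

```python
def peak_gflops(clock_ghz, cores, fma_units, simd_lanes, flops_per_fma=2):
    """Theoretical single-precision peak, in GFLOPS.

    All inputs are the forum poster's assumptions, not confirmed
    hardware specifications.
    """
    return clock_ghz * cores * fma_units * simd_lanes * flops_per_fma


# 3 GHz, 4 cores, 2 FMA units per core, 16 float lanes per 512-bit vector
quad_core = peak_gflops(3.0, 4, 2, 16)

# Same assumptions with 8 cores (the "double the max flops" case)
eight_core = peak_gflops(3.0, 8, 2, 16)

print(f"4 cores: {quad_core:.0f} GFLOPS, 8 cores: {eight_core:.0f} GFLOPS")
```

This is a theoretical ceiling only. As the post notes, memory bandwidth is the more likely limit in practice.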
unaligned loads avx-128 vs. -256
By Tim Prince (8)
I just saw that my cases using _mm256_loadu_ps show better performance than _mm_loadu_ps on corei7-4, where the latter was faster on earlier AVX platforms (in part due to the ability of ICL/icc to compile the SSE intrinsic to AVX-128). Does this mean that advice to consider AVX-128 will soon be of only historical value?  I'm ready to designate my Westmere and corei7 linux boxes as historic vehicles. icc/ICL 14.0.1 apparently corrected the behavior (beginning with introduction of CEAN) where run-time versioning based on vector alignment never took the (AVX-256) vector branch in certain cases where CEAN notation produced effective AVX-128 code.  It seems now that C code can match performance of CEAN, if equivalent pragmas are applied. A key to getting an advantage for AVX-256 on corei7-4 appears to be to try reduced unroll.  In my observation, ICL/icc don't apply automatic unrolling to loops with intrinsics, while gcc does.  When not using intrinsics with ICL, I found the option 'ICL ...
TSX PEBS profiling is not accurate enough
By le g. (0)
Hi everyone, I have recently been working on a project that takes advantage of the Intel TSX extension (RTM). Overall, the program works well, except that there are occasional unexpected aborts due to memory conflicts. This result is seen both in the returned EAX register and in PEBS profiling. At first, we suspected that the conflicts were caused by our own programming mistakes. We have studied the Intel manual again and again, but could not find any clue. However, even when we run only a single thread, conflicts still happen. We now have to suspect that PEBS for TSX is somehow not accurate enough. Does anyone know the details? If possible, how can I report this issue to Intel engineers? The target CPU in our project is the Intel Core i7 4770S. Thanks all!
Tsx conflicts with rdrand
By le g. (0)
Hi all, I found that the rdrand instruction always causes RTM to abort, which is not documented in the manual. Has anyone else experienced the same situation? Thanks all.
Latest ASM compiler other than Intel C and C++ Compilers
By Uday Krishna G. (6)
Hi, I am trying to code my application in assembly to run on x86. Please suggest a suitable compiler that supports all SSE4.2 assembly instructions (other than the Intel Compiler). Any links that help with the procedure and execution would be appreciated.
An M/M/n queuing model simulation
By aminer100
Hello,

An M/M/n queuing model simulation with Object Pascal and my Thread Pool Engine - version 1.02

You can download it from: Read more below...

Author: Amine Moulay Ramdane

Description: It's harder and sometimes impossible to get analytical results about waiting times and queue length for general interarrival and service distributions, so it's important to be able to estimate these quantities by observing the results of simulation. It's very easy in Object Pascal to simulate a sequence of arrival times with a given interarrival distribution. Look at the examples MM1.pas (M/M/1 queuing model) and MMn.pas (M/M/n, where n is the number of servers) inside the zip file:

---------------------------
InterArrivals:=TExponentialDistribution.Create(420623,1.0/3.0);
ServiceTimes:=TExponentialDistribution.Create(220623,1.0/4.0);
currtime:=0.0;
for i:=1 to simnumber do
begin
obj:=TJob.create;
obj.simnumber:=simnumber;
obj.number:=i;
currtime:=currtime+InterArrivals...
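The idea in the post above (estimate waiting times by simulation and compare against theory) can be sketched in a few lines of Python for the single-server M/M/1 case. This is not the author's Object Pascal code or Thread Pool Engine. It assumes that the `1.0/3.0` and `1.0/4.0` arguments in the excerpt are mean interarrival and mean service times (arrival rate 3, service rate 4), which the truncated excerpt does not confirm. The function names are illustrative, and the two seeds are reused from the excerpt purely for convenience.

```python
import random


def simulate_mm1(arrival_rate, service_rate, num_jobs,
                 seed_arrivals=420623, seed_services=220623):
    """Estimate the mean time a job waits in queue (excluding service)
    for a FIFO single-server queue, using the Lindley recursion."""
    arrivals = random.Random(seed_arrivals)
    services = random.Random(seed_services)
    wait = 0.0          # waiting time of the current job
    total_wait = 0.0
    for _ in range(num_jobs):
        total_wait += wait
        service = services.expovariate(service_rate)  # this job's service time
        gap = arrivals.expovariate(arrival_rate)      # time until the next arrival
        # The next job waits for whatever work is left when it arrives.
        wait = max(0.0, wait + service - gap)
    return total_wait / num_jobs


def analytic_mm1_wait(arrival_rate, service_rate):
    """Closed-form mean queueing delay for a stable M/M/1 queue."""
    return arrival_rate / (service_rate * (service_rate - arrival_rate))


simulated = simulate_mm1(3.0, 4.0, 200_000)
expected = analytic_mm1_wait(3.0, 4.0)
print(f"simulated mean wait: {simulated:.3f}, analytic: {expected:.3f}")
```

For exponential interarrival and service times the closed form is available as a check. The simulation approach matters for the general distributions the post mentions, where no such formula exists.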
Pointers defined in modules and OpenMP
By Jerome B. (2)
I am working with a program (which I did not write) which has a pointer to a derived type in a module:

    module X
    type mytype
        integer x, y, z
    end type mytype
    type (mytype), pointer :: p_mt
    end module X

This module is accessed in a subroutine:

    subroutine Loop
    use X
    p_mt => GoGetOne()
    p_mt % x = 7.0
    ...

So far, so good. However, subroutine Loop is called from within a parallel loop in another subroutine:

    subroutine CallLoop()
    integer i
    !$OMP parallel do
    do i = 1 to 10000
        call Loop(i)
    enddo

It is my understanding that p_mt is global in scope, and therefore should not be accessed from within a parallel loop. If I declare Loop as pure,

    pure subroutine Loop()

the compiler flags the assignment of a value to p_mt as an error. Am I missing something? Or is this a potential bug?
2 CPUs vs num_threads
By Leos P. (6)
I have 2 Xeon CPUs in the PC, each with 4 cores. However, I can only set num_threads to 4. If I set it to a number > 4, I get this message:

OMP: Error #136: Cannot create thread.
OMP: System error #8: Not enough storage is available to process this command.
OMP: Error #178: Function GetExitCodeThread() failed:
OMP: System error #6: The handle is invalid.

Is it not possible to use all the cores in the system because they are distributed across 2 CPUs, or why is this happening? (Compiler: Intel C++ 13.0, OS: Windows Server 2008 R2)


