Intel® Developer Zone:
Performance

Highlights

Just published! Intel® Xeon Phi™ Coprocessor High Performance Programming 
Learn the essentials of programming for this new architecture and new products.
Intel® System Studio
Intel® System Studio is a comprehensive suite of integrated software development tools that can accelerate time to market, strengthen system reliability, and boost power efficiency and performance.
In case you missed it: 2-day Live Webinar Playback
Introduction to High Performance Application Development for Intel® Xeon & Intel® Xeon Phi™ Coprocessors.
Structured Parallel Programming
Authors Michael McCool, Arch D. Robison, and James Reinders use an approach based on structured patterns that should make the subject accessible to every software developer.

Deliver your best application performance for your customers through parallel programming with the help of Intel’s innovative resources.

Development Resources


Development Tools

 

Intel® Parallel Studio XE

Bringing simplified, end-to-end parallelism to Microsoft Visual Studio* C/C++ developers, Intel® Parallel Studio XE provides advanced tools to optimize client applications for multi-core and manycore.

Intel® Software Development Products

Explore all the tools that help you optimize for Intel architecture. Select tools are available for a free 30-day evaluation period.

Tools Knowledge Base

Find guides and support information for Intel tools.

How Intel® AVX2 Improves Performance on Server Applications
By Thai Le (Intel) Posted 09/05/2014
The latest Intel® Xeon® processor E5 v3 family includes a feature called Intel® Advanced Vector Extensions 2 (Intel® AVX2), which can potentially improve application performance related to high performance computing, databases, and video processing. Here we will explain the context, and provide ...
What’s New in the Intel Compiler
By AmandaS (Intel) Posted 08/25/2014
The list below summarizes new features in the Intel® C++ Compiler 15.0 and the Intel® Fortran Compiler 15.0. For more details about changes in the Intel compilers since the previous release, including a list of new options, please refer to the ‘What’s New’ section in the release notes (C++, Fortr...
OpenMP* 4.0 combined offload constructs support for the Intel® Xeon Phi™ coprocessor
By Kevin Davis (Intel) Posted 08/22/2014
The Intel® Parallel Studio XE 2015 Composer Editions for Windows* and Linux* have feature enhancements that provide nearly full support for the OpenMP* 4.0 API (July 2013) specification. Extensions to the reduction clause and the new declare reduction construct added to support user defined reductio...
Transitioning from Valgrind* Tools to Intel® Inspector XE
By Holly Wilper (Intel) Posted 08/19/2014
The open source Valgrind* framework supports several tools for checking the memory and threading correctness of your code. Intel® Inspector XE has that same functionality but supports additional operating systems (Linux* and Microsoft Windows*), languages (C, C++, Microsoft .NET*, Fortran), and t...
Subscribe to Intel Developer Zone Articles
Go Parallel
By Dmitry Vyukov Posted on 06/18/13
This is the first post in a series about parallel programming with the Go language. What is Go, you may ask? Go is a language with the cutest mascot ever. It also supports parallel programming, as well as concurrent programming. I am sure you are already excited by the langu...
Monitoring Intel® Transactional Synchronization Extensions with Intel® PCM
By Roman Dementiev (Intel) Posted on 06/14/13
After applying a new technology (a new processor, a hardware accelerator, a new instruction, etc.), besides measuring the immediate performance delta, one needs a method to verify that this technology has been applied correctly and efficiently. Intel® Transactional Synchronization Extensions (Int...
Measuring Memory Bandwidth on the Intel® Xeon Phi™ Coprocessor
By Sumedh Naik (Intel) Posted on 05/28/13
The memory bandwidth of an application is an important metric to have at your fingertips when optimizing your application. One can measure the memory bandwidth of an application running on the Intel Xeon Phi coprocessor in one of two ways: by using the core hardware events or by using the unc...
Using HLE and RTM with older compilers with tsx-tools
By Andreas Kleen (Intel) Posted on 05/20/13
To use HLE/RTM to improve lock scalability, the lock library needs to be enabled. If you already have an enabled lock library, like glibc on Linux, you can just use normal locking with that library. If the lock library doesn't support it, or you have your own lock, the library needs to be enabled, l...
Subscribe to Intel Developer Zone Blogs
Help! Unity and Parallel Studio
By Don Fantom J.
  Hello, I'm a beginner. I'm working on a project in which I use Unity to develop a game, mainly with C# scripts. I want to know whether I can use Parallel Studio 2013 to measure the performance, hotspots, and usage of my project, and if so, how. If it can't do that, is there any recommended alternative? Your help would be greatly appreciated! Thanks very much.
Haswell TSX using RTM (beginner student)
By tshan k.
Hello, I am just getting started with Haswell's TSX using RTM. I downloaded the rtm.h header file from online and tried to write a simple counter. Unfortunately, every time I compile and run the program, _xbegin does not start the transaction, so the code inside never executes. I would greatly appreciate your help. Thanks!

#include <stdio.h>
#include <stdlib.h>
#include "rtm.h"

int main(void)
{
    int N = 5;
    int i;
    int status;
    int counter = 0;

    status = _xbegin();
    if (status == _XBEGIN_STARTED) {
        for (i = 0; i < N; i++) {
            counter++;
            printf("counter value: %d\n", counter);
        }
        _xend();
    }
    else
        printf("did not work\n");
    return 0;
}
Using thread_local on C++ throws error
By Rihab A.
I have been trying to convert a C++ MPI code to OpenMP. There are a large number of static member variables (mostly dynamic lists of class objects), and I am trying to use 'thread_local' to make sure there are no conflicts. But the file does not compile and throws the error: "error: expected a ";"". I was using ICC 14. When I tried the ICC 15 beta, the file where I used thread_local compiled, but building the whole application failed at another point: "undefined reference to '__cxa_thread_atexit'". I would greatly appreciate help in solving this issue.
Poor threading performance on Intel Xeon E5-2680 v2
By Pascal10
Hello, I am running a visualization program (visualizing a large dataset) that can use either MPI or pthreads. When I run it on my desktop, which has an Intel i7-2600K (4 cores, 8 threads), I get better performance using pthreads (I'm using a lot of threads, e.g., 32) than using MPI, which is normal (I guess). But when I run the same code on one node of a cluster, which has an Intel Xeon E5-2680 v2 (10 cores, 20 threads), pthreads performs worse than MPI: about 70 s with MPI compared to 180 s with pthreads. Even worse, the pthreads performance on the Intel Xeon E5-2680 v2 is lower than on the Intel i7-2600K: around 100 s on the 2600K but 180 s on the E5-2680 (same number of threads on both). I checked with the top command, and all the cores are active when I run the program. So my question is: why is this happening? Is there some other way I should be compiling the code on the E5-2680? Are there some variables I should set, like KMP_AFFIN...
HTM/STM and Scheduling
By Simone A.
Hi, I have a question about hardware and software transactional memory. Given the types of versioning (eager and lazy) and conflict detection (optimistic and pessimistic), suppose that two or more threads are performing transactions that read and write the same memory location. Could the scheduling of the threads affect the ability to detect a conflict? Which combination of versioning and conflict detection would be better at always catching the conflicts? I hope my question is clear. Thanks. Best regards, Simone
Locking CPU cache lines for a thread (L1)
By Younis A.
Hi, I'm working on securing access to the L1 cache by locking it line by line. Is there any way to do it? For example, when two threads access L1, each L1 line would be locked for a certain time to the thread that accessed it. Regards, Younis
Responsive OpenMP Threads in Hybrid Parallel Environment
By Don K.
I have a Fortran code that uses both MPI and OpenMP. I have profiled the code on an 8-core Windows laptop, varying the number of MPI tasks versus OpenMP threads, and I have some understanding of where the performance bottlenecks of each parallel method might surface. The problem appears when I port the code to a Linux cluster with several 8-core nodes: specifically, my OpenMP thread performance is very poor. Running 8 MPI tasks per node is significantly faster than 8 OpenMP threads per node (1 MPI task), and even a run with 2 OpenMP threads + 4 MPI tasks was very slow, more so than I could attribute solely to a thread starvation issue. I saw a few related posts in this area and am hoping for further insight and recommendations on this issue. What I have tried so far:
1. setenv OMP_WAIT_POLICY active   ## seems to make sense
2. setenv KMP_BLOCKTIME 1   ## this is counter to what I have read but when I set this to a large number (2500...
Optimizing cilk with ternary conditional
By Fabio G.
What is the best way to optimize a loop such as cilk_for(i=0;i<n;i++){ x[i]=x[i]<0?0:x[i]; } or something similar? Thanks, Fabio
Subscribe to Forums
