Intel® Developer Zone:
Performance

Highlights

Just published! Intel® Xeon Phi™ Coprocessor High Performance Programming 
Learn the essentials of programming for this new architecture and new products.
Intel® System Studio
Intel® System Studio is a comprehensive, integrated software development tool suite that can accelerate time to market, strengthen system reliability, and boost power efficiency and performance.
In case you missed it - 2-day Live Webinar Playback
Introduction to High Performance Application Development for Intel® Xeon & Intel® Xeon Phi™ Coprocessors.
Structured Parallel Programming
Authors Michael McCool, Arch D. Robison, and James Reinders use an approach based on structured patterns, which should make the subject accessible to every software developer.

Deliver your best application performance for your customers through parallel programming with the help of Intel’s innovative resources.

Development Resources


Development Tools

 

Intel® Parallel Studio XE

Bringing simplified, end-to-end parallelism to Microsoft Visual Studio* C/C++ developers, Intel® Parallel Studio XE provides advanced tools to optimize client applications for multi-core and manycore.

Intel® Software Development Products

Explore all the tools that help you optimize for Intel architecture. Select tools are available for a free 30-day evaluation period.

Tools Knowledge Base

Find guides and support information for Intel tools.

DrDebug : Linux Command Line Usage
By Harish Patil (Intel) Posted 02/05/2015 0
Using DrDebug requires the following two phases: 1. recording and 2. replaying. Contents: Pre-requisites; Setup; Recording (With GDB; From command line, without GDB); Replaying (With GDB; From command line, without GDB). Pre-requisites: GDB version 7.4 or higher with Python support; PinPlay/DrDebu...
Quick Installation Guide for OpenCL™ Development on Windows* with Intel® INDE
By Robert Ioffe (Intel) Posted 02/04/2015 0
Intel® INDE provides a comprehensive tool set for developing applications targeting both CPUs and GPUs, enriching the development experience of an OpenCL developer. Yet, if you are used to working with the legacy Intel® SDK for OpenCL™ Applications or if you just want to get started and build your fi...
Analyzing Intel® SDE's TSX-related log data for capacity aborts
By HASSAN SALEHE MATAR (Intel) Posted 01/19/2015 0
Starting with version 7.12.0, Intel® SDE has Intel® TSX-related instruction and memory access logging features which can be useful for debugging Intel® TSX's capacity aborts. With the log data from the Intel SDE you can diagnose cache set population to determine if there is non-uniform cache set ...
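The idea of checking cache set population can be sketched in a few lines. This is only an illustration under stated assumptions: it assumes a 32 KiB, 8-way L1 data cache with 64-byte lines (so 64 sets), it takes a plain list of written addresses rather than parsing the actual Intel SDE log format, and the helper names `cache_set`, `set_population`, and `overfull_sets` are hypothetical. A set that receives more distinct lines than it has ways is a candidate cause of a capacity abort.

```python
from collections import Counter

# Assumed cache geometry (not taken from the article):
LINE_BYTES = 64   # bytes per cache line
NUM_WAYS = 8      # associativity
NUM_SETS = 64     # 32 KiB / (64 bytes * 8 ways)


def cache_set(addr):
    """Return the set index an address maps to under the assumed geometry."""
    return (addr // LINE_BYTES) % NUM_SETS


def set_population(addresses):
    """Count how many distinct cache lines fall into each set."""
    lines = {addr // LINE_BYTES for addr in addresses}
    return Counter(line % NUM_SETS for line in lines)


def overfull_sets(addresses):
    """Return the sets holding more distinct lines than there are ways."""
    population = set_population(addresses)
    return sorted(s for s, count in population.items() if count > NUM_WAYS)


if __name__ == "__main__":
    # Hypothetical write set: nine writes spaced 4096 bytes apart.
    strided = [i * 4096 for i in range(9)]
    # The same number of writes, but to consecutive cache lines.
    sequential = [i * 64 for i in range(9)]
    print("strided:", overfull_sets(strided))
    print("sequential:", overfull_sets(sequential))
```

With this geometry, addresses 4096 bytes apart all map to the same set, so the strided pattern overfills one set while the sequential pattern spreads across nine sets.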
OpenCV 3.0.0-beta ( IPP & TBB enabled ) on Yocto with Intel® Edison
By JON J K. (Intel) Posted 12/22/2014 6
Overview: This article is a tutorial for setting up OpenCV 3.0.0-beta on Yocto with Intel® Edison. We will build OpenCV 3.0.0-beta on the Edison Breakout/Expansion Board using a Linux host machine. The build takes up a lot of space on Edison, so it is required to have at least 2GB micr...
Intel® Xeon Phi™ coprocessor Power Management Turbo Part 3: How can I design my program to make use of turbo?
By Taylor Kidd (Intel) Posted on 02/20/14 1
Previous blogs on power management and a host of other power management resources can be found in “List of Useful Power and Power Management Articles, Blogs and References” at http://software.intel.com/en-us/articles/list-of-useful-power-and-power-management-articles-blogs-and-references. See [L...
Why has CPU frequency ceased to grow?
By victoria-zhislina (Intel) Posted on 02/19/14 0
All of you probably recall the rapid rate of CPU frequency advancement at the end of the last century and beginning of this one.  Tens of megahertz rapidly transformed into hundreds, and then hundreds of megahertz quickly became a full gigahertz, then a gigahertz and a bit, finally two gigs and ...
Intel® Xeon Phi™ coprocessor Power Management Configuration: Using the micsmc command-line Interface
By Taylor Kidd (Intel) Posted on 01/31/14 0
Previous blogs on power management and a host of other power management resources can be found in “List of Useful Power and Power Management Articles, Blogs and References” at http://software.intel.com/en-us/articles/list-of-useful-power-and-power-management-articles-blogs-and-references. INTRO...
Introduction to Embree 2.1 - Part 1
By louis-feng (Intel) Posted on 01/24/14 0
This is part of a series of blogs on Embree, a collection of high performance ray tracing kernels. Embree has been open source since version 1.0. Version 2.0 was released during SIGGRAPH 2013, and Embree 2.1 was published on GitHub just before Christmas 2013. The official web site has an ...
Threadprivate issue
By Adrian J. 1
I'm having problems with ifort version 14.0.1. I'm working on a hybrid (OpenMP+MPI) FORTRAN code. In that code the following pointer is declared and specified as threadprivate. However, when I include it in an OpenMP parallel region (default none), I get this compile error:
    ftn  -O3 -r8 -openmp cal_xy.F90
    cal_xy.F90(750): error #6752: Since the OpenMP* DEFAULT(NONE) clause applies, the PRIVATE, SHARED, REDUCTION, FIRSTPRIVATE, or LASTPRIVATE attribute must be explicitly specified for every variable.   [TERM_X]
                 select type(term_x)
If I add the variable to one of the data sharing clauses of the parallel region, I get this error instead:
    ftn -O3 -r8 -openmp calc_xy.F90
    calc_xy.F90(739): error #7859: A SHARABLE or THREADPRIVATE entity is not permitted in a PRIVATE, FIRSTPRIVATE, LASTPRIVATE, SHARED or REDUCTION clause.   [TERM_X]
                    call term_x%add(mat_a,col_r,&
It looks to me like the first error I get (the #6752 error) only occurs from the "select type...
[OpenMP - Fortran] Scope of COMMON block variables
By Edgardo Doerner 2
Dear all, although the answer to the question in the title is, in principle, quite clear, I am confused about the scope (shared or private) of variables declared in COMMON blocks inside functions that are not the main function. For example, I am trying to parallelize an MC code for particle transport in matter using OpenMP. The main part of this program looks like the following code:
      PROGRAM TUTOR2
      IMPLICIT NONE
      [Variable initialization, some COMMON blocks, program initialization routines, etc...]
C$OMP PARALLEL NUM_THREADS (4)
C$OMP SINGLE
      WRITE(*,*) "Number of OpenMP threads: ", OMP_GET_NUM_THREADS()
      DO I=1,NCASE
C$OMP TASK FIRSTPRIVATE(ESCORE)
        CALL SHOWER(IQIN,EIN,XIN,YIN,ZIN,UIN,VIN,WIN,IRIN,WTIN)
C$OMP END TASK
      END DO
C$OMP END SINGLE NOWAIT
C$OMP END PARALLEL
So I have a function call SHOWER() that performs the simulation of the particle cascade. This function calls other functions (RANDOM, PHOTON, ELECTR, etc) that perform the transport calcu...
C code - assertion failed: find_seq_in_lookup_table bug
By Mohammed I. 2
Hi, I'm having a problem when trying to compile C code with icc. I'm getting: Internal error loop: assertion failed: find_seq_in_lookup_table: seq_number not found (shared/cfe/edgcpfe/il.c, line 3866) I searched for this bug and it seems it was fixed in update 5 (https://software.intel.com/en-us/articles/intel-composer-xe-2013-compile...) The problem is that I'm writing in C++ not in C !!! Useful Info: I'm using the old licensed update 1 version, and got the above error. When I tried to download and install the trial update 3 version, I faced a problem regarding the license:   Error: A license for CCompL is not available (-5,412). Make sure that a license file is being used that contains a license for the requested feature.  If your license requires a license server, make sure that the server is using the right license file (usually, this would be the same license file that is being used by this application), and make sure that you have not changed the license file since star...
Help! Unity and Parallel Studio
By Don Fantom J. 1
Hello, I'm a beginner. I'm working on a project in which I use Unity to develop a game. We mainly use C# scripts. I want to know if I can use Parallel Studio 2013 to detect the effort, hotspots and usage of my project, and how to do so. If it can't do that, is there any authoritative alternative? Your help would be greatly appreciated! Thanks very much.
Haswell TSX using RTM (beginner student)
By tshan k. 3
Hello, I am just getting introduced to Haswell's TSX infrastructure using RTM. I downloaded the rtm.h header file online and tried producing a simple counter. Unfortunately, every time I compile and run the program, the _xbegin function does not execute the transaction inside. I would greatly appreciate your help. Thanks.
    #include <stdio.h>
    #include <stdlib.h>
    #include "rtm.h"
    void main(){
        int N=5;
        int i;
        int status;
        int counter = 0;
        status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            for (i=0; i<N ; i++)  {
                counter++;
                printf("counter value: %d\n", counter);
            }
            _xend();
        }
        else
            printf("did not work\n");
    }
Using thread_local on C++ throws error
By Rihab A. 5
I have been trying to convert a C++ MPI code to OpenMP. There are a large number of static member variables (mostly dynamic lists of class objects), and I am trying to use 'thread_local' to make sure there are no conflicts. But the file does not compile and throws the error: "error: expected a ";"". I was using ICC 14. When I tried the ICC 15 beta version, the particular file where I used thread_local compiled, but the compilation of the whole application failed at some other point: "undefined reference to '__cxa_thread_atexit'". I would greatly appreciate help in solving this issue.
Poor threading performance on Intel Xeon E5-2680 v2
By Pascal 10
Hello, I am running a visualization program (visualizing a large dataset) where I can use either MPI or pthreads. When I run it on my desktop, which has an Intel i7-2600K (4 cores, 8 threads), I get better performance using pthreads (I'm using a lot of threads, e.g. 32) than using MPI, which is normal (I guess). But when I run the same code on one node (which is part of a cluster) that has an Intel Xeon E5-2680 v2 (10 cores, 20 threads), the performance I get using pthreads is worse than with MPI: about 70s using MPI compared to 180s using pthreads. Even worse, the performance on the Intel Xeon E5-2680 v2 is lower than on the Intel i7-2600K: it's around 100s on the 2600K but 180s on the E5-2680 (same number of threads on both). I checked using the top command and all the cores are active when I run the program. So my question is: why is that happening? Is there some other way I should be compiling the code on the E5-2680? Are there some variables I should set like KMP_AFFIN...
HTM/STM and Scheduling
By Simone A. 1
Hi, I have a question about Hardware and Software Transactional Memory. Given the types of versioning (eager and lazy) and conflict detection (optimistic and pessimistic), let's say that 2 or more threads are performing transactions that write/read the same memory location. Could the scheduling of the threads affect the ability to detect a conflict? Which combination of versioning and conflict detection would be better to always catch the conflicts? I hope my question is clear. Thanks. Best Regards, Simone
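To make the terms in this question concrete, here is a minimal single-threaded sketch of one combination: lazy versioning (writes are buffered until commit) with optimistic conflict detection (reads are validated at commit time). It is a toy model under stated assumptions, not a description of any real HTM or STM implementation: the `Memory` and `Transaction` classes and the `increment`, `interleaved`, and `serial` helpers are hypothetical, and the commit step is assumed to run atomically, which a real system would have to enforce.

```python
class Memory:
    """Shared memory with a version counter per location."""

    def __init__(self):
        self.values = {}
        self.versions = {}

    def version(self, addr):
        return self.versions.get(addr, 0)


class Transaction:
    """Lazy versioning: writes stay in a private buffer until commit."""

    def __init__(self, mem):
        self.mem = mem
        self.read_versions = {}  # addr -> version seen at first read
        self.write_buffer = {}   # addr -> pending value

    def read(self, addr):
        if addr in self.write_buffer:
            return self.write_buffer[addr]
        self.read_versions.setdefault(addr, self.mem.version(addr))
        return self.mem.values.get(addr, 0)

    def write(self, addr, value):
        self.write_buffer[addr] = value

    def commit(self):
        """Optimistic detection: validate every read, then publish writes."""
        for addr, seen in self.read_versions.items():
            if self.mem.version(addr) != seen:
                return False  # another transaction committed in between
        for addr, value in self.write_buffer.items():
            self.mem.values[addr] = value
            self.mem.versions[addr] = self.mem.version(addr) + 1
        return True


def increment(tx, addr):
    tx.write(addr, tx.read(addr) + 1)


def interleaved():
    """Both transactions read x before either one commits."""
    mem = Memory()
    t1, t2 = Transaction(mem), Transaction(mem)
    increment(t1, "x")
    increment(t2, "x")
    return t1.commit(), t2.commit(), mem.values["x"]


def serial():
    """The second transaction starts only after the first has committed."""
    mem = Memory()
    t1 = Transaction(mem)
    increment(t1, "x")
    ok1 = t1.commit()
    t2 = Transaction(mem)
    increment(t2, "x")
    ok2 = t2.commit()
    return ok1, ok2, mem.values["x"]


if __name__ == "__main__":
    print("interleaved:", interleaved())
    print("serial:", serial())
```

In this toy model the schedule decides whether a conflict arises at all (the interleaved run aborts one transaction, the serial run aborts none), but when the two transactions do overlap, validation at commit still catches the conflict.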

Highlights