Intel® Developer Zone: Performance

Highlights

Just published! Intel® Xeon Phi™ Coprocessor High Performance Programming
Learn the essentials of programming for this new architecture and new products.
New! Intel® System Studio
Intel® System Studio is a comprehensive, integrated software development tool suite that can accelerate time to market, strengthen system reliability, and boost power efficiency and performance.
New! In case you missed it: 2-day Live Webinar Playback
Introduction to High Performance Application Development for Intel® Xeon and Intel® Xeon Phi™ Coprocessors.
Structured Parallel Programming
Authors Michael McCool, Arch D. Robison, and James Reinders use an approach based on structured patterns, which should make the subject accessible to every software developer.

Deliver your best application performance for your customers through parallel programming with the help of Intel’s innovative resources.

Development Resources

Development Tools
Intel® Parallel Studio XE

Intel® Parallel Studio XE brings simplified, end-to-end parallelism to Microsoft Visual Studio* C/C++ developers, with advanced tools to optimize client applications for multi-core and manycore processors.

Intel® Software Development Products

Explore all the tools that help you optimize for Intel architecture. Select tools are available for a free 30-day evaluation period.

Tools Knowledge Base

Find guides and support information for Intel tools.

Diagnostic 15304: non-vectorizable loop instance
By Devorah H. (Intel) Posted 10/02/2014 0
Product Version: Intel(R) Visual Fortran Compiler XE 15.0.0.070
Cause: The vectorization report generated with the Visual Fortran Compiler's optimization and vectorization report options (-Qvec-report2 -O2) includes a non-vectorized loop instance when the following compiler option is used (Windows): ...
OpenCL™ Device Fission for CPU Performance
By Terence S. (Intel) Posted 09/30/2014 0
Summary: Device fission is a feature of the OpenCL™ specification that gives OpenCL programmers more power and control over managing which computational units execute OpenCL commands. Fundamentally, device fission allows you to sub-divide a device into one or more sub-devices, which...
GPU-Quicksort in OpenCL 2.0: Nested Parallelism and Work-Group Scan Functions
By Robert Ioffe (Intel) Posted 09/29/2014 0
Contents: Introduction; A Brief History of Quicksort; A Brief Introduction to GPU-Quicksort; GPU-Quicksort in OpenCL 1.2; Converting GPU-Quicksort to OpenCL 2.0; Tutorial Requirements; Running the Tutorial; Conclusion; References; About the Author; Download Code.
Introduction: This tutorial shows how to use...
OPTIMIZING STORAGE SOLUTIONS USING THE INTEL® INTELLIGENT STORAGE ACCELERATION LIBRARY
By Thai Le (Intel) Posted 09/24/2014 7
With the growing number of devices connected to the Cloud/Internet, data is being generated from many different sources, including smartphones, tablets, and Internet of Things devices. The demand for storage is growing every year. The combination of the Intel® Xeon® processor family and the Intel...
Fun with Intel® Transactional Synchronization Extensions
By Wooyoung Kim (Intel) Posted on 07/25/13 0
By now, many of you have heard of Intel® Transactional Synchronization Extensions (Intel® TSX). If you have not, I encourage you to check out this page (http://www.intel.com/software/tsx) before you read further. In a nutshell, Intel TSX provides transactional memory support in hardware, making t...
AVX-512 instructions
By James Reinders (Intel) Posted on 07/23/13 15
Intel® Advanced Vector Extensions 512 (Intel® AVX-512): The latest Intel® Architecture Instruction Set Extensions Programming Reference includes the definition of Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions. These instructions represent a significant leap to 512-bit SIMD s...
Figures/Tables for presentations from Xeon Phi Book
By James Reinders (Intel) Posted on 07/18/13 0
The figures, tables, drawings, etc. used in our book can be downloaded from the book's website. We appreciate attribution, but there are no restrictions on use in educational material (presentations)! Suggested attribution: (c) 2013 Jim Jeffers and James Reinders, used with permission.
Go Parallel
By Dmitry Vyukov Posted on 06/18/13 20
This is the first post in a series about parallel programming with the Go language. What is Go, you may ask? Go is a language with the cutest mascot ever: [image] As you may see, it also supports parallel programming: [image] as well as concurrent programming: [image] I am sure you are already excited by the langu...
Launching several MPI processes on multicore nodes
By Dmitry K. 3
Hi everyone, I have a simple issue, which must have a solution. Is it possible to assign several MPI processes to several nodes such that the first MPI process occupies a full node, whereas the other MPI processes are distributed across the cores of the other nodes? Here is an example. On a cluster with 4 cores per node, to assign 2 MPI processes to 2 nodes I do the following:

    #PBS -l nodes=2:ppn=4
    mpirun -pernode -np 2 ./hybprog

The question is how to assign 8 MPI processes to 3 nodes such that the first MPI process occupies the first node, whereas the other 7 MPI processes are distributed across 7 cores of the other two nodes. Best regards, Dmitry
Threadprivate issue
By Adrian J. 1
I'm having problems with ifort version 14.0.1. I'm working on a hybrid (OpenMP+MPI) Fortran code. In that code the following pointer is declared and specified as threadprivate. However, when I include it in an OpenMP parallel region (default none), I get this compile error:

    ftn  -O3 -r8 -openmp cal_xy.F90
    cal_xy.F90(750): error #6752: Since the OpenMP* DEFAULT(NONE) clause applies, the PRIVATE, SHARED, REDUCTION, FIRSTPRIVATE, or LASTPRIVATE attribute must be explicitly specified for every variable.   [TERM_X]
                 select type(term_x)

If I add the variable to one of the data-sharing clauses of the parallel region, I get this error instead:

    ftn -O3 -r8 -openmp calc_xy.F90
    calc_xy.F90(739): error #7859: A SHARABLE or THREADPRIVATE entity is not permitted in a PRIVATE, FIRSTPRIVATE, LASTPRIVATE, SHARED or REDUCTION clause.   [TERM_X]
                    call term_x%add(mat_a,col_r,&

It looks to me like the first error I get (the #6752 error) only occurs from the "select type...
[OpenMP - Fortran] Scope of COMMON block variables
By Edgardo Doerner 2
Dear all, although the answer to the question in the title is, in principle, quite clear, I am confused about the scope (shared or private) of variables declared in COMMON blocks inside routines other than the main program. For example, I am trying to parallelize an MC code for particle transport in matter using OpenMP. The main part of the program looks like this:

          PROGRAM TUTOR2
          IMPLICIT NONE
    [Variable initialization, some COMMON blocks, program initialization routines, etc...]
    C$OMP PARALLEL NUM_THREADS (4)
    C$OMP SINGLE
          WRITE(*,*) "Number of OpenMP threads: ", OMP_GET_NUM_THREADS()
          DO I=1,NCASE
    C$OMP TASK FIRSTPRIVATE(ESCORE)
            CALL SHOWER(IQIN,EIN,XIN,YIN,ZIN,UIN,VIN,WIN,IRIN,WTIN)
    C$OMP END TASK
          END DO
    C$OMP END SINGLE NOWAIT
    C$OMP END PARALLEL

So I have a call to SHOWER() that carries out the simulation of the particle cascade. This routine calls other functions (RANDOM, PHOTON, ELECTR, etc.) that realize the transport calcu...
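On the scoping question itself, the OpenMP rule is that variables in COMMON blocks are shared among threads unless the block is named in a THREADPRIVATE directive, and this holds in routines called from inside the parallel region too. The sketch below shows the same rule in C, where file-scope variables play the role of COMMON block members. It is an illustrative analogue rather than the poster's code, and the names score, scratch, accumulate, and sum_cases are hypothetical.

```c
/* File-scope variables are visible to every routine, much like members of a
   Fortran COMMON block. Under OpenMP they are shared by default. */
int score;                       /* shared: one copy for all threads */
int scratch;                     /* one copy per thread, see the directive */
#pragma omp threadprivate(scratch)

/* Hypothetical stand-in for a routine called from the parallel region. */
void accumulate(int i)
{
    scratch += i;                /* touches only the calling thread's copy */
}

/* Sums 1..ncase: each thread fills its private scratch, then merges it
   into the shared score with an atomic update. */
int sum_cases(int ncase)
{
    score = 0;
    #pragma omp parallel
    {
        scratch = 0;
        #pragma omp for
        for (int i = 1; i <= ncase; i++)
            accumulate(i);
        #pragma omp atomic
        score += scratch;
    }
    return score;
}
```

Without the threadprivate directive, every thread would update the same scratch and the result would depend on timing. Built without OpenMP support, the pragmas are ignored and the code runs serially with the same result.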
C code - assertion failed: find_seq_in_lookup_table bug
By Mohammed I. 2
Hi, I'm having a problem when trying to compile C code with icc. I'm getting:

    Internal error loop: assertion failed: find_seq_in_lookup_table: seq_number not found (shared/cfe/edgcpfe/il.c, line 3866)

I searched for this bug, and it seems it was fixed in update 5 (https://software.intel.com/en-us/articles/intel-composer-xe-2013-compile...). The problem is that I'm writing in C++, not in C! Useful info: I'm using the old licensed version, update 1, and got the above error. When I tried to download and install the trial version of update 3, I ran into a license problem:

    Error: A license for CCompL is not available (-5,412). Make sure that a license file is being used that contains a license for the requested feature.  If your license requires a license server, make sure that the server is using the right license file (usually, this would be the same license file that is being used by this application), and make sure that you have not changed the license file since star...
Help! Unity and Parallel Studio
By Don Fantom J. 1
Hello, I'm a beginner. I'm working on a project in which I use Unity to develop a game. We mainly use C# scripts. I want to know whether I can use Parallel Studio 2013 to detect the effort, hotspots, and usage of my project, and if so, how. If it can't do that, is there an authoritative alternative? Your help would be greatly appreciated! Thanks very much.
Haswell TSX using RTM (beginner student)
By tshan k. 3
Hello, I am just getting introduced to Haswell's TSX infrastructure using RTM. I downloaded the rtm.h header file from online and tried to produce a simple counter. Unfortunately, every time I compile and run the program, the _xbegin function does not execute the transaction inside. I would greatly appreciate your help. Thanks.

    #include <stdio.h>
    #include <stdlib.h>
    #include "rtm.h"

    void main(){
        int N=5;
        int i;
        int status;
        int counter = 0;

        status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            for (i=0; i<N ; i++)  {
                counter++;
                printf("counter value: %d\n", counter);
            }
            _xend();
        }
        else
            printf("did not work\n");
    }
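One likely reason a transaction like this never commits is the printf call inside it: output usually ends in a system call, which aborts an RTM transaction, so control returns to _xbegin with an abort status each time. RTM also gives no guarantee that a transaction ever commits, so a non-transactional fallback path is needed. A minimal sketch under those assumptions follows. It uses the compiler's immintrin.h intrinsics rather than the poster's rtm.h, the helper name add_n is hypothetical, and the transactional path is compiled only when RTM is enabled at build time (for example with -mrtm on GCC or Clang), in which case it assumes the CPU actually supports RTM.

```c
#if defined(__RTM__)
#include <immintrin.h>   /* _xbegin, _xend, _XBEGIN_STARTED */
#endif

/* Hypothetical helper: adds 1 to *counter n times.
   Returns 1 if the update committed as one RTM transaction,
   0 if the plain fallback path was taken. */
int add_n(int *counter, int n)
{
#if defined(__RTM__)
    unsigned status = _xbegin();
    if (status == _XBEGIN_STARTED) {
        /* Keep the transaction small: memory updates only, no I/O. */
        for (int i = 0; i < n; i++)
            (*counter)++;
        _xend();
        return 1;
    }
    /* On abort, the transactional updates were rolled back,
       so *counter still holds its original value here. */
#endif
    /* Fallback path: this single-threaded sketch needs no lock. */
    for (int i = 0; i < n; i++)
        (*counter)++;
    return 0;
}
```

Print the counter after add_n returns, outside the transaction. In multithreaded code the fallback path would take a lock, and the transactional path would read that lock so that the two paths cannot run at the same time.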
Using thread_local in C++ throws error
By Rihab A. 5
I have been trying to convert a C++ MPI code to OpenMP. There are a large number of static member variables (mostly dynamic lists of class objects), and I am trying to use 'thread_local' to make sure there are no conflicts. But the file does not compile and throws the error: "error: expected a ";"". I was using ICC 14. When I tried the ICC 15 beta version, the particular file where I used thread_local compiled, but the build of the whole application failed at another point: "undefined reference to '__cxa_thread_atexit'". I would greatly appreciate help in solving this issue.