Intel® Developer Zone:
Performance

Highlights

Just published! Intel® Xeon Phi™ Coprocessor High Performance Programming 
Learn the essentials of programming for this new architecture and new products. New!
Intel® System Studio
Intel® System Studio is a comprehensive, integrated software development tool suite that can accelerate time to market, strengthen system reliability, and boost power efficiency and performance. New!
In case you missed it - 2-day Live Webinar Playback
Introduction to High Performance Application Development for Intel® Xeon & Intel® Xeon Phi™ Coprocessors.
Structured Parallel Programming
Authors Michael McCool, Arch D. Robison, and James Reinders use an approach based on structured patterns, which should make the subject accessible to every software developer.

Deliver your best application performance for your customers through parallel programming with the help of Intel’s innovative resources.

Development Resources


Development Tools

 

Intel® Parallel Studio XE

Bringing simplified, end-to-end parallelism to Microsoft Visual Studio* C/C++ developers, Intel® Parallel Studio XE provides advanced tools to optimize client applications for multi-core and manycore.

Intel® Software Development Products

Explore all the tools that help you optimize for Intel architecture. Select tools are available for a free 30-day evaluation period.

Tools Knowledge Base

Find guides and support information for Intel tools.

Digital Security and Surveillance on 4th generation Intel® Core™ processors Using Intel® System Studio 2015
By Naveen Gv (Intel) Posted on 10/08/2014 0
This article presents the advantages of developing embedded digital video surveillance systems to run on 4th generation Intel® Core™ processor with Intel® HD Graphics, in combination with the Intel® System Studio 2015 software development suite. While Intel® HD Graphics is useful for developing...
Developing Embedded application on Intel® Atom™ Processor E3800 (Formerly Bay Trail) using ‘Intel® System Studio 2015’
By Naveen Gv (Intel) Posted on 10/08/2014 0
This article is about developing any type of embedded application on the Intel® Atom™ Processor E3800 using Intel® System Studio.
How to use Intel® Inspector for Systems
By kevin-oleary (Intel) Posted on 10/06/2014 0
Background: Intel® System Studio is the new embedded software tool suite and includes Intel® Inspector for Systems. This article will explain the steps you need to follow to run Inspector for Systems on an embedded platform. Overview: We will use Yocto Project* version 1.2 as an example. This pl...
Intel® System Studio - Multicore Programming with Intel® Cilk™ Plus
By Hans Pabst (Intel) Posted on 10/06/2014 0
Intel System Studio not only provides a variety of signal processing primitives via Intel® Integrated Performance Primitives (Intel® IPP), and Intel® Math Kernel Library (Intel® MKL), but also allows developing high-performance low-latency custom code (Intel C++ Compiler with Intel Cilk Plus). Si...
Bubble, Bubble, Toil and Trouble; Mutex Lock and Buffer Double
By Clay Breshears (Intel) Posted on 12/31/13 4
Macbeth may have 99 problems, but parallel programming ain’t one of them.
Intel® Xeon Phi™ coprocessor Power Management Configuration: Why should I worry about configuring anything?
By Taylor Kidd (Intel) Posted on 12/30/13 0
Previous blogs on power management and a host of other power management resources can be found in List of Useful Power and Power Management Articles, Blogs and References. WHAT AND WHY DO WE WANT TO CONFIGURE IT There are several reasons why you might want to configure your power management in ...
Doctor Fortran in "It's a Modern Fortran World"
By Steve Lionel (Intel) Posted on 12/30/13 0
I recently received a copy of "Numerical Computing with Modern Fortran", by Richard Hanson and Tim Hopkins, and noted how many books on Fortran are being published recently with "Modern Fortran" in the titles. It turns out this is not a new phenomenon - a search on Amazon.com shows that this phra...
Quick Start Guides Published for the Intel® Xeon Phi™ Coprocessor Expert User
By Taylor Kidd (Intel) Posted on 12/20/13 3
This is a short notice to let you know that two new articles have been published for the Intel® Xeon Phi™ coprocessor:
* Quick Start Guide: For the Intel Xeon Phi Coprocessor Administrator
* Quick Start Guide: For the Intel Xeon Phi Coprocessor Developer
The target of both of these guides is the expert user. Our assumption is that the expert user does not need to be told what to do, as he already has potentially decades of experience doing his job. Similarly, he does not need to be told how to research his area of expertise as he has done so dozens of times in the past. As these users are new to administering or developing on the Intel Xeon Phi coprocessor, they want to know only where they can find key resources, such as cluster administration guides, technical support and examples.
Using thread_local on C++ throws error
By Rihab A. 5
I have been trying to convert a C++ MPI code to OpenMP. There are a large number of static member variables (mostly dynamic lists of class objects), and I am trying to use 'thread_local' to make sure there are no conflicts. But the file does not compile and throws the error: "error: expected a ";"". I was using ICC 14. When I tried the ICC 15 beta version, the particular file where I used thread_local compiled, but the compilation of the whole application failed at some other point: "undefined reference to '__cxa_thread_atexit'". I would greatly appreciate help in solving this issue.
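For reference, the following is a minimal sketch of a thread_local static data member in standard C++11. It is not the poster's code: the class name `Registry` and the helper `run_workers` are hypothetical, and it uses `std::thread` rather than OpenMP. It assumes a compiler and C++ runtime library that both support thread_local (objects with destructors, such as `std::vector`, need runtime support as well as compiler support) and that the program is linked with thread support.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical class standing in for one with static list members.
class Registry {
public:
    // thread_local: every thread gets its own independent copy of the list.
    static thread_local std::vector<int> items;
};

// A static data member still needs an out-of-class definition, and the
// thread_local keyword has to be repeated on it.
thread_local std::vector<int> Registry::items;

// Each worker appends `count` values to its own copy of Registry::items
// and records the final size it observed.
std::vector<std::size_t> run_workers(int num_threads, int count) {
    std::vector<std::size_t> sizes(static_cast<std::size_t>(num_threads));
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&sizes, t, count] {
            for (int i = 0; i < count; ++i) {
                Registry::items.push_back(i);
            }
            // Each thread writes only its own slot, so no lock is needed.
            sizes[static_cast<std::size_t>(t)] = Registry::items.size();
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return sizes;
}
```

If the copies were shared, the sizes reported by the workers would exceed `count`; with thread_local each worker sees exactly `count` items, and the calling thread's own copy stays empty.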
Poor threading performance on Intel Xeon E5-2680 v2
By Pascal 10
Hello, I am running a visualization program (visualizing a large dataset) where I can use either MPI or pthreads. When I run it on my desktop, which has an Intel i7-2600K (4 cores, 8 threads), I get better performance using pthreads (I'm using a lot of threads, e.g. 32) compared to using MPI, which is normal (I guess). But when I run the same code on one node (which is part of a cluster) that has an Intel Xeon E5-2680 v2 (10 cores, 20 threads), the performance I get using pthreads is worse than MPI: about 70s using MPI compared to 180s using pthreads. Even worse, the performance on the Intel Xeon E5-2680 v2 is lower than that of the Intel i7-2600K; it's around 100s on the 2600K but 180s on the E5-2680 (same number of threads on both). I checked using the top command, and all the cores are active when I run the program. So my question is: why is that happening? Is there some other way I should be compiling the code on the E5-2680? Are there some variables I should set like KMP_AFFIN...
HTM/STM and Scheduling
By Simone A. 1
Hi, I have a question about Hardware and Software Transactional Memory. Given the types of versioning (eager and lazy) and conflict detection (optimistic and pessimistic), let's say that 2 or more threads are performing a transaction that writes/reads the same memory location. Could the scheduling of the threads affect the ability to detect a conflict? Which combination of versioning and conflict detection would be better to always catch the conflicts? I hope my question is clear. Thanks. Best Regards, Simone
Locking CPU cache lines for a thread ( L1)
By Younis A. 14
Hi, I'm working on securing access to the L1 cache by locking it line by line. Is there any way to do it? For example, when two threads access the L1, the L1 lines would be locked for a certain time to the thread that accessed them. Regards, Younis
Responsive OpenMP Threads in Hybrid Parallel Environment
By Don K. 1
I have a Fortran code that runs both MPI and OpenMP. I have done some profiling of the code on an 8-core Windows laptop, varying the number of MPI tasks vs. OpenMP threads, and have some understanding of where some performance bottlenecks for each parallel method might surface. The problem I am having is when I port over to a Linux cluster with several 8-core nodes. Specifically, my OpenMP thread parallelism performance is very poor. Running 8 MPI tasks per node is significantly faster than 8 OpenMP threads per node (1 MPI task), but even 2 OMP threads + 4 MPI tasks was running very slowly, more so than I could solely attribute to a thread starvation issue. I saw a few related posts in this area and am hoping for further insight and recommendations into this issue. What I have tried so far ... 1.  setenv OMP_WAIT_POLICY active      ## seems to make sense 2.  setenv KMP_BLOCKTIME 1          ## this is counter to what I have read but when I set this to a large number (2500...
Optimizing cilk with ternary conditional
By Fabio G. 3
What is the best way to optimize the loop cilk_for(i=0;i<n;i++){ x[i]=x[i]<0?0:x[i]; } or something like that? Thanks, Fabio
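One common way to express the loop body in the question is as a maximum rather than a ternary. The sketch below is only an illustration under stated assumptions: it uses a plain serial `for` loop in standard C++ in place of `cilk_for`, assumes `float` elements (the post does not give the type), and the function name `clamp_negatives` is hypothetical. `std::max(x[i], 0.0f)` returns the same value as `x[i] < 0 ? 0 : x[i]`; whether a compiler turns it into vector instructions depends on the compiler and its flags.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Replace every negative element with zero, in place.
// Equivalent to x[i] = x[i] < 0 ? 0 : x[i] for each element.
void clamp_negatives(float* x, std::size_t n) {
    assert(x != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        x[i] = std::max(x[i], 0.0f);
    }
}

// Convenience overload for std::vector.
void clamp_negatives(std::vector<float>& x) {
    clamp_negatives(x.data(), x.size());
}
```

Each iteration touches only its own element, so the loop has no cross-iteration dependence and is a candidate for both vectorization and a parallel loop construct.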
Optimizing reduce_by_key implementation using TBB
By Shruti R. 0
Hello Everyone, I'm quite new to TBB and have been trying to optimize a reduce_by_key implementation using TBB constructs. However, the serial STL code always outperforms the TBB code! It would be helpful to get an idea of how reduce_by_key can be improved using tbb::parallel_scan. Any help at the earliest would be much appreciated. Thanks.
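For context, here is a minimal serial sketch of what a reduce_by_key operation computes. It assumes the semantics the name usually carries, where each run of consecutive equal keys is collapsed into one key paired with the sum of its values. The post does not show the poster's code, so the function name `reduce_by_key_serial` and the `int`/`long` element types are hypothetical. It is a baseline for comparison, not a TBB implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Serial reference: collapse each run of consecutive equal keys into a
// single (key, sum of values) pair, preserving the order of the runs.
std::vector<std::pair<int, long>> reduce_by_key_serial(
        const std::vector<int>& keys, const std::vector<long>& values) {
    assert(keys.size() == values.size());
    std::vector<std::pair<int, long>> out;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (!out.empty() && out.back().first == keys[i]) {
            // Same key as the current run: accumulate into it.
            out.back().second += values[i];
        } else {
            // Key changed (or first element): start a new run.
            out.emplace_back(keys[i], values[i]);
        }
    }
    return out;
}
```

For keys {1, 1, 2, 2, 2, 1} and values {10, 20, 1, 2, 3, 7}, this yields (1, 30), (2, 6), (1, 7). The loop does very little work per element, so any parallel version has to cover its scheduling and combining overhead before it can beat this baseline.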