Intel® Developer Zone:
Performance

Highlights

Just published! Intel® Xeon Phi™ Coprocessor High Performance Programming 
Learn the essentials of programming for this new architecture and new products.
Intel® System Studio
Intel® System Studio is a comprehensive, integrated software development tool suite that can accelerate time to market, strengthen system reliability, and boost power efficiency and performance.
In case you missed it - 2-day Live Webinar Playback
Introduction to High Performance Application Development for Intel® Xeon & Intel® Xeon Phi™ Coprocessors.
Structured Parallel Programming
Authors Michael McCool, Arch D. Robison, and James Reinders use an approach based on structured patterns, which should make the subject accessible to every software developer.

Deliver your best application performance for your customers through parallel programming with the help of Intel’s innovative resources.

Development Resources


Development Tools

 

Intel® Parallel Studio XE

Bringing simplified, end-to-end parallelism to Microsoft Visual Studio* C/C++ developers, Intel® Parallel Studio XE provides advanced tools to optimize client applications for multicore and manycore processors.

Intel® Software Development Products

Explore all the tools that help you optimize for Intel architecture. Select tools are available for a free 30-day evaluation period.

Tools Knowledge Base

Find guides and support information for Intel tools.

Intel® Xeon Phi™ Coprocessor code named “Knights Landing” - Application Readiness
By Indraneil Gokhale (Intel) Posted 09/15/2014 0
As part of the application readiness efforts for future Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors (code named Knights Landing), developers are interested in improving two key aspects of their workloads: vectorization/code generation and thread parallelism. This article mainly talks a...
Easy SIMD through Wrappers
By admin Posted 03/27/2015 0
SIMD operations are widely used for 3D graphics applications. This tutorial provides new insights into SIMD by comparing SIMD lanes and CPU threads, and steps you through the process of creating a simple, straightforward SIMD implementation in your own code.
Abaqus/Standard Performance Case Study on Intel® Xeon® E5-2600 v3 Product Family
By Khang Nguyen (Intel) Posted 03/27/2015 0
Background: The whole point of simulation is to model the behavior of a design and potential changes against various conditions to determine whether we are getting an expected response; and simulation in software is far cheaper than building hardware and performing a physical simulation and modif...
Avoid frequency drop in GPU cores when executing applications in Heterogeneous mode
By Anoop Madhusoodhanan Prabha (Intel) Posted 03/23/2015 0
Introduction: Intel® C++ Compiler 15.0 provides a feature that enables offloading general-purpose compute kernels to processor graphics. This feature makes the processor graphics silicon area available for general-purpose computing. The key idea is to utilize the compute power of both CPU cores and GP...
Advanced Computer Concepts For The (Not So) Common Chef: Terminology Pt 1
By Taylor Kidd (Intel) Posted on 03/24/15 0
Before we start, I will use the next two blogs to clear up some terminology. If you are familiar with these concepts, I give you permission to jump to the next section.  I suggest any software readers still check out the other blog about threads. There is a lot of confusion, even among us softwar...
Check out the Parallel Universe e-publication
By Mike Pearce (Intel) Posted on 03/18/15 0
The Parallel Universe is a quarterly publication devoted to exploring inroads and innovations in the field of software development, from high performance computing to threading hybrid applications. Issue #20 - Cover story: From Knights Corner to Knights Landing: Prepare for the Next Generation o...
VTune™ Amplifier XE 2015 Update 2 supports driverless hardware event-based sampling with call stack info
By Peter Wang (Intel) Posted on 03/15/15 1
In general, the VTune drivers are built and loaded on the Linux* system automatically during installation of the VTune™ Amplifier XE product, after which hardware PMU event-based sampling can work. However, sometimes the VTune drivers fail to build or load for one of the following reasons: 1. There ...
Intel® Xeon Phi™ Coprocessor Developer Training Coming to a City Near You in 2015
By Mike Pearce (Intel) Posted on 03/04/15 0
Intel is offering an updated and expanded series of software developer trainings in parallel programming using the Intel® Xeon Phi™ coprocessor.
Intel® Parallel Studio XE SP1 & Intel® Cluster Studio XE SP1
By kathy-farrel (Intel) 0
Intel® Parallel Studio XE SP1 & Intel® Cluster Studio XE SP1 - What's New - Webinar, Tuesday, September 17, 9am PDT. Please join us for a technical presentation on the new features found in the recently released Intel® Parallel Studio XE 2013 SP1 and Intel® Cluster Studio XE SP1. This release includes support for compilers and performance analysis on Intel® Xeon Phi™ on Windows*. The technical presentation will briefly cover new features for both C++ and Fortran on Linux*, Windows*, and OS X* operating systems, as well as error checking and performance profiling tools. Learn how to efficiently boost your application performance! Not too late! Register now. Learn about upcoming webinars.
Parallel Image Processing in OpenMP - Image Blocks
By Royi 0
Hello, I'm taking my first steps in the OpenMP world. I have an image I want to apply a filter to. Since the image is large, I wanted to break it into non-overlapping parts and apply the filter to each part independently in parallel. Namely, I'm creating 4 sub-images that I want different threads to process. I'm using Intel IPP for handling the images and for the function applied to each sub-image. I described the code here: http://stackoverflow.com/questions/29319226/parallel-image-processing-in... The problem is that I tried both sections and parallel for and got only a 20% improvement. What am I doing wrong? How can I tell each "worker" that, although the data is taken from the same array, it is safe to read (the data won't change) and to write (each worker has exclusive access to its part of the result image)? Thank you.
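The decomposition described in this post can be sketched in a few lines. The block below is only an illustration in Python using `concurrent.futures`, not OpenMP or Intel IPP, and every name in it (`box_blur_pixel`, `filter_block`, `filter_in_blocks`, `filter_serial`) is hypothetical. It assumes a simple 3x3 box blur as the filter. The point it shows is the ownership rule: all workers read the same unmodified source image, and each worker writes only to its own quarter of the result, so no locking is needed.

```python
from concurrent.futures import ThreadPoolExecutor


def box_blur_pixel(src, y, x):
    """Integer mean of the 3x3 neighbourhood of (y, x), skipping pixels outside the image."""
    h, w = len(src), len(src[0])
    total = 0
    count = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                total += src[yy][xx]
                count += 1
    return total // count


def filter_block(src, dst, y0, y1, x0, x1):
    """Filter rows y0..y1-1 and columns x0..x1-1. Reads src anywhere, writes dst only inside the block."""
    for y in range(y0, y1):
        for x in range(x0, x1):
            dst[y][x] = box_blur_pixel(src, y, x)


def filter_in_blocks(src, workers=4):
    """Split the image into a 2x2 grid of non-overlapping blocks and filter each in its own task."""
    h, w = len(src), len(src[0])
    dst = [[0] * w for _ in range(h)]
    my, mx = h // 2, w // 2
    blocks = [(0, my, 0, mx), (0, my, mx, w), (my, h, 0, mx), (my, h, mx, w)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(filter_block, src, dst, *b) for b in blocks]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return dst


def filter_serial(src):
    """Reference result: the same filter applied to the whole image in one pass."""
    h, w = len(src), len(src[0])
    dst = [[0] * w for _ in range(h)]
    filter_block(src, dst, 0, h, 0, w)
    return dst
```

Because the blocks partition the output and the input is never written, the blocked result matches the serial one. Pure-Python threads will not show a real speedup here, so treat this as a sketch of the partitioning only; it says nothing about why the OpenMP version gained just 20%.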
PHI COPROCESSOR AND JAVA
By Rafael R. 2
Hi, our university bought a machine with a Xeon Phi coprocessor. According to the description on the site, https://software.intel.com/en-us/articles/intelr-xeon-phitm-coprocessor-... there is no Java support yet. That answer is from 2013, and we are already in 2015. Is there a Java option for coding? Thanks, Rafael
Intel® Xeon Phi™ Coprocessor Developer Training Coming to a City Near You in 2015
By Mike Pearce (Intel) 0
https://software.intel.com/en-us/blogs/2015/03/04/intel-xeon-phi-coprocessor-developer-training-coming-to-a-city-near-you-in-2015
Mixing kernel space and userspace in a new kernel.
By Jog L. 0
Hello, I was thinking of creating an open source kernel (reusing blocks already written in the Linux kernel, obviously). I would like to hear from experts what the dangers are of running in ring 0 if there are no users and no external connections. We are in a situation in which the processor is isolated from the whole world. No one can mess with it. All the processes running on top of it have to register, and are created and compiled by root using a specific memory range. No process can be launched without root's approval. No human accesses it. The code running inside is reviewed, and we have facilities to be sure that each process can use no memory range other than the one we expect it to use. That is the (restrictive) context. Now, is it possible for such a kernel to exist, or are there limitations that I don't foresee? The kernel is to be massively specialized, hence the "almost starting from scratch". Thanks for your insights, Jog
linking with two versions of mkl (multi threaded and single threaded) in one application
By Michal K. 3
Hi, is it possible to use both the single-threaded version and the multi-threaded version of the MKL library in one application? I need the single-threaded version for use with the PLASMA library, yet in another part of my code I need to use MKL PARDISO, for which I need the multi-threaded version. Any help will be greatly appreciated. Cheers, Michal
PCIe 3.0 reference clock jitter tool
By Sonal C. 0
Where can I access the Intel PCIe clock jitter tool?
Memory to CPU (mov) bandwidth limitations
By albus d. 3
(Sorry for my weak English; I am not a native speaker. I'm not sure this is the right forum, as it is my first time here. This is a general question about some hardware limits whose technical reason I do not understand and would very much like to know.) We now have parallelised SIMD arithmetic (like 8 float multiplications or divisions in one step), so the theoretical (but also nearly practical) arithmetic bandwidth per core is something like 4 GHz * 8 floats = about 30 GFLOPS per core. But AFAIK we still have quite low RAM-to-CPU bandwidth, at the level of reading or writing 1 or 2 ints or floats per nanosecond; when I test it, this RAM-to-CPU bandwidth is only about 2 GFLOP per second per core or something like that. (Both values are rough, but the difference seems to be a physical fact, at least in my experience.) I mean that arithmetic can be parallelised (like 8-way vectorised) but load/store movs cannot, so SIMD parallelisation delivers only a fraction of its potential power. It is extremely crucial to increase this memory bandwidth (muc...
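The arithmetic in this post can be written out as a short calculation. The sketch below uses only the poster's own rough figures (a 4 GHz clock, 8 floats per SIMD operation, about 2 floats moved per nanosecond). The variable names are hypothetical, and the assumptions of one SIMD operation per cycle and 4-byte floats are ours, not measurements.

```python
# Rough figures taken from the post; not measured values.
clock_hz = 4e9          # 4 GHz core clock
simd_lanes = 8          # 8 floats per SIMD operation
floats_per_ns = 2       # memory moves about 2 floats per nanosecond
bytes_per_float = 4     # assumption: single-precision floats

# Peak arithmetic rate, assuming one SIMD operation per cycle.
peak_flops = clock_hz * simd_lanes                    # 3.2e10, i.e. 32 GFLOPS

# Rate at which operands arrive from memory.
mem_floats_per_s = floats_per_ns * 1e9                # 2e9 floats per second
mem_bytes_per_s = mem_floats_per_s * bytes_per_float  # 8e9 bytes per second

# How many operations the core could do per float delivered from memory.
ops_per_float_loaded = peak_flops / mem_floats_per_s  # 16.0

print(f"peak arithmetic: {peak_flops / 1e9:.0f} GFLOPS")
print(f"memory stream:   {mem_floats_per_s / 1e9:.0f} Gfloat/s "
      f"({mem_bytes_per_s / 1e9:.0f} GB/s)")
print(f"ratio:           {ops_per_float_loaded:.0f} operations per float loaded")
```

With these numbers the core can execute about 16 floating-point operations in the time it takes one float to arrive from memory, which is the gap the post describes: a loop that does less work than that per loaded value is limited by memory traffic rather than by SIMD width.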