Intel® Developer Zone:
Performance

Highlights

Just published! Intel® Xeon Phi™ Coprocessor High Performance Programming 
Learn the essentials of programming for this new architecture and new products.
Intel® System Studio
Intel® System Studio is a comprehensive, integrated software development tool suite that can accelerate time to market, strengthen system reliability, and boost power efficiency and performance.
In case you missed it - 2-day Live Webinar Playback
Introduction to High Performance Application Development for Intel® Xeon & Intel® Xeon Phi™ Coprocessors.
Structured Parallel Programming
Authors Michael McCool, Arch D. Robison, and James Reinders use an approach based on structured patterns, which should make the subject accessible to every software developer.

Deliver your best application performance for your customers through parallel programming with the help of Intel’s innovative resources.

Development Resources


Development Tools

 

Intel® Parallel Studio XE

Bringing simplified, end-to-end parallelism to Microsoft Visual Studio* C/C++ developers, Intel® Parallel Studio XE provides advanced tools to optimize client applications for multi-core and manycore.

Intel® Software Development Products

Explore all the tools that help you optimize for Intel architecture. Select tools are available for a free 30-day evaluation period.

Tools Knowledge Base

Find guides and support information for Intel tools.

Diagnostic 15523: Loop was not vectorized: cannot compute loop iteration count before executing the loop.
By Devorah H. (Intel) Posted 10/29/2014 0
Product Version: Intel(R) Visual Fortran Compiler XE 15.0.0.070 Cause: The vectorization report generated when using the Visual Fortran Compiler's optimization options (-O3 -Qopt-report:2) states that the loop was not vectorized because the loop iteration count cannot be computed. Example: An example be...
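The cause described above comes down to whether the trip count is known on entry to the loop. As a rough illustration only, here is a minimal C++ sketch of the distinction. The article itself concerns Visual Fortran and its own example is not reproduced here. The function names `sum_first_n`, `sum_until_sentinel`, and `count_until_sentinel` are hypothetical, and the sketch makes no claim about which diagnostics a particular compiler emits for this code.

```cpp
#include <cstddef>
#include <vector>

// Countable loop: the trip count is fixed before the first iteration,
// so a compiler can compute it up front.
int sum_first_n(const std::vector<int>& data, std::size_t n) {
    const std::size_t count = n < data.size() ? n : data.size();
    int sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        sum += data[i];
    }
    return sum;
}

// Uncountable loop: the exit test reads data[i], so the number of
// iterations is only known once the loop has actually run.
int sum_until_sentinel(const std::vector<int>& data, int sentinel) {
    int sum = 0;
    std::size_t i = 0;
    while (i < data.size() && data[i] != sentinel) {
        sum += data[i];
        ++i;
    }
    return sum;
}

// One possible restructuring (an assumption, not the article's advice):
// isolate the data-dependent search, then do the arithmetic in a
// countable loop via sum_first_n.
std::size_t count_until_sentinel(const std::vector<int>& data, int sentinel) {
    std::size_t i = 0;
    while (i < data.size() && data[i] != sentinel) {
        ++i;
    }
    return i;
}
```

Calling `sum_first_n(data, count_until_sentinel(data, sentinel))` gives the same result as `sum_until_sentinel(data, sentinel)`. The search loop is still data-dependent, but the arithmetic now runs in a loop with a known iteration count.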
Overhead and Spin Time Issue in Intel® Threading Building Blocks Applications Due to Inlining
By Jackson Marusarz (Intel) Posted 10/28/2014 0
Intel® Threading Building Blocks (Intel TBB) applications may have an incorrectly high amount of Overhead or Spin Time associated with them due to function inlining without corresponding debug information. When analyzing an Intel TBB application with Intel® VTune™ Amplifier XE, we recommend that...
Intel and Third Party Tools and Libraries available with support for Intel® Xeon Phi™ Coprocessor
By Belinda L. (Intel) Posted 10/20/2014 0
A number of tool vendors have announced they will be providing versions of their software tailored to supporting Intel® Many Integrated Core Architecture, starting with the Intel® Xeon Phi™ coprocessor. Please contact the vendors directly for details about versions supported on Intel® X...
Digital Security and Surveillance on 4th generation Intel® Core™ processors Using Intel® System Studio 2015
By Naveen Gv (Intel) Posted 10/08/2014 0
This article presents the advantages of developing embedded digital video surveillance systems to run on 4th generation Intel® Core™ processor with Intel® HD Graphics, in combination with the Intel® System Studio 2015 software development suite. While Intel® HD Graphics is useful for developing...
Notification: Update to Resource Guides for Developer and Administrator published
By Taylor Kidd (Intel) Posted on 03/26/14 0
Hi all, I just wanted to let whoever is listening know that I just published updates to the Resource Guide for Intel® Xeon Phi™ Coprocessor Developers and Resource Guide for Intel® Xeon Phi™ Coprocessor Administrators documents. -- Taylor
BKMs on the use of the SIMD directive
By Taylor Kidd (Intel) Posted on 03/25/14 0
We had an ask from one of the various “Birds of a Feather” meetings Intel® holds at venues such as at the Super Computing* (SC) and International Super Computing* (ISC) conferences. The customer wanted to know BKMs (Best Known Methods) on the proper usage of the new OpenMP* 4.0 / Intel® Cilk™ Plu...
Transactional Memory Support: the speculative_spin_rw_mutex (Community Preview Feature)
By Christopher Huson (Intel) Posted on 03/07/14 0
In a previous post I discussed the Intel® Transactional Synchronization Extensions (Intel® TSX) technology released in the new generation of processors.  I described the Intel® Threading Building Blocks (Intel® TBB) implementation of the HLE interface (speculative_spin_mutex).  Now we can talk ab...
Intel® Xeon Phi™ coprocessor Power Management Turbo Part 3: How can I design my program to make use of turbo?
By Taylor Kidd (Intel) Posted on 02/20/14 1
Previous blogs on power management and a host of other power management resources can be found in “List of Useful Power and Power Management Articles, Blogs and References” at http://software.intel.com/en-us/articles/list-of-useful-power-and-power-management-articles-blogs-and-references. See [L...
'Wildhoney' - the 512bit superfast textual decompressor - some thoughts
By Georgi M. 16
Hi to all. Glad I am that I finally joined the Intel forum, long overdue. Here I want to share my amateurish vision on the topic of superfast textual decompression. For 4 months now I have been playing with my file-to-file decompressor named Nakamichi. I am on a quest to write the fastest possible variant of my approach: branchlessness combined with only one native (highest order) register on the latest machines. This translates to 64bit/512bit mixed code. A few hours ago I wrote the 'Wildhoney' variant using just that configuration. And two important things: - Nakamichi is 100% FREE - no restrictions at all for modifying, as the original Lempel-Ziv was; - Speed is religion, the fastestness is the ultimate goal. So far, I have written two OpenMP console tools, each enforcing 16 threads - MokujIN and Kazahana; I hope Nakamichi 'Wildhoney' will be the third. I would appreciate any help in developing it; there are many basic things I still don't know. The ZMM executable with the C source is here: http://www.san...
need something like a sorted tbb::parallel_do
By foelsche@sbcglobal.net 1
From what I see there is tbb::concurrent_priority_queue, but with this I would have to deal with thread pools myself. Is this really true?
TBB: Using task_scheduler_observer to set worker thread's OS scheduling priority
By Tim Day 5
I'm looking at TBB's task_arena and task_scheduler_observer. The documentation for task_scheduler_observer sketches out a nice example of it being used to set thread affinity on worker threads to lock an arena's threads onto a subset of cores. I'm curious to know whether this class and a similar pattern could practically be used to set OS scheduling priority for an arena. What I'm interested in doing is, on my N-core hardware, creating an arena with N normal worker threads, and another arena with N threads at a lower OS scheduling priority. However, the issue with scheduler priority is that generally you only get to lower it (unless running as root, but assume not), and it's not clear to me to what extent TBB worker threads move around between arenas (which would defeat the object of keeping all the low priority threads in one arena); the task_scheduler_observer docs mention returning false from on_scheduler_leaving() to keep a thread in an arena... but also mention the possibility of ...
API for Haswell's TSX
By roberto c. 2
Hello, I have just begun my research on HTM, primarily focusing on RTM (Restricted Transactional Memory). Are there any APIs for RTM? I have looked on the internet, but only the basic instructions exist for RTM, such as xbegin, xend, xabort, and xtest. I want to be able to access shared memory with HTM but I cannot find any library files for it. Can you please point me in the right direction? Thanks for your support.
CL_DEVICE_TYPE_CPU not working in Windows 8.1
By Yaknan G. 1
Hi, I recently tried to run my OpenCL program on a new Windows 8.1 computer, but the program returns an error when the device type is CL_DEVICE_TYPE_CPU. When I changed the device type to CL_DEVICE_TYPE_GPU or CL_DEVICE_TYPE_ALL, it ran the program on the GPU. Here is the system specification of the new computer: OS: Windows 8.1 Processor: Intel Core i7-4700MQ clocked at 2.40GHz Display Adapter: Intel HD Graphics 4600 and NVIDIA GeForce GT 740M. How can I resolve this problem, and is OpenCL having issues with Windows 8.1? Please help! Yaknan
Launching several MPI processes on multicore nodes
By Dmitry K. 3
Hi everyone, I have a simple issue, which must have a solution. Is it possible to assign several MPI processes to several nodes, such that the first MPI process occupies a full node, whereas the other MPI processes are distributed across the cores of the other nodes? I have an example below. On a cluster with 4 cores per node, to assign 2 MPI processes to 2 nodes I do the following:

    #PBS -l nodes=2:ppn=4
    mpirun -pernode -np 2 ./hybprog

The question is how to assign 8 MPI processes to 3 nodes, such that the first MPI process occupies the first node, whereas the other 7 MPI processes are distributed across 7 cores of the other two nodes? Best Regards, Dmitry