Intel® Developer Zone:
Performance

Highlights

Just published! Intel® Xeon Phi™ Coprocessor High Performance Programming 
Learn the essentials of programming for this new architecture and new products.
New! Intel® System Studio
Intel® System Studio is a comprehensive, integrated software development tool suite that can accelerate time to market, strengthen system reliability, and boost power efficiency and performance.
New! In case you missed it: 2-day Live Webinar Playback
Introduction to High Performance Application Development for Intel® Xeon & Intel® Xeon Phi™ Coprocessors.
Structured Parallel Programming
Authors Michael McCool, Arch D. Robison, and James Reinders use an approach based on structured patterns that should make the subject accessible to every software developer.

Deliver your best application performance for your customers through parallel programming with the help of Intel’s innovative resources.

Development Resources


Development Tools

 

Intel® Parallel Studio XE

Bringing simplified, end-to-end parallelism to Microsoft Visual Studio* C/C++ developers, Intel® Parallel Studio XE provides advanced tools to optimize client applications for multicore and many-core processors.

Intel® Software Development Products

Explore all the tools that help you optimize for Intel architecture. Select tools are available for a free 30-day evaluation period.

Tools Knowledge Base

Find guides and support information for Intel tools.

Courseware - Software Processes
By admin Posted 02/27/2015 0
Software life-cycle and process models; software process capability maturity models; approaches to process improvement; process assessment models; software process measurements. CSE445/598 Project on Multithreading and Multi-Core Processing (ASU). Material Type: Problem set...
Courseware - Data Structures
By admin Posted 02/27/2015 0
Representation of numeric data; range, precision, and rounding errors; arrays; representation of character data; strings and string processing; runtime storage management; pointers and references; linked structures; implementation strategies for stacks, queues, and hash tables; Implementation str...
Intel® Parallel Studio XE 2016 Beta
By Gergana Slavova (Intel) Posted 02/27/2015 0
Contents: What's New; Overview; License changes in 2016 product; Check out the full What's New Technical Document; Details; Frequently Asked Questions; Beta duration and schedule; Support; How to enroll in the Beta program; Beta Webinars; Beta Release Notes; Special Features and ...
Courseware - Algorithmic Problem Solving
By admin Posted 02/26/2015 0
Problem-solving strategies; the role of algorithms in the problem-solving process; implementation strategies for algorithms; debugging strategies; the concept and properties of algorithms. Animated Game Design in Engineering Design Process (ASU). Material Type: Lecture / Pr...
Subscribe to Intel Developer Zone Articles
Web Resources about Intel® Transactional Synchronization Extensions
By Roman Dementiev (Intel) Posted on 07/28/14 3
Short URL for this page: www.intel.com/software/tsx In this blog I list useful technical resources related to Intel® Transactional Synchronization Extensions (Intel TSX). I will try to keep the list up-to-date as new material becomes available (subscribe to this page below to get update notifica...
Additional AVX-512 instructions
By James Reinders (Intel) Posted on 07/17/14 1
Additional Intel® Advanced Vector Extensions 512 (Intel® AVX-512) The Intel® Architecture Instruction Set Extensions Programming Reference includes the definition of additional Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions. As I discussed in my first blog about Intel® AVX-...
Using Intel® TSX with VTune(TM) Amplifier XE 2015 Beta to measure transaction time & abort in your code?
By Peter Wang (Intel) Posted on 07/12/14 2
When developing multithreaded applications, the user should protect critical (sensitive) code areas called by threads, so that threads access shared memory without data conflicts. Most of the time, the user might use critical_section, mutex, semaphore, atomic, events, or other “locks” to protect crit...
Compete And Win A Prize With The New Intel® CnC!
By Frank Schlimbach (Intel) Posted on 07/10/14 0
A new version of Intel® Concurrent Collections for C++ (CnC) has been released. We are celebrating its coming out to open source with a programming contest, which will have its showdown at the 6th annual CnC workshop. The organizers call on individuals and small teams to compete for a significant...
Subscribe to Intel Developer Zone Blogs
OpenMP 4.0 task depend too limited would TBB be better?
By Nicholas B. 0
Hello, I have been looking at task depend in OpenMP 4.0, but it looks like it is too limited for what I want to do. To do what I want, it would need to take a vector subscript in the array section in the depend clause. My code would look something like this:

    type cell_type
      ...
    contains
      procedure :: process
    end type cell_type
    type(cell_type), dimension(n) :: cells

    type edge_type
      integer, dimension(:), allocatable :: icells
      ...
    contains
      procedure :: process
    end type edge_type
    type(edge_type), dimension(m) :: edges

    ! a bit like a c++ std::vector<std::vector<int>>
    edges(1)%icells = [1, 5, 7, 8, 100] ! edge 1 depends on cells 1, 5, 7, 8 and 100
    edges(2)%icells = [1, 2, 4] ! edge 2 depends on cells 1, 2 and 4
    ...
    do i=1,n
      !$omp task depend(out:cells(i))
      call cells(i)%process()
      !$omp end task
    end do
    do j=1,m
      ! next line not allowed
      !$omp task depend(out:edges(j)) depend(in:cells(edges(j)%icells))
      call edges(j)%process(cells)
      !$omp end task
    end d...
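The dependency pattern in this question (each edge waits on a run-time list of cells) can be sketched in C++ with the depend-clause iterator modifier. This is an illustration under stated assumptions, not an answer from the thread: the iterator modifier belongs to OpenMP 5.0 rather than 4.0, so it needs a compiler that implements it, and the names `Edge`, `process_all`, and the placeholder arithmetic are hypothetical stand-ins for the poster's types and `process` routines.

```cpp
// Sketch only: the depend-clause iterator is an OpenMP 5.0 feature, not 4.0.
// Built without OpenMP, the pragmas are ignored and the code runs serially.
#include <cassert>
#include <vector>

// Hypothetical stand-in for edge_type: an edge lists the cells it reads.
struct Edge {
    std::vector<int> icells;  // 0-based cell indices
    long sum = 0;             // placeholder result of processing the edge
};

// Processes every cell, then every edge once the cells it lists are done.
void process_all(std::vector<long>& cells, std::vector<Edge>& edges) {
    long* c = cells.data();
    Edge* e = edges.data();
    int n = static_cast<int>(cells.size());
    int m = static_cast<int>(edges.size());

    #pragma omp parallel
    #pragma omp single
    {
        for (int i = 0; i < n; ++i) {
            // Placeholder for cells(i)%process(): double the cell value.
            #pragma omp task depend(out : c[i]) firstprivate(i)
            c[i] = 2 * c[i];
        }
        for (int j = 0; j < m; ++j) {
            const int* ic = e[j].icells.data();
            int nic = static_cast<int>(e[j].icells.size());
            assert(nic >= 0);
            // One "in" dependence per listed cell, generated at run time.
            #pragma omp task depend(iterator(k = 0 : nic), in : c[ic[k]]) \
                             firstprivate(j, ic, nic)
            {
                // Placeholder for edges(j)%process(cells): sum the cells read.
                long s = 0;
                for (int q = 0; q < nic; ++q) s += c[ic[q]];
                e[j].sum = s;
            }
        }
    }  // the barrier that closes the region waits for all tasks
}
```

With cells `{1, 2, 3}` and an edge listing cells 0 and 2, the edge task runs only after both cell tasks have finished. Where the iterator modifier is unavailable, a dependency-graph library such as TBB's flow graph, which the thread title raises, is the usual alternative.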
Nested OMP on Xeon Phi using OMP4
By james B. 3
Xeon Phi has 60 cores and 4 threads per core. I am writing an experiment that will have 1 master thread on each core, and each of these will spawn 4 slave threads. Looking at the manual https://software.intel.com/en-us/node/512835 it seems that I want to set the environment variables:
MIC_OMP_NESTED=TRUE
MIC_OMP_PROC_BIND="spread, close"
MIC_OMP_NUM_THREADS=60
Is this correct? I've tested this and it doesn't die... Is there a way I can get the runtime to spit out affinity debug info about where it is actually placing things so I can be certain? Cheers, James
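One runtime-independent way to check placement is to have every thread of the nested team report the CPU it is running on. The sketch below is an illustration, not a reply from the thread: `report_placement` and `current_cpu` are hypothetical helper names, `sched_getcpu()` is assumed to be available (Linux/glibc), and the team sizes are passed in rather than read from the environment.

```cpp
// Sketch only: prints where each thread of a two-level OpenMP team runs.
// sched_getcpu() is a Linux/glibc call; elsewhere the CPU is reported as -1.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <cassert>
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif
#ifdef __linux__
#include <sched.h>
#endif

// Hypothetical helper: logical CPU of the calling thread, or -1 if unknown.
static int current_cpu() {
#ifdef __linux__
    return sched_getcpu();
#else
    return -1;
#endif
}

// Runs up to `outer` master threads that each spawn up to `inner` workers and
// prints one line per worker. Returns the number of lines printed.
int report_placement(int outer, int inner) {
    assert(outer > 0 && inner > 0);
    int reports = 0;
#ifdef _OPENMP
    omp_set_max_active_levels(2);  // allow the nested region to go parallel
#endif
    #pragma omp parallel num_threads(outer) reduction(+ : reports)
    {
        #pragma omp parallel num_threads(inner) reduction(+ : reports)
        {
            int master = 0;
            int worker = 0;
#ifdef _OPENMP
            master = omp_get_ancestor_thread_num(1);  // id in the outer team
            worker = omp_get_thread_num();            // id in the inner team
#endif
            std::printf("master %d worker %d on cpu %d\n",
                        master, worker, current_cpu());
            ++reports;
        }
    }
    return reports;
}
```

Calling `report_placement(60, 4)` on the coprocessor would print one line per worker, which can then be compared against the intended one-master-per-core layout. The Intel OpenMP runtime also documents a `verbose` modifier for `KMP_AFFINITY` that prints the bindings it chose; check the documentation for the runtime version in use.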
Slowdown with OpenMP
By Matt S. 11
I'm getting some pretty unusual results from using OpenMP on a fractional differential equations code written in Fortran. No matter where I use OpenMP in the code, whether it be on an initialization loop or on a computational loop, I get a slowdown across the entire code. I can put OpenMP in one loop and it will slow down an unrelated one (timed separately)! The code is a bit unusual, as it initializes arrays starting at 0 (and some even negative). For example,

    real*8 :: gx(0:Nx)
    real*8 :: AxLh(1-Nx:Nx-1), AxRh(1-Nx:Nx-1), AxL0(1-Nx:Nx-1), AxR0(1-Nx:Nx-1)

where Nx is, let's say, 512. Would that possibly have anything to do with the ubiquitous slowdown with OpenMP? Also, any ideas on reducing "pow" overhead in the following snippet would be greatly appreciated:

    do k = 1, 5
      hgck = foo_c(k)
      hgpk = foo_p(k)
      do j = 1, 100
        vx = vx + hgck * ux(x, t, foo(j) + hgpk)
      end do
    end do

where ux is a function defined by

    function ux(x,t,xi) impl...
web crawling through "Intel Xeon Phi Coprocessors"
By Sunil K. 1
I am new to this forum. I want to implement parallel crawling on "Intel Xeon Phi Coprocessors" for my project. Before buying equipment, installing software, and starting to learn about this platform, I want to know whether it is possible to somehow connect to the network and fetch web URLs in parallel using this technology. (I don't want to create a cluster of CPUs to do this. I want to do it using a single card.)
Intel MPI for Phi tuning tips?
By Ronald W Green (Intel) 3
Does setting I_MPI_MIC=enable change other MPI environment variables, particularly any that would tune MPI for the MIC system architecture? As a side question, has anyone written a Tuning and Tweaking guide for IMPI for Phi? For example, what I_MPI variables could one use to help tune an app targeting 480 ranks across 8 Phis? Thanks, Ron
Lock-free Java, or better scaling on multi-core systems
By William L. 0
Everyone these days has to address multi-core issues, or vertical scaling, at least on the server side of things. And there does not seem to be a general approach, so we end up re-architecting our applications every time we add cores. At the same time, the availability of many-core processors seems to be constrained by the lack of a reasonable software technology to make good use of them. Actors seem like a good approach and allow you to write fast, lock-free code. But large actor-based systems are not robust. Most actor implementations require applications to implement a state machine per actor for determining what messages are to be processed, and maintaining a large number of interacting state machines is well beyond the abilities of most developers. This is very sad, as throughput of actor-based applications typically scales with the number of cores. I've worked on this problem for a number of years now and have developed a simple variation on actors which support non-blockin...
igzip for VS10 C++?
By David L. 6
I was searching for a zlib-compatible but faster compressor, and came across the paper describing igzip, "High Performance DEFLATE Compression on Intel Architecture Processors". igzip looks like exactly (!) what I am looking for: compatible with zlib, but faster. However, the downloadable source was for Linux. I need it for a VS10 C++ project. I have successfully (I think) compiled and assembled the desired modules (common, crc, crc_utils, hufftables, hufftables_c.cpp, igzip0c_body, igzip0c_finish, init_stream) into a .lib. But when I attempt to link the library into my project, I get error LNK2019: unresolved external symbol fast_lz (and init_stream) from where they are called. I also have a "C" lz4 compression library linked into the project, and it works fine. I have spent 3 days playing with it, looking for the clue that will unlock the symbols, but no luck so far. I get no other warnings and/or errors during the compiling/assembling of the library or project. Any help (esp...
OpenCL vs Intel Cilk Plus Issues, Differences and Capabilities
By Yaknan G. 0
I am curious as to the differences between OpenCL and Intel Cilk Plus. They are both parallel programming paradigms that are receiving wide recognition, but technically speaking, is one better than the other, or are they simply different? Also, what yardstick do I use when choosing between the two when solving an embarrassingly parallel problem? Please, I need answers. Thanks! Yaknan
Subscribe to Forums
