This page contains common questions and answers on multi-threading in the Intel IPP.
Optimization
Parallel PHP (HipHop) using TBB, Kiwi Style
I've been chatting with a small group of dedicated fans of Intel Threading Building Blocks (TBB) in New Zealand. They've been looking at using TBB to add parallelism to WordPress, PHP, HipHop, Perl, and other open source projects. They have published their code and some interesting results, and their web site http://openparallel.com explains some of their work.
The hidden performance cost of accessing thread-local variables
Ever finished parallelizing some code and discovered that the performance was not what you expected? I think that has happened to everyone. One trick I’ve recently learned is to start code optimization by running the Intel® VTune™ Amplifier XE Lightweight Hotspots analysis, which reports an application's function hot spots in terms of clock ticks and instructions retired. Unlike precise call graph analysis, Lightweight Hotspots analysis is very fast and does not instrument your application.
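The summary above does not show the code pattern behind the title. As a hedged illustration (not necessarily the case the author profiled), one common way thread-local overhead shows up is touching a thread-local variable on every iteration of a hot loop. Depending on the platform and TLS model, each access can involve extra address computation. The minimal sketch below assumes C11 `_Thread_local`; the names `tls_sum`, `accumulate_naive`, and `accumulate_hoisted` are hypothetical. It contrasts the per-iteration pattern with accumulating into an ordinary local and writing the thread-local variable once.

```c
#include <stddef.h>

/* Hypothetical per-thread accumulator (C11 _Thread_local storage). */
static _Thread_local long tls_sum = 0;

/* Per-iteration pattern: the loop body updates the thread-local variable
   each time. Whether the compiler hoists this itself depends on the
   compiler, optimization level, and TLS model. */
void accumulate_naive(const int *data, size_t n) {
    for (size_t i = 0; i < n; ++i)
        tls_sum += data[i];
}

/* Hoisted pattern: sum into an ordinary local variable and touch the
   thread-local variable only once, after the loop. */
void accumulate_hoisted(const int *data, size_t n) {
    long local = 0;
    for (size_t i = 0; i < n; ++i)
        local += data[i];
    tls_sum += local;
}

/* Helpers to inspect and reset the calling thread's accumulator. */
long get_tls_sum(void) { return tls_sum; }
void reset_tls_sum(void) { tls_sum = 0; }
```

Both functions compute the same result for the calling thread. The hoisted form simply makes the single thread-local access explicit instead of relying on the optimizer. A profiler such as the Lightweight Hotspots analysis is what tells you whether the difference matters in practice.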
Intel® ArBB Videos, Tutorials and Webinars
This video archive helps developers quickly get up to speed on basic Intel® ArBB topics and advanced coding techniques.
Optimizing Software Applications for NUMA: Part 1 (of 7)
1. The Basics of NUMA
NUMA, or Non-Uniform Memory Access, is a shared memory architecture that describes the placement of main memory modules with respect to processors in a multiprocessor system. Perhaps the best way to understand NUMA is to compare it with its cousin UMA, or Uniform Memory Access.
In the UMA memory architecture, all processors access shared memory through a bus (or another type of interconnect) as seen in the following diagram:
Using Intel® Parallel Advisor 2011 to determine if your Intel® Threading Building Blocks application will scale
I have a new appreciation for the Suitability tool in Intel® Parallel Advisor. Intel Parallel Advisor was created to help us add parallelism to existing serial code, but I’ve discovered another useful, possibly unconventional, use for it with my parallel application: collecting valuable performance and scalability information that would be difficult to collect otherwise.
Is AVX enabled?
If we ask anyone who uses, plans to use, or simply advertises compiler intrinsic functions for SIMD (MMX, SSE, AVX) why they do so and why intrinsics are good, the answer will almost certainly be something like this: "Intrinsics provide a C/C++ language interface to assembly instructions, so that we don't need to deal with assembler".
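Even with intrinsics, a program must confirm at run time that AVX is actually usable before it executes AVX instructions. This requires two things: the CPU must support AVX, and the operating system must save the YMM register state on context switches. The following is a minimal sketch of that check, assuming GCC or Clang on x86/x86-64 (it uses `<cpuid.h>` and inline `xgetbv`). It is not necessarily the check the original article describes, and the function name `avx_enabled` is hypothetical.

```c
#include <cpuid.h>
#include <stdbool.h>

/* Returns true only if the CPU supports AVX *and* the OS has enabled
   saving of the AVX (YMM) register state. GCC/Clang on x86 only. */
bool avx_enabled(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                          /* CPUID leaf 1 unavailable */

    const unsigned int osxsave_bit = 1u << 27; /* OS uses XSAVE/XRSTOR */
    const unsigned int avx_bit     = 1u << 28; /* CPU implements AVX */
    if ((ecx & (osxsave_bit | avx_bit)) != (osxsave_bit | avx_bit))
        return false;                          /* xgetbv would fault without OSXSAVE */

    /* XGETBV with ECX = 0 reads XCR0. Bit 1 (SSE/XMM state) and bit 2
       (AVX/YMM state) must both be set by the OS. */
    unsigned int xcr0_lo, xcr0_hi;
    __asm__ volatile("xgetbv" : "=a"(xcr0_lo), "=d"(xcr0_hi) : "c"(0));
    (void)xcr0_hi;                             /* upper half not needed here */
    return (xcr0_lo & 0x6u) == 0x6u;
}
```

A typical use is to call `avx_enabled()` once at startup and dispatch to an AVX code path or an SSE fallback accordingly. The OSXSAVE test comes first because executing `xgetbv` when the OS has not enabled XSAVE raises an invalid-opcode fault.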
Distributed Memory Coarray Fortran with the Intel Fortran Compiler for Linux: Essential Guide
An essential getting started guide for using Intel Coarray Fortran for Linux on a distributed memory cluster.

