Q&A from Webcast “The Simplifying Parallelism Implementation with Intel® Threading Building Blocks”

Q&A from Webcast "The Simplifying Parallelism Implementation with Intel® Threading Building Blocks" presented by Michael D'Mello on 5/26/2009

Q: Is the thread-to-core ratio hardwired 1:1 in Intel® Threading Building Blocks (Intel® TBB) or is it configurable?
A: By default, the number of worker threads in the Intel TBB thread pool equals the number of logical cores. You can change the default by passing the desired number of threads to the constructor of the Intel TBB initialization object, task_scheduler_init. For a complete example, see the related chapter of the tutorial at http://www.threadingbuildingblocks.org/documentation.php.
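
The default-versus-override rule described above can be sketched with the C++ standard library alone. This is only an illustration under stated assumptions: the helper name choose_thread_count is hypothetical and is not part of Intel TBB. In Intel TBB itself the chosen count would be passed to the task_scheduler_init constructor, as the answer says.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper (NOT an Intel TBB API): returns the number of worker
// threads to use. A request of 0 means "take the default", which is one
// thread per logical core.
unsigned choose_thread_count(unsigned requested = 0) {
    if (requested != 0) {
        return requested;  // the caller overrides the default
    }
    // hardware_concurrency() reports logical cores, or 0 if it cannot tell.
    unsigned cores = std::thread::hardware_concurrency();
    unsigned count = (cores != 0) ? cores : 1;
    assert(count >= 1);
    return count;
}
```

Calling choose_thread_count(4) returns 4, while choose_thread_count() falls back to the logical core count reported by the platform.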

Q: What if my code contains data dependencies? Can Intel® Threading Building Blocks (Intel® TBB) detect them?
A: No, Intel TBB doesn't analyze code to detect data dependencies. Intel TBB is a library of generic algorithms and data structures that simplify threading, and you should identify all data dependencies before applying threading. Data races are arguably the most commonly encountered errors in parallel code; they occur when the programmer does not handle data dependencies properly. Intel® Parallel Inspector is a developer-oriented tool for finding data races in threaded applications.
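
To make "data dependency" concrete, here is a minimal sketch of a loop in which each iteration reads the value written by the previous one. It uses no threads: it simulates, serially and deterministically, a schedule in which the second half of the range runs before the first half. All function names are hypothetical and chosen for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Runs a[i] += a[i - 1] over the subrange [begin, end). Each iteration reads
// the element written by the previous iteration: a loop-carried dependency.
void prefix_sum_chunk(std::vector<int>& a, std::size_t begin, std::size_t end) {
    assert(begin <= end && end <= a.size());
    for (std::size_t i = (begin == 0 ? 1 : begin); i < end; ++i) {
        a[i] += a[i - 1];
    }
}

// Reference result: the whole range processed in its original order.
std::vector<int> in_order(std::vector<int> a) {
    prefix_sum_chunk(a, 0, a.size());
    return a;
}

// Simulates (serially) one schedule a naive parallel split could produce:
// the second half is processed before the first half has been updated.
std::vector<int> second_half_first(std::vector<int> a) {
    std::size_t mid = a.size() / 2;
    prefix_sum_chunk(a, mid, a.size());
    prefix_sum_chunk(a, 0, mid);
    return a;
}
```

For the input {1, 1, 1, 1}, in_order yields {1, 2, 3, 4} but second_half_first yields {1, 2, 2, 3}. With real threads the outcome would also depend on timing, which is why such dependencies need to be known before the loop is threaded.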

Q: What is the advantage of re-implementing ChangeArray(A) as a functor?
A: Most Intel® Threading Building Blocks (Intel® TBB) algorithms take a functor object as a parameter. To take advantage of the easy threading and "future-proof" parallelism that Intel TBB algorithms deliver, a developer should implement the logic of a parallel task as a functor. A less publicized feature of Intel TBB is that the library provides excellent mechanisms for task-based parallelism while still emphasizing the data-parallel model. For simple loops, one may well question the advantage of Intel TBB over other threading models. For more complicated loops, however, Intel TBB tends to offer more flexibility than other threading models. One example is iterating over a collection of items that is not indexed by an integer.
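
As a rough illustration of what "implementing the loop body as a functor" looks like, the sketch below uses only the standard library. The body of the webcast's ChangeArray is not reproduced here, so the functor DoubleArray, the IndexRange type, and the driver apply_in_chunks are all hypothetical stand-ins. In Intel TBB the functor would instead be handed to an algorithm such as parallel_for together with a range object, and the library would decide how to split and schedule the subranges.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a half-open index range [begin, end).
struct IndexRange {
    std::size_t first, last;
    std::size_t begin() const { return first; }
    std::size_t end() const { return last; }
};

// Functor holding the loop body: doubles every element of its subrange.
class DoubleArray {
    int* data_;
public:
    explicit DoubleArray(int* data) : data_(data) {}
    void operator()(const IndexRange& r) const {
        for (std::size_t i = r.begin(); i != r.end(); ++i) {
            data_[i] *= 2;
        }
    }
};

// Serial stand-in for a parallel algorithm: splits [0, n) into chunks of at
// most `grain` elements and calls the functor once per chunk. A threading
// library would run these calls on worker threads instead.
template <typename Body>
std::size_t apply_in_chunks(std::size_t n, std::size_t grain, const Body& body) {
    assert(grain > 0);
    std::size_t chunks = 0;
    for (std::size_t lo = 0; lo < n; lo += grain) {
        body(IndexRange{lo, std::min(lo + grain, n)});
        ++chunks;
    }
    return chunks;
}
```

Because the functor only ever sees a subrange, the same body works whether the driver makes one call or many, which is the property that lets a library choose the chunking for whatever number of cores is available.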

Q: What do you mean by a "warmer" task describing the effectiveness of task scheduling performed by Intel® Threading Building Blocks (Intel® TBB)?
A: A "warmer" task is one whose data is more likely to still reside in the cache. Executing "warm" tasks while their data is still cached is more efficient than executing "cold" tasks, whose data must first be fetched into the cache. The Intel TBB task scheduler favors "warm" tasks over "cold" ones, which makes scheduling highly efficient.

Q: Just thinking about use of cache_aligned_allocator vs. scalable_allocator - does Intel® Threading Building Blocks (Intel® TBB) provide an API for obtaining any metrics about the processor caches/cache-lines etc?
A: No. That type of information can be examined using the Intel® VTune Performance Analyzer, which can monitor processor-specific events including cache misses and other cache-related activity.

Q: Your examples use for-loops; does the same apply to while-loops? How do I parallelize non-indexed loops with Intel® Threading Building Blocks (Intel® TBB)?
A: Intel TBB provides several template classes and functions that simplify threading of non-indexed loops, for example parallel_do (for while-loops) and pipeline (for data-flow pipelines). To learn more, see the Intel TBB documentation: http://www.threadingbuildingblocks.org/documentation.php.
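
The restructuring this implies can be sketched as follows: a while-loop over a non-indexed container is expressed as an iterator range plus a per-item body, which is the general shape that parallel_do accepts. The sketch is serial and uses only the standard library. The names SquareItem and process_items are hypothetical and are not Intel TBB APIs.

```cpp
#include <cassert>
#include <forward_list>

// Per-item body: squares one item in place. It touches only its own item,
// so items could be processed independently of one another.
struct SquareItem {
    void operator()(int& item) const { item *= item; }
};

// Serial stand-in (hypothetical name) for an algorithm that takes an
// iterator range and a body. No integer index is needed, only iterators.
template <typename Iterator, typename Body>
int process_items(Iterator first, Iterator last, const Body& body) {
    int processed = 0;
    while (first != last) {  // the original while-loop shape
        body(*first);
        ++first;
        ++processed;
    }
    return processed;
}
```

Applied to a std::forward_list holding 1, 2, 3, process_items visits three items and leaves 1, 4, 9 in the list.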

Q: Does Intel® Parallel Studio require a C++ compiler or is one included?
A: Intel® C++ Compiler is included in Intel Parallel Studio; it is one of the components of Intel® Parallel Composer.

Q: Which versions of Microsoft Visual Studio* are supported by Intel® Parallel Studio?
A: Microsoft Visual Studio* 2005 and 2008.

Q: What do I gain when I compile my source code with Intel® C++ Compiler rather than Microsoft's compiler? Even if there is a gain, several projects are impossible to convert to the Intel® Compiler, such as COM projects that use attributes. Projects that use BOOST* are also cumbersome, since I need to build BOOST versions for the Intel Compiler. Could Intel provide precompiled BOOST library DLLs and libs?
A: That is correct: COM attributes are not currently supported. We don't provide special builds of third-party libraries, but it should be easy to compile them with Intel C++ Compiler. In addition, binaries built with Intel C++ Compiler are fully compatible with binaries built with Microsoft's compiler, so both types of binaries can be safely mixed within one application. You should therefore be able to link object files built from your source code with the Intel compiler against BOOST libraries built with the Microsoft compiler. If you have any issues compiling BOOST with Intel C++ Compiler, please let us know via a forum: /en-us/forums.

Q: I know Intel® Threading Building Blocks (Intel® TBB) works with AMD processors, will Intel® Parallel Studio?
A: Intel® Parallel Studio runs on platforms with an IA-32 or Intel® 64 architecture processor supporting the Intel® Streaming SIMD Extensions 2 (Intel® SSE2) instructions (Intel® Pentium 4 processor or later, or compatible non-Intel processor).

Q: Does Intel® Parallel Studio work under Microsoft Windows* 7 and Visual Studio* 10?
A: The first releases of Intel Parallel Studio have not been validated against Microsoft Windows* 7 and Visual Studio* 10, since neither is a released product. We will, of course, move forward closely aligned with the Microsoft roadmap and validate those platforms in future releases.

Q: Is Intel® Threading Building Blocks (Intel® TBB) aware of dynamic logical partitioning activity like 2 cores out of say 8 being pulled out during application-runtime?
A: Intel TBB does not have any dynamic mechanism for detecting such situations. The number of threads is selected when the Intel TBB task scheduler is initialized: the user either chooses how many threads to create or accepts the Intel TBB default (one thread per logical core). In the situation you mention, provided the logical partitioning mechanism does not interrupt normal code execution (because of some system-level dependency), Intel TBB code will continue to execute with a thread pool whose size was fixed at scheduler initialization.

