Content Development Document: Threading for Performance with Intel Building Blocks

1. Module Name: Threading for Performance with Intel Building Blocks

2. Writers: [Confidential content withheld]

3. Targeted availability: March 2009

4. Brief Module Description
Proposed duration: 2-3 hours
This concise introduction to the major components of the Intel Threading Building Blocks (TBB) library is an instructor-led course that can also function as a self-study course. The course is 60% lecture (or reading) and 40% hands-on lab. It covers most of the software's major components, including parallel algorithms, concurrent containers, task scheduling, synchronization primitives, and memory allocators.

ACM Computing Curricula 2001 Topics: PF1, PF2, PL6

5. Needs Analysis
This powerful open-source software was designed at Intel and is now considered an industry staple for parallel software development. It has a large following in the industry and may become a de facto standard for C++ parallel programming. Anyone involved in software development in the 21st century should be at least familiar with the concepts and methods of Intel TBB. Achieving an appropriate level of familiarity with this important parallel programming model requires a brief, technical, and potentially self-paced introduction that lets students immediately test the new ideas for specifying concurrent algorithms.

6. Subject Matter Experts (SMEs): [Confidential content withheld]

7. Learner Analysis
The ideal student for this module is an adult learner at a university who, in addition to exhibiting the learning characteristics of adult learners, also has the following traits:

  • Must have between 1 and 3 years of C++ programming experience.
  • Ideally has some familiarity with parallel software models such as threading.
  • Has a basic familiarity with CPU architectures and how software runs on them.
  • Is able to learn in a lecture/discussion/hands-on lab environment or, if taking the material as self-study, is a self-directed learner.
  • Is able to generalize from examples.
  • Demonstrates a willingness to tackle difficult concepts and deal with complexity.
  • Must have an understanding of the issues of parallel programming.
  • May be currently instructing, or planning to instruct, adult students who fit this learner description.
  • May be currently using a successful programming curriculum, or intend to create or teach one soon.

8. Context Analysis
The purpose of a Context Analysis is to identify and describe the environmental factors that inform the design of this module. Environmental factors include:

  1. Media Selection: The lecture presentation will be in Microsoft* PowerPoint* format, including speaker notes and references to more detailed content. A lab document is provided (.doc).
  2. Learning Activities: Instructor-led or self-study lectures, with hands-on labs for both.
  3. Participant Materials and Instructor/Leader Guides: Instructor notes are included in the PowerPoint Notes sections. A recorded presentation and lecture notes for the slides, narrated by the course author, will be made available to internal Intel instructor candidates (and may be made available to external academics through the Intel Academic Community website).
  4. Transcript of expert delivery
  5. Packaging and production of training materials: Materials are posted to the Intel Academic Community website for worldwide use and alteration.
  6. Training Schedule: The module is 2 hours of lecture/reading, 1 hour of lab.

9. Task Analysis
The relevant Job/Task Analysis for this material is defined by the Software Engineering Body of Knowledge (SWEBOK) and can be viewed in detail here:
The primary Bodies of Knowledge (BKs) used include, but are not limited to:

  • Software Requirements BK
  • Software Design BK
    • Key issues in Software Design (Concurrency)
    • Data persistence, etc.
  • Software Construction BK
    • Software Construction Fundamentals
    • Managing Construction
    • Practical Considerations (Coding, Construction Testing, etc.)

Relevant IEEE standards for relevant job activities include but are not limited to:

  • Standards in Construction, Coding, Construction Quality IEEE12207-95
  • (IEEE829-98) IEEE Std 829-1998, IEEE Standard for Software Test Documentation, IEEE, 1998.
  • (IEEE1008-87) IEEE Std 1008-1987 (R2003), IEEE Standard for Software Unit Testing, IEEE, 1987.
  • (IEEE1028-97) IEEE Std 1028-1997 (R2002), IEEE Standard for Software Reviews, IEEE, 1997.
  • (IEEE1517-99) IEEE Std 1517-1999, IEEE Standard for Information Technology-Software Life Cycle Processes-Reuse Processes, IEEE, 1999.
  • (IEEE12207.0-96) IEEE/EIA 12207.0-1996//ISO/IEC12207:1995, Industry Implementation of Int. Std. ISO/IEC 12207:95, Standard for Information Technology-Software Life Cycle Processes, IEEE, 1996.

10. Concept Analysis
This module looks specifically at those aspects of CPU design that concern concurrency, including multiple cores, memory considerations (NUMA), SIMD, and simultaneous multithreading (Hyper-Threading). The concepts covered are:

  • Multiple core architecture and execution
  • Template and generic programming in C++
  • Design of concurrent or parallel solutions
  • Thread-safe design

11. Learning Objectives

  • Given concepts and examples from the module, students will be able to use Intel TBB to implement a data-parallel solution in the provided sample code.
  • Students will learn to recognize which concurrent containers are most appropriate when working with their own code.
  • Students will be able to select the most appropriate synchronization objects in sample code and apply this methodology to their own code.

12. Criterion Items
Q: What TBB parallel algorithm would be used to parallelize a loop with independent iterations?
A: parallel_for

Q: What TBB parallel algorithm would be used to parallelize the processing of nodes on a linked list?
A: parallel_do

Q: What TBB parallel algorithm would be used to find the value of largest element within a 2-D array?
A: parallel_reduce

Q: What are the three concurrent containers within the Intel TBB Library?
A: concurrent_hash_map; concurrent_queue; concurrent_vector

Q: What are the differences between a spin_mutex and a queuing_mutex?
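A: A spin_mutex puts the waiting thread into a spin-wait on the mutex and is best used to protect a critical section with very few instructions; it is neither fair nor scalable under heavy contention. A queuing_mutex also spins rather than putting the thread to sleep, but waiting threads acquire the mutex in the order in which they arrived, which makes it fair and more scalable at the cost of slightly higher overhead.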
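
The fairness distinction can be sketched in standard C++ without TBB. As far as the TBB reference documentation describes them, both spin_mutex and queuing_mutex spin in user space, and queuing_mutex additionally serves waiters in arrival order. The classes below are not TBB's implementation: SpinLock is a plain test-and-set lock, and TicketLock is a simpler first-come, first-served lock standing in for the queue-based design. All names here (SpinLock, TicketLock, count_with_lock) are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Unfair spin lock: whichever waiter wins the atomic test-and-set enters next,
// so a thread can be overtaken repeatedly under contention.
class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock() {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();  // stay polite if cores are oversubscribed
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }
};

// Fair spin lock: each thread takes a ticket and waits until its number is served,
// so threads enter the critical section in arrival order.
class TicketLock {
    std::atomic<unsigned> next_ticket_{0};
    std::atomic<unsigned> now_serving_{0};
public:
    void lock() {
        const unsigned my_ticket =
            next_ticket_.fetch_add(1, std::memory_order_relaxed);
        while (now_serving_.load(std::memory_order_acquire) != my_ticket) {
            std::this_thread::yield();
        }
    }
    void unlock() {
        // Only the lock holder writes now_serving_, so load-then-store is safe here.
        now_serving_.store(now_serving_.load(std::memory_order_relaxed) + 1,
                           std::memory_order_release);
    }
};

// Increment a shared counter from several threads, guarded by the given lock type.
template <typename Lock>
long count_with_lock(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    Lock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;  // very short critical section
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

Both locks give the same final count; they differ only in the order in which waiting threads are admitted and in how they behave as contention grows.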

13. Expert Appraisal
This Content Design Document will be posted to the Intel Academic Community forum with an invitation to solicit comments from readers of the forum. Additionally, new focused SME review from outside the company will be solicited and used.

14. Developmental Testing
This release is a maintenance update. No beta version is required. The final version of the update will be posted to the ISC web site before the end of March 2009 (see Production).

15. Production
Upon completion and successful passing of the Product Readiness Approval in the PDT, the materials produced for this module will be posted to the Intel Academic Community (IAC) website. There they will be available for download by registered IAC participants.

Quoting - Clay Breshears (Intel)

3. Targeted availability: March 2009

Is it already available?

Ricardo Medel
Argentina Software Design Center

Thanks for sharing !!
