Multi-Threaded Programming: Advanced Techniques

by Aaron Coday


In this course and its accompanying labs, you will become familiar with intermediate to advanced techniques for explicit threading and OpenMP* threading. You’ll demonstrate your understanding of explicit threading by adding GUI responsiveness to the Apfel* application. You’ll demonstrate your understanding of OpenMP threading by improving the performance of the fractal calculation that the program is executing.

To complete the Advanced Multi-Threading labs included in this course, you’ll need the following tools:


Download and extract the Apfel source code and the other files you will need for the labs before continuing with the course.

This course is divided into two parts, each structured around one of the multi-threading labs. Each part starts with a description of the multi-threading problem to be addressed, followed by detailed instructions for one possible solution, and concludes with a lab activity in which you implement that solution.



Part 1: Responsiveness


  • Launch Microsoft Visual Studio.
  • Open the workspace C:\Lab\CONTEST\apfel\apfel.dsw.
  • Make sure that the Intel compiler is selected.
  • Build the application by selecting Release build and then Build (F7).
  • Press Ctrl-F5 to run the application.


Adding thread function to CApfelRun

Add a thread function to CApfelRun, then start the thread and pass it the information it needs. The new thread is responsible for executing the DoRun method, which leaves the GUI thread free to respond to the user.
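The shape of this change can be sketched as follows. This is a minimal, portable sketch that uses C++ `std::thread` as a stand-in for the Win32 thread API used in the lab. The class `ApfelRunSketch`, its members, and the body of `DoRun` are hypothetical and are not taken from the Apfel source. The static `ThreadFunc` mirrors the usual shape of a Win32 thread entry point, which receives a pointer to the object it should work on.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for CApfelRun; the real class lives in the Apfel source.
class ApfelRunSketch {
public:
    // Start the worker. `this` carries the information the thread needs.
    void Start(int rows) {
        assert(rows >= 0);
        Join();                 // make sure a previous run has finished
        rows_ = rows;
        done_ = false;
        worker_ = std::thread(&ApfelRunSketch::ThreadFunc, this);
    }

    // Block until the worker has finished. A GUI thread would normally
    // poll Done() or wait for a notification instead of blocking here.
    void Join() {
        if (worker_.joinable()) worker_.join();
    }

    bool Done() const { return done_.load(); }
    long Result() const { return result_; }   // valid once Done() is true

    ~ApfelRunSketch() { Join(); }

private:
    // Thread function: a static trampoline that forwards to DoRun.
    static void ThreadFunc(ApfelRunSketch* self) {
        self->DoRun();
        self->done_ = true;     // publish completion after the result is written
    }

    // Placeholder for the long-running calculation (assumption: the real
    // DoRun computes the fractal; here it only sums the row indices).
    void DoRun() {
        long sum = 0;
        for (int r = 0; r < rows_; ++r) sum += r;
        result_ = sum;
    }

    int rows_ = 0;
    long result_ = 0;
    std::atomic<bool> done_{false};
    std::thread worker_;
};
```

Calling `Start(100)` returns immediately while the work runs in the background; after `Join()`, `Done()` is true and `Result()` holds the computed value.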

Part 2: Performance


You can use multi-threading to add extra functionality, to increase performance, or both. You should know and be able to use both explicit threading (Win32*) and OpenMP*.
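As an illustration of the OpenMP side, the sketch below parallelizes the outer loop of a fractal calculation. It assumes a Mandelbrot-style escape-time loop; the function names, the image bounds, and the `schedule(dynamic)` choice are illustrative assumptions, not the Apfel implementation. If OpenMP is not enabled at compile time, the pragma is ignored and the loop simply runs serially.

```cpp
#include <cstddef>
#include <vector>

// Escape-time iteration count for one point c = (cr, ci).
int MandelIterations(double cr, double ci, int maxIter) {
    double zr = 0.0, zi = 0.0;
    int n = 0;
    while (n < maxIter && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = t;
        ++n;
    }
    return n;
}

// Fill a width x height image covering [-2, 1) x [-1.5, 1.5).
// Rows are independent, so the outer loop can be split across threads.
std::vector<int> ComputeFractal(int width, int height, int maxIter) {
    std::vector<int> image(static_cast<std::size_t>(width) * height);

    // Rows differ widely in cost, so dynamic scheduling is used here to
    // balance the load (an assumption; static scheduling is also valid).
    #pragma omp parallel for schedule(dynamic)
    for (int y = 0; y < height; ++y) {
        double ci = -1.5 + 3.0 * y / height;
        for (int x = 0; x < width; ++x) {
            double cr = -2.0 + 3.0 * x / width;
            image[static_cast<std::size_t>(y) * width + x] =
                MandelIterations(cr, ci, maxIter);
        }
    }
    return image;
}
```

Each iteration writes only to its own row of `image`, so no synchronization is needed inside the loop.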

Supplemental Material

Intel® Threading Toolkit

  • Intel® Thread Checker
  • Thread Profiler
  • Intel® VTune™ Performance Analyzer

Intel® Thread Checker

  • Locate threading bugs in applications on IA-32 systems running Windows*
  • Use remote collectors to locate threading bugs in applications on IA-32 and Itanium®-based systems running Linux*.

Running Intel® Thread Checker

Statistics collected within VTune™ analyzer

  • Compile with icl /Qopenmp_profile (/MD /Qopenmp)

Statistics collected outside VTune analyzer

  • Compile with icl /Qopenmp_profile
  • Run program outside VTune environment
  • Import guide.gvs statistics file into VTune analyzer

To import a guide.gvs file, use File/Open File and choose the OpenMP Statistics (*.gvs) file type.

Thread Profiler

  • For Windows*, locate performance bottlenecks in Win32* and OpenMP* threaded applications
  • For Linux*, locate performance bottlenecks in POSIX* and OpenMP threaded applications from a Windows host system
  • View graphical displays of each thread's state and of parallel-serial transitions to confirm where performance meets expectations and where it falls short, which helps you decide where to focus optimization efforts

Intel® VTune™ Performance Analyzer

Error List

  • Customizable
  • Links to source view

Source View

  • Error context
  • Error locations
  • Stack trace


Appendix – Win32 Threads

The following is a review of common Win32 threading functions.

Creating Win32* Threads

Waiting for Kernel Objects

This is the hub function for synchronization.

DWORD WaitForSingleObject(
    HANDLE hHandle,          // Kernel object to wait on (thread, event, ...)
    DWORD dwMilliseconds);   // Timeout (0 .. INFINITE)
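The timeout behavior can be illustrated with a portable analog. The sketch below does not call the Win32 API. It uses `std::future::wait_for` from the C++ standard library as a stand-in, and the helper name `WaitForWorker` is hypothetical. A timeout of 0 polls the state and returns at once, while a large timeout blocks until the worker signals, which is how 0 and INFINITE are typically used with WaitForSingleObject.

```cpp
#include <chrono>
#include <future>

// Hypothetical helper: returns true if the worker signaled completion
// within timeoutMs milliseconds, false if the wait timed out.
bool WaitForWorker(std::future<void>& done, int timeoutMs) {
    return done.wait_for(std::chrono::milliseconds(timeoutMs)) ==
           std::future_status::ready;
}
```

A GUI thread would usually call this with a timeout of 0 so that it never blocks, and use a long timeout only at shutdown when it must wait for the worker to finish.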



BOOL CWnd::PostMessage(
    UINT message,     // Message (WM_DONE)
    WPARAM wParam,    // Additional message info
    LPARAM lParam);   // Additional message info



