Courseware - Parallel Programming Basics

  • Introduction to parallel programming constructs
  • Design of Parallel solutions
  • Checking correctness
  • Tuning performance

 

 

MULTI CORE CHALLENGES AND STRATEGIES (National Institute of Technology Karnataka Surathkal)

 

 

Material Type:

Lecture / Presentation

Technical Format:

PDF document

Location:

Go to materials

Date Added:

11/01/2011

Date Modified:

11/01/2011

Author:

Prakash Raghavendra, National Institute of Technology Karnataka Surathkal
Description:

MULTI CORE CHALLENGES AND STRATEGIES

  • Increasing the Performance
  • Moore’s Law re-defined
  • The (software) challenge and the path forward
  • A look at a few existing solution techniques

Recommended Audience:

Beginning programmers, Undergraduate students

Language:

English

Keywords:

Parallelism, Performance, Moore's Law

 

 

Basic Parallel Programming Concepts (VTU)

 

 

Material Type:

Article / White paper

Technical Format:

PDF document

Location:

Go to materials

Date Added:

02/24/2011

Date Modified:

02/24/2011

Author:

H.S. Jamadagni, Visvesvaraya Technological University (VTU)
Description:

The primary objective of this chapter is to make students understand the concepts of parallel programming. The chapter is presented as an extension to C programming so that the student develops an insight into writing parallel programs. This chapter does not intend to teach any in-depth concepts of operating systems, computer architecture, threading, etc., but only introduces the relevant concepts wherever necessary.
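
The fork-join pattern named in this record's keywords is easy to show concretely. As a minimal sketch in C with OpenMP (an illustration, not code from the chapter):

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        /* Fork: the master thread spawns a team of threads here. */
        #pragma omp parallel
        {
            int id = omp_get_thread_num();   /* this thread's index */
            int n  = omp_get_num_threads();  /* size of the team    */
            printf("Hello from thread %d of %d\n", id, n);
        }   /* Join: implicit barrier; only the master continues.  */
        return 0;
    }

Compiled with a flag such as gcc's -fopenmp, the block between the fork and the join runs once per thread.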

Recommended Audience:

Beginning programmers, Undergraduate students

Language:

English

Keywords:

threads, Parallelism, Fork Join, OpenMP

 

 

Many core processors: Opportunities and challenges

 

 

Material Type:

Lecture / Presentation

Technical Format:

PDF document

Location:

Go to materials

Date Added:

10/27/2010

Date Modified:

10/27/2010

Author:

Tim Mattson, Intel Visual Applications Research Laboratory
Description:

This is a lecture I give to college classes on parallel programming. In it, I carefully explain the reasons behind the transition to many-core chips and then discuss the need for design patterns to help us do the right things. I then close with the critical role that OpenCL plays in the future of many-core chips.

Recommended Audience:

Advanced programmers, Beginning programmers, Graduate students, Secondary School students, Undergraduate students

Language:

English

Keywords:

OpenCL, Parallel Programming, Manycore

 

 

Parallel Sparse Matrix-Vector Multiplication on multi-core computers

 

 

Material Type:

Coding example

Technical Format:

.docx

Location:

Go to materials

Date Added:

08/30/2010

Date Modified:

08/30/2010

Author:

Lama Hamandi, Dept. of Electrical and Computer Engineering, American University of Beirut
Description:

A challenge to the class: first, write the parallel implementation of the matrix-vector multiplication algorithm in which a sparse matrix stored in the CRS format is multiplied by a dense vector; use OpenMP and run it on multicore processors. Second, write the parallel implementation of the dot product of two dense vectors on multicore computers.

The solution set is provided with this posting.
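
As a hedged sketch of what the two requested kernels can look like in C with OpenMP (variable names and layout here are illustrative assumptions, not the posted solution set):

    /* y = A*x for an n-row sparse matrix A in CRS form:
     *   val[]     holds the nonzero values,
     *   col_idx[] the column index of each nonzero,
     *   row_ptr[] the start of each row in val[] (length n+1). */
    void spmv_crs(int n, const double *val, const int *col_idx,
                  const int *row_ptr, const double *x, double *y)
    {
        /* Rows are independent, so the outer loop parallelizes safely. */
        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
                sum += val[k] * x[col_idx[k]];
            y[i] = sum;
        }
    }

    /* Dot product of two dense vectors; the reduction clause gives
     * each thread a private partial sum, combined at the join. */
    double dot(int n, const double *a, const double *b)
    {
        double s = 0.0;
        #pragma omp parallel for reduction(+:s)
        for (int i = 0; i < n; i++)
            s += a[i] * b[i];
        return s;
    }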

Recommended Audience:

Beginning programmers, Secondary School students, Undergraduate students

Language:

English

Keywords:

parallel, programming, multicore, sparse, matrix-vector, matrix, vector, multiplication

 

 

Multi-core Programming

 

 

Material Type:

Coding example, Lecture / Presentation

Technical Format:

.htm

Location:

Go to materials

Date Added:

08/10/2010

Date Modified:

08/10/2010

Author:

CST Department, Oregon Institute of Technology
Description:

Multi-core Programming Philosophy: multithreaded programming in a multicore environment requires more than just "spawning threads". Software engineers must be aware of multicore processor architecture, instruction pipelines, and processor cache usage. In this course, approaches to analyzing multithreaded programs will be studied. The overall goal of the course is for students to gain an understanding of the software issues related to multithreaded programming and the differences in behavior between hyperthreaded, multicore, and single-core systems.

Recommended Audience:

Beginning programmers, Undergraduate students

Language:

English

Keywords:

multicore processor architecture, instruction pipelines, Multithreaded programming

 

 

Brooklyn Technical High School Multi-core Bootcamp (July 21-23, 2009)

 

 

Material Type:

Lecture / Presentation, Coding example

Technical Format:

.htm

Location:

Go to materials

Date Added:

08/09/2010

Date Modified:

08/09/2010

Author:

Intel Corporation
Brooklyn Technical High School
Bank of America
Description:

Bob Chesebrough, Senior Course Architect with the Intel Academic Community, built a collaborative team with Jeffrey M. Birnbaum and Randy Asher, Principal of the Brooklyn Technical High School and Vice President of the National Consortium for Specialized Secondary Schools of Math, Science & Technology, to implement the clubhouse with top technical high school students in New York.
The 3-day boot camp kicks off with interactive exercises that use real-world experiences (such as standing in lines at a store) to introduce students to parallel programming concepts, including race conditions and for loops (see the video guide to these challenges). Each day, students are also given a Parallel Programming Puzzle to ponder and then discuss a real-world software developer's solution the following day.

Instruction for the boot camp is provided by Intel engineers and executives using the latest software development tools, including Intel® Parallel Studio. The camp concludes with Jeffrey M. Birnbaum presenting advanced lock-free programming techniques, showcasing database applications running on a 32-core server system on loan from IBM, with high-performance Ethernet connectivity provided by BLADE Network Technologies. BLADE is hosting the servers in its world-class data center networking lab in Silicon Valley.

The curriculum is adapted from undergraduate and graduate materials created by the Intel Academic Community and currently used by professors at 1,350 universities in 72 countries. This is the first time the curriculum has been adapted to the high school level.

Modules:
Recognizing Potential Parallelism
Understanding the Shared-Memory Model and Threads
Confronting Race Conditions
Scalability
Dependence Graphs

Labs and exercises:
Thinking Parallel Workbook
OpenMP Workbook
Threading For Performance with TBB Lab
Correcting Threading Errors with Intel® Parallel Inspector Lab
Tuning Threaded Code with Intel® Parallel Amplifier Lab
Game Threading Methodology Lab

Recommended Audience:

Advanced programmers, Graduate students, Secondary School students, Undergraduate students

Language:

English

Keywords:

interactive exercises, parallel programming concepts, high school level

 

 

Introduction to Parallel Programming for Shared Memory Parallelism (Intel)

 

 

Material Type:

Complete material for two-day instructor-led course

Technical Format:

PowerPoint, PDF, and Word documents, lab instructions, source code

Location:

Go to Materials

Date Added:

10/23/2008

Date Modified:

09/01/2009

Author:

Intel® Innovative Software Education
Description:

This two-day course introduces concepts and approaches common to all implementations of parallel programming for shared-memory systems. Starting with foundation principles, topics include recognizing parallelism opportunities, dealing with sequential constructs, using threads to implement data and functional parallelism, discovering dependencies and ensuring mutual exclusion, analyzing and improving threaded performance, and choosing an appropriate threading model for implementation. The course uses presentations, walk-through labs, and hands-on lab exercises. While lab exercises are done in C using OpenMP*, the concepts apply broadly to any specific threading model.

This course was developed in collaboration with Prof. Michael Quinn of Oregon State University. Prof. Quinn is the author of seven books, including Parallel Programming in C with MPI and OpenMP*, published by McGraw-Hill in June 2003.

Course Objectives

After completing this course, you should be able to:

  • Recognize opportunities for concurrency
  • Use basic implementations for domain and task parallelism
  • Address matters concerning threading correctness and performance

Course Agenda

  • Recognizing parallelism
  • Shared memory and threads
  • Implementing domain decompositions
  • Confronting race conditions
  • Implementing task decompositions
  • Analyzing parallel performance
  • Improving parallel performance
  • Choosing an appropriate thread model

Day 1 Agenda:

  • 0900 - Introductions
  • 0930 - Recognizing Potential Parallelism
  • 1100 - Shared-Memory Model and Threads
  • 1200 - Lunch
  • 1300 - Implementing Domain Decompositions
  • 1500 - Confronting Race Conditions

Day 2 Agenda:

  • 0900 - Implementing Task Decompositions
  • 1100 - Analyzing Parallel Performance
  • 1200 - Lunch
  • 1300 - Improving Parallel Performance
  • 1500 - Choosing the Appropriate Thread Model

Recommended Audience:

Undergraduate CS, ECE, and engineering majors with computer programming experience

Language:

English

Keywords:

 

 

 

Introduction to Parallel Programming hands-on programming lab – Iterative Quicksort

 

 

Material Type:

Coding example

Technical Format:

PowerPoint, PDF, and Word documents, lab instructions, source code

Location:

Go to Materials

Date Added:

04/05/2010

Date Modified:

04/05/2010

Author:

Clay Breshears, Intel Innovative Software Education
Description:

This hands-on exercise lab, Iterative Quicksort, is a programming lab associated with the video lecture “Reducing Parallel Overhead” (Part 12) from the “Introduction to Parallel Programming” series. This problem seeks to parallelize an iterative implementation of the Quicksort algorithm. The lab contents include source files and written instructions to guide the programmer in converting the serial source code into an equivalent parallel version using OpenMP. Solution source files and a walk-through solution video (in two parts) are provided to explain how the initial serial code can be transformed into equivalent parallel source code. An explanation of the serial source code and algorithm, some potential problems to avoid, and alternative approaches to implementing parallelism are discussed within the walk-through video.

Solution videos running time: 19:35 (part 1), 16:40 (part 2)

Recommended Audience:

Beginning programmers, Undergraduate students

Language:

English

Keywords:

parallel computing, Quicksort, iterative Quicksort, stack, simulating recursion, programming exercise

 

 

Introduction to Parallel Programming hands-on programming lab – Matrix Multiplication (Intel)

 

 

Material Type:

Coding example

Technical Format:

zip archive

Location:

Go to Materials

Date Added:

04/13/2010

Date Modified:

04/13/2010

Author:

Clay Breshears, Intel Academic Community
Description:

This hands-on exercise lab, Matrix Multiplication, is one of two programming labs associated with the video lecture “OpenMP for Domain Decomposition” (Part 5) of the “Introduction to Parallel Programming” series. The lab contents include source files and written instructions to guide the programmer in converting the serial source code into an equivalent parallel version using OpenMP. Solution source files and a walk-through solution video are provided to explain how the initial serial code can be transformed into equivalent parallel source code. An explanation of the serial source code and algorithm, some potential problems to avoid, and alternative approaches to implementing parallelism are discussed within the walk-through video.

Solution video running time: 16:56.

Recommended Audience:

Beginning programmers, Undergraduate students

Language:

English

Keywords:

Parallel computing, domain decomposition, OpenMP, loop parallelism, OpenMP worksharing constructs, numerical computation, programming exercise, matrix multiplication

 

 

Introduction to Parallel Programming hands-on programming lab – Numeric Search

 

 

Material Type:

Coding example

Technical Format:

zip archive

Location:

Go to Materials

Date Added:

03/30/2010

Date Modified:

03/30/2010

Author:

Clay Breshears, Intel Innovative Software Education
Description:

This hands-on exercise lab, Numeric Search, is one of two programming labs associated with the video lecture “Confronting Race Conditions” (Part 6) of the “Introduction to Parallel Programming” series. This problem examines numeric values in order to determine how many conform to a given set of properties. The lab contents include source files and written instructions to guide the programmer in converting the serial source code into an equivalent parallel version using OpenMP. Solution source files and a walk-through solution video are provided to explain how the initial serial code can be transformed into equivalent parallel source code. An explanation of the serial source code and algorithm, some potential problems to avoid, and alternative approaches to implementing parallelism are discussed within the walk-through video.

Solution video running time: 13:29

Recommended Audience:

Beginning programmers, Undergraduate students

Language:

English

Keywords:

parallel computing, OpenMP, domain decomposition, loop parallelism, data race, mutual exclusion, critical region, OpenMP critical pragma, OpenMP atomic pragma, programming exercise

 

 

Introduction to Parallel Programming hands-on programming lab – Prime Counter

 

 

Material Type:

Coding example

Technical Format:

zip archive

Location:

Go to Materials

Date Added:

04/02/2010

Date Modified:

04/02/2010

Author:

Clay Breshears, Intel Innovative Software Education
Description:

This hands-on exercise lab, Prime Counter, is a programming lab associated with the video lecture “Reducing Parallel Overhead” (Part 12) from the “Introduction to Parallel Programming” series. This problem seeks to parallelize an application that counts prime numbers within a given range, but in a less brute-force way than the previous prime-finding application. The lab contents include source files and written instructions to guide the programmer in converting the serial source code into an equivalent parallel version using OpenMP. Solution source files and a walk-through solution video (in two parts) are provided to explain how the initial serial code can be transformed into equivalent parallel source code. An explanation of the serial source code and algorithm, some potential problems to avoid, and alternative approaches to implementing parallelism are discussed within the walk-through video.

Solution video running time: 16:28

Recommended Audience:

Beginning programmers, Undergraduate students

Language:

English

Keywords:

parallel computing, prime number, sieve, programming exercise

 

 

Introduction to Parallel Programming hands-on programming lab – Prime Finder (Intel)

 

 

Material Type:

Coding example

Technical Format:

zip archive

Location:

Go to Materials

Date Added:

04/13/2010

Date Modified:

04/13/2010

Author:

Clay Breshears, Intel Academic Community
Description:

This hands-on exercise lab, Prime Finder, is one of two programming labs associated with the video lecture “OpenMP for Domain Decomposition” (Part 5) of the “Introduction to Parallel Programming” series. The lab contents include source files and written instructions to guide the programmer in converting the serial source code into an equivalent parallel version using OpenMP. Solution source files and a walk-through solution video are provided to explain how the initial serial code can be transformed into equivalent parallel source code. An explanation of the serial source code and algorithm, some potential problems to avoid, and alternative approaches to implementing parallelism are discussed within the walk-through video.

Solution video running time: 13:44.

Recommended Audience:

Beginning programmers, Undergraduate students

Language:

English

Keywords:

Parallel computing, OpenMP, OpenMP worksharing constructs, loop parallelism, programming exercise, domain decomposition, prime numbers

 

 

Introduction to Parallel Programming hands-on programming lab – Recursive Quicksort

 

 

Material Type:

Coding example

Technical Format:

zip archive

Location:

Go to Materials

Date Added:

03/31/2010

Date Modified:

03/31/2010

Author:

Clay Breshears, Intel Innovative Software Education
Description:

This hands-on exercise lab, Quicksort, is a programming lab associated with the video lecture “Implementing a Task Decomposition” (Part 9) from the “Introduction to Parallel Programming” series. This problem seeks to parallelize the recursive implementation of the Quicksort algorithm with a task decomposition solution. The lab contents include source files and written instructions to guide the programmer in converting the serial source code into an equivalent parallel version using OpenMP. Solution source files and a walk-through solution video are provided to explain how the initial serial code can be transformed into equivalent parallel source code. An explanation of the serial source code and algorithm, some potential problems to avoid, and alternative approaches to implementing parallelism are discussed within the walk-through video.
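
A hedged sketch of the kind of task-decomposition Quicksort the lab develops (the structure and names here are assumptions, not the distributed solution files):

    static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

    /* Lomuto partition around the last element; returns the pivot index. */
    static int partition(int *a, int lo, int hi)
    {
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++)
            if (a[j] < pivot) swap(&a[i++], &a[j]);
        swap(&a[i], &a[hi]);
        return i;
    }

    /* Call from inside "#pragma omp parallel" / "#pragma omp single". */
    void quicksort(int *a, int lo, int hi)
    {
        if (lo >= hi) return;
        int p = partition(a, lo, hi);

        /* The two halves are independent, so each becomes a task. */
        #pragma omp task
        quicksort(a, lo, p - 1);
        #pragma omp task
        quicksort(a, p + 1, hi);
        #pragma omp taskwait     /* wait for both child tasks */
    }

A common refinement is a cutoff that sorts small partitions serially so task-creation overhead does not dominate.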

Solution video running time: 20:52

Recommended Audience:

Beginning programmers, Undergraduate students

Language:

English

Keywords:

parallel computing, OpenMP, task decomposition, Quicksort, recursion, recursive parallelism, OpenMP task construct, programming exercise

 

 

Introduction to Parallel Programming video lecture series – Part 01 “Why Parallel? Why Now?”

 

 

Material Type:

Lecture / Presentation

Technical Format:

.mp4

Location:

Go to Materials

Date Added:

03/08/2010

Date Modified:

03/08/2010

Author:

Clay Breshears, Intel Innovative Software Education
Description:

The lecture given here is the first part in the “Introduction to Parallel Programming” video series. This part endeavors to define parallel computing, explain why parallel computing is becoming mainstream, and explain why explicit parallel programming is necessary. This part sets the tone for the other 11 parts in the series.

Running time: 9:51

Note: The material presented in this lecture series has been taken from the Intel Software College multi-day seminar, “Introduction to Parallel Programming”, authored by Michael J. Quinn (Seattle University). The content has been reorganized and updated for the lectures in this series.

Recommended Audience:

Beginning programmers, Undergraduate students

Language:

English

Keywords:

parallel computing, multicore, architecture, execution, optimization, performance

 

 

Introduction to Parallel Programming video lecture series – Part 02 “Parallel Decomposition Methods”

 

 

Material Type:

Lecture / Presentation

Technical Format:

.mp4

Location:

Go to Materials

Date Added:

04/06/2010

Date Modified:

04/06/2010

Author:

Clay Breshears, Intel Innovative Software Education
Description:

The lecture given here is the second part in the “Introduction to Parallel Programming” video series. This part endeavors to give the viewer strategies that can identify opportunities for parallelism in code segments and applications. Three methods for dividing computation into independent work (Domain Decomposition, Task Decomposition, and Pipelining) are illustrated. The first two methods will be examined in later parts of the series and lab exercises.

Running time: 8:47

Note: The material presented in this lecture series has been taken from the Intel Software College multi-day seminar, “Introduction to Parallel Programming”, authored by Michael J. Quinn (Seattle University). The content has been reorganized and updated for the lectures in this series.

Recommended Audience:

Beginning programmers, Undergraduate students

Language:

English

Keywords:

pipelining, domain decomposition, task decomposition

 

 

Introduction to Parallel Programming video lecture series – Part 03 “Finding Parallelism”

 

 

Material Type:

Lecture / Presentation

Technical Format:

.mp4

Location:

Go to Materials

Date Added:

03/25/2010

Date Modified:

03/25/2010

Author:

Clay Breshears, Intel Innovative Software Education
Description:

The lecture given here is the third part in the “Introduction to Parallel Programming” video series. This part endeavors to explain how dependence graphs can be used to identify opportunities for parallelism in code segments and applications. Examples of how to decide whether a domain or task decomposition will work best are offered, and dependence graphs are also used to identify code that cannot be parallelized. The lecture finishes with computation examples and generalizations about which kinds of problems are more or less amenable to parallel solution, which viewers can apply to their own situations.

Running time: 12:24

Note: The material presented in this lecture series has been taken from the Intel Software College multi-day seminar, “Introduction to Parallel Programming”, authored by Michael J. Quinn (Seattle University). The content has been reorganized and updated for the lectures in this series.

Recommended Audience:

Beginning programmers, Undergraduate students

Language:

English

Keywords:

domain decomposition, task decomposition, independent work, dependence graphs

 

 

Introduction to Parallel Programming video lecture series – Part 04 “Shared Memory Considerations”

 

 

Material Type:

Lecture / Presentation

Technical Format:

.mp4

Location:

Go to Materials

Date Added:

03/25/2010

Date Modified:

03/25/2010

Author:

Clay Breshears, Intel Innovative Software Education
Description:

The lecture given here is the fourth part in the “Introduction to Parallel Programming” video series. This part provides the viewer with a description of the shared-memory model of parallel programming. Implementation strategies for domain decomposition and task decomposition problems using threads within a shared memory execution environment are illustrated. Simple code examples further support threaded implementations of parallel algorithms, especially with regard to deciding when variables should be shared and when variables must be made private to threads for correctness.
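
As a hedged sketch of that shared-versus-private decision (not the lecture's own code): a scratch variable written by every iteration must be private, while read-only and per-element data can stay shared.

    #define N 1000

    void square_scaled(double *a, const double *b)
    {
        double t;  /* scratch value reused by every iteration */

        /* Without private(t), all threads would write one shared t,
         * a data race; a and b are safely shared. */
        #pragma omp parallel for private(t) shared(a, b)
        for (int i = 0; i < N; i++) {
            t = 2.0 * b[i];
            a[i] = t * t;
        }
    }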

Running time: 14:50

Note: The material presented in this lecture series has been taken from the Intel Software College multi-day seminar, “Introduction to Parallel Programming”, authored by Michael J. Quinn (Seattle University). The content has been reorganized and updated for the lectures in this series.

Recommended Audience:

Beginning programmers, Undergraduate students

Language:

English

Keywords:

Shared memory, Shared memory model, domain decomposition, task decomposition, pipeline, private variables

 

 

Introduction to Parallel Programming video lecture series – Part 05 “OpenMP for Domain Decomposition”

 

 

Material Type:

Lecture / Presentation, Lab

Technical Format:

.mp4, zip archive

Location:

Go to Materials

Date Added:

11/09/2010

Date Modified:

11/09/2010

Author:

Clay Breshears, Intel Innovative Software Education
Description:

The lecture given here is the fifth part in the “Introduction to Parallel Programming” video series. This part gives an introduction to the OpenMP library for parallel programming, shows how to identify loops whose iterations can be executed in parallel, and how OpenMP pragmas can be added to execute loop iterations in parallel. OpenMP clauses to define private copies of variables and to define reduction operations on selected variables within parallel loops are also covered. A small code example to compute an approximation to the value of pi (3.1415926…) using numerical integration is included to illustrate a domain decomposition parallel solution using OpenMP.
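
A sketch of that pi computation in C with OpenMP (a reconstruction along the lines the description gives, not a transcription of the lecture's code):

    #include <stdio.h>

    int main(void)
    {
        const long n = 1000000;   /* number of rectangles    */
        const double h = 1.0 / n; /* width of each rectangle */
        double sum = 0.0;

        /* Midpoint rectangle rule for the integral of 4/(1+x^2)
         * on [0,1]; reduction(+:sum) gives each thread a private
         * partial sum, combined when the loop ends. */
        #pragma omp parallel for reduction(+:sum)
        for (long i = 0; i < n; i++) {
            double x = (i + 0.5) * h;   /* midpoint of slice i */
            sum += 4.0 / (1.0 + x * x);
        }

        printf("pi is approximately %.10f\n", h * sum);
        return 0;
    }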

Two programming labs (Matrix multiplication and Prime finder) with instructions and source files are separately available for practice of information and ideas presented within the lecture video.

Running time: 23:26.

Note: The material presented in this lecture series has been taken from the Intel Software College multi-day seminar, “Introduction to Parallel Programming”, authored by Michael J. Quinn (Seattle University). The content has been reorganized and updated for the lectures in this series.

Recommended Audience:

Beginning programmers, Undergraduate students

Language:

English

Keywords:

Parallel computing, domain decomposition, loop parallelism, worksharing constructs, private clause, OpenMP, reduction clause, midpoint rectangle rule, numerical integration

 

 

Introduction to Parallel Programming video lecture series – Part 06 “Confronting Race Conditions”

 

 

Material Type:

Lecture / Presentation

Technical Format:

.mp4, zip archive

Location:

Go to Materials

Date Added:

04/06/2010

Date Modified:

04/06/2010

Author:

Clay Breshears, Intel Innovative Software Education
Description:

The lecture given here is the sixth part in the “Introduction to Parallel Programming” video series. This part provides the viewer with practical examples of errors that can occur when threads contend for shared resources. One example is a simple arithmetic computation and another attempts to add nodes onto a linked list. Methods to eliminate data race conditions within threaded code are given both in the abstract and in practical terms using two OpenMP pragmas: critical and atomic. The linked list example is used to demonstrate the importance of mutual exclusion on both reads and writes of shared data.

A programming lab (Numeric Search) with instructions and source files is separately available for practice of information and ideas presented within the lecture video.
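
As a minimal sketch in the spirit of the lecture's arithmetic example (not its actual code), an unprotected shared counter races, and a single OpenMP pragma repairs it:

    #include <stdio.h>

    int main(void)
    {
        int count = 0;

        #pragma omp parallel for
        for (int i = 0; i < 100000; i++) {
            /* A bare count++ here is a data race: the read-modify-
             * write of the shared variable can interleave between
             * threads. The atomic (or critical) pragma serializes it. */
            #pragma omp atomic
            count++;
        }

        printf("count = %d\n", count);  /* reliably 100000 */
        return 0;
    }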

Running time: 19:41

Note: The material presented in this lecture series has been taken from the Intel Software College multi-day seminar, “Introduction to Parallel Programming”, authored by Michael J. Quinn (Seattle University). The content has been reorganized and updated for the lectures in this series.

Recommended Audience:

Beginning programmers, Undergraduate students

Language:

English

Keywords:

shared memory, data race, mutual exclusion, critical region, OpenMP critical pragma, OpenMP atomic pragma

 

 

Introduction to Parallel Programming video lecture series – Part 07 “Deadlock”

 

 

Material Type:

Lecture / Presentation

Technical Format:

.mp4

Location:

Go to Materials

Date Added:

03/30/2010

Date Modified:

03/30/2010

Author:

Clay Breshears, Intel Innovative Software Education
Description:

The lecture given here is the seventh part in the “Introduction to Parallel Programming” video series. This part explains to the viewer the concept of deadlock between multiple threads and explains ways to prevent it. A code example with a locking hierarchy error is used to illustrate how deadlock can occur. This example is corrected after methods of deadlock prevention have been introduced.
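
A hedged sketch of the lock-hierarchy remedy (an illustration, not the lecture's example): deadlock requires a cycle of threads waiting on each other, and forcing every thread to acquire locks in one global order breaks any such cycle.

    #include <omp.h>

    omp_lock_t lock_a, lock_b;

    void update_both(void)
    {
        /* Hierarchy rule: every thread takes lock_a before lock_b.
         * If one path took b-then-a instead, two threads could each
         * hold one lock and wait forever for the other. */
        omp_set_lock(&lock_a);
        omp_set_lock(&lock_b);
        /* ... touch state guarded by both locks ... */
        omp_unset_lock(&lock_b);   /* release in reverse order */
        omp_unset_lock(&lock_a);
    }

    int main(void)
    {
        omp_init_lock(&lock_a);
        omp_init_lock(&lock_b);
        #pragma omp parallel
        update_both();
        omp_destroy_lock(&lock_b);
        omp_destroy_lock(&lock_a);
        return 0;
    }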

Running time: 9:13

Note: The material presented in this lecture series has been taken from the Intel Software College multi-day seminar, “Introduction to Parallel Programming”, authored by Michael J. Quinn (Seattle University). The content has been reorganized and updated for the lectures in this series.

Recommended Audience:

Beginning programmers, Undergraduate students

Language:

English

Keywords:

Deadlock, locking hierarchy, necessary conditions

 

 

Introduction to Parallel Programming video lecture series – Part 08 “OpenMP for Task Decomposition”

 

 

Material Type:

Lecture / Presentation

Technical Format:

.mp4

Location:

Go to Materials

Date Added:

03/31/2010

Date Modified:

03/31/2010

Author:

Clay Breshears, Intel Innovative Software Education
Description:

The lecture given here is the eighth part in the “Introduction to Parallel Programming” video series. This part describes how the OpenMP task pragma works and how it is different from the previous worksharing pragmas. A small linked list processing code example is used to illustrate how independent operation within a while-loop can be parallelized. Since recursive functions, where the recursive calls are independent, can be executed in parallel, the OpenMP task construct is used to parallelize the computation of a desired member from the Fibonacci sequence.
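
The Fibonacci example is commonly written along these lines (a sketch under that assumption, not the lecture's own code):

    #include <stdio.h>

    long fib(int n)
    {
        long x, y;
        if (n < 2) return n;

        /* The two recursive calls are independent, so each runs as
         * a task; shared(x) lets the child write the parent's x. */
        #pragma omp task shared(x)
        x = fib(n - 1);
        #pragma omp task shared(y)
        y = fib(n - 2);
        #pragma omp taskwait   /* wait for both children */
        return x + y;
    }

    int main(void)
    {
        long result;
        #pragma omp parallel   /* create the thread team...       */
        #pragma omp single     /* ...but start the recursion once */
        result = fib(30);
        printf("fib(30) = %ld\n", result);
        return 0;
    }

In practice a serial cutoff for small n keeps task-creation overhead from swamping the gain.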

A programming lab (recursive implementation of Quicksort) with instructions and source files is separately available for practice of information and ideas presented within the lecture video. While everything needed to succeed is presented within this lecture, it is recommended that Part 09, “Implementing a Task Decomposition,” be viewed before attempting this exercise.

Running time: 19:04.

Note: The material presented in this lecture series has been taken from the Intel Software College multi-day seminar, “Introduction to Parallel Programming”, authored by Michael J. Quinn (Seattle University). The content has been reorganized and updated for the lectures in this series.

Recommended Audience:

Beginning programmers, Undergraduate students

Language:

English

Keywords:

parallel computing, task decomposition, OpenMP task construct, linked list, Fibonacci, recursive parallelism

 

 

Introduction to Parallel Programming video lecture series – Part 09 “Implementing a Task Decomposition”

 

 

Material Type:

Lecture / Presentation

Technical Format:

.mp4

Location:

Go to Materials

Date Added:

04/06/2010

Date Modified:

04/06/2010

Author:

Clay Breshears, Intel Innovative Software Education
Description:

The lecture given here is the ninth part in the “Introduction to Parallel Programming” video series. This part describes how to design and implement a task decomposition solution, using the 8 Queens problem as an illustrative example. Multiple approaches are presented, with the pros and cons of each described. After the approach is decided upon, code modifications using OpenMP are presented. Potential data race errors with a shared stack data structure holding board configurations (the tasks to be processed) are pointed out, and a solution is found and implemented.

A programming lab (recursive implementation of Quicksort) with instructions and source files is separately available for practice of information and ideas presented within the lecture video.

Running time: 24:43

Note: The material presented in this lecture series has been taken from the Intel Software College multi-day seminar, “Introduction to Parallel Programming”, authored by Michael J. Quinn (Seattle University). The content has been reorganized and updated for the lectures in this series.

Recommended Audience:

Beginning programmers, Undergraduate students

Language:

English

Keywords:

parallel computing, task decomposition, eight queens problem, 8 Queens search tree, parallel search, work pool model, data race, shared stack

 

 

Introduction to Parallel Programming video lecture series – Part 10 “Predicting Parallel Performance”

 

 

Material Type:

Lecture / Presentation

Technical Format:

.mp4

Location:

Go to Materials

Date Added:

04/06/2010

Date Modified:

04/06/2010

Author:

Clay Breshears, Intel Innovative Software Education
Description:

The lecture given here is the tenth part in the “Introduction to Parallel Programming” video series. This part offers definitions for the performance metrics speedup and efficiency. A fence painting example is used to illustrate how to compute these metrics. Use of Amdahl’s Law to predict maximum speedup is explained along with the derivation of the model. Explanations of why Amdahl’s Law is overly optimistic in the prediction of possible speedup are given, as well.
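
As a compact restatement of the model the lecture derives (the standard form of Amdahl's Law, not a transcription of the slides): if a fraction f of the runtime is inherently serial and the remainder parallelizes perfectly across p processors, then

    S(p) = \frac{1}{f + \frac{1 - f}{p}} \le \frac{1}{f}

For example, f = 0.1 and p = 8 give S = 1/(0.1 + 0.9/8), about 4.7, well short of the ideal 8, and no number of processors can push the speedup past 1/f = 10.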

Running time: 15:03

Note: The material presented in this lecture series has been taken from the Intel Software College multi-day seminar, “Introduction to Parallel Programming”, authored by Michael J. Quinn (Seattle University). The content has been reorganized and updated for the lectures in this series.

Recommended Audience:

Beginning programmers, Undergraduate students

Language:

English

Keywords:

speedup, efficiency, parallel performance, Amdahl's Law, superlinear speedup

 

 

Introduction to Parallel Programming video lecture series – Part 11 “Improving Parallel Performance”

 

 

Material Type:

Lecture / Presentation

Technical Format:

.mp4

Location:

Go to Materials

Date Added:

04/06/2010

Date Modified:

04/06/2010

Author:

Clay Breshears, Intel Innovative Software Education
Description:

The lecture given here is the eleventh (and penultimate) part in the “Introduction to Parallel Programming” video series. This part starts by explaining why less-than-optimal serial algorithms can be easier to parallelize. The concepts of temporal and data locality are defined, along with an explanation of why maximizing them within parallel programs pays performance dividends. The latter part of the lecture demonstrates how loop fusion, loop fission, and loop inversion can be used to create or improve opportunities for parallel execution. Code and pictorial examples illustrate the main topics of the lecture.
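
As one concrete illustration of these transformations (a sketch assuming a simple two-stage array computation, not the lecture's example), loop fusion improves temporal locality and enlarges the parallel loop body:

    /* Before fusion: two passes over a[], so each a[i] may fall out
     * of cache between its write and its reuse on large arrays. */
    void two_passes(double *a, double *b, double *c, int n)
    {
        for (int i = 0; i < n; i++) a[i] = b[i] + 1.0;
        for (int i = 0; i < n; i++) c[i] = a[i] * 2.0;
    }

    /* After fusion: one pass reuses a[i] while it is still in cache
     * (temporal locality), and one larger parallel loop amortizes
     * the fork-join overhead. */
    void fused(double *a, double *b, double *c, int n)
    {
        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            a[i] = b[i] + 1.0;
            c[i] = a[i] * 2.0;
        }
    }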

Running time: 15:12

Note: The material presented in this lecture series has been taken from the Intel Software College multi-day seminar, “Introduction to Parallel Programming”, authored by Michael J. Quinn (Seattle University). The content has been reorganized and updated for the lectures in this series.

Recommended Audience:

Beginning programmers, Undergraduate students

Language:

English

Keywords:

parallel performance, temporal locality, data locality, loop fusion, loop fission, loop inversion

 

 

Introduction to Parallel Programming video lecture series – Part 12 “Reducing Parallel Overhead”

 

 

Material Type:

Lecture / Presentation

Technical Format:

.mp4

Location:

Go to Materials

Date Added:

04/02/2010

Date Modified:

04/02/2010

Author:

Clay Breshears, Intel Innovative Software Education
Description:

The lecture given here is the twelfth and final part in the “Introduction to Parallel Programming” video series. This part describes the pros and cons of static versus dynamic loop scheduling with an eye toward achieving a good load balance of work per thread. The different OpenMP schedule clauses, the overheads associated with each, and the situations each is best suited for are covered. Finally, in order to reduce the amount of thread interaction (barriers, synchronization, and other overheads), the efficacy of replicating work among threads is illustrated.
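
A hedged sketch of the static-versus-dynamic trade-off (process() is a hypothetical stand-in for per-iteration work of varying cost):

    /* Hypothetical work whose cost grows with i. */
    static void process(int i)
    {
        volatile double x = 0.0;
        for (int k = 0; k < i; k++)
            x += k * 0.5;
    }

    void run(int n)
    {
        /* static: iterations are split into fixed blocks up front;
         * lowest overhead, fine when every iteration costs the same,
         * but here the last thread gets all the expensive ones. */
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < n; i++) process(i);

        /* dynamic,16: threads grab chunks of 16 as they finish;
         * more scheduling overhead, much better load balance. */
        #pragma omp parallel for schedule(dynamic, 16)
        for (int i = 0; i < n; i++) process(i);
    }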

Two programming labs (Prime Counter and an iterative implementation of Quicksort) with instructions and source files are separately available for practice of information and ideas presented within the lecture video.

Running time: 9:42

Note: The material presented in this lecture series has been taken from the Intel Software College multi-day seminar, “Introduction to Parallel Programming”, authored by Michael J. Quinn (Seattle University). The content has been reorganized and updated for the lectures in this series.

Recommended Audience:

Beginning programmers, Undergraduate students

Language:

English

Keywords:

static schedule, dynamic schedule, parallel overhead, OpenMP schedule clause, load balance, work replication

 

 

Introduction to Parallel Programming with Java

 

 

Material Type:

Lecture / Presentation, Coding example, Workshop and Training Materials, Lab

Technical Format:

PowerPoint presentation, Word document

Location:

Go to Materials

Date Added:

08/02/2010

Date Modified:

08/02/2010

Author:

Selwyn H. You, Intel Academic Community
Description:

Develop programs that take advantage of multi-core platforms by applying fundamental concepts of parallel programming.

After completing this course, you will be able to:

  • Recognize opportunities for parallel computing
  • Use basic implementations for domain and task parallelism
  • Ensure correctness by identifying and resolving race conditions and deadlocks
  • Improve performance by selective code modifications and load balancing

Recommended Audience:

Advanced programmers, Beginning programmers, Graduate students, Undergraduate students

Language:

English

Keywords:

parallel computing, Java, domain and task parallelism, race conditions and deadlocks

 

 

Introduction to Parallel Programming: Threading Strategies (Intel)

 

 

Material Type:

Lecture / Presentation

Technical Format:

PowerPoint presentation, .docx

Location:

Go to Materials

Date Added:

04/13/2010

Date Modified:

04/13/2010

Author:

Intel® Innovative Software Education
Description:

This module introduces concepts and industry-standard methodologies common to the implementation of three threading strategies: domain decomposition, task decomposition, and pipeline decomposition.

Students learn to construct a dependency graph to analyze which of the three threading strategies might best be applied. They also learn how to identify race conditions.

Students learn basic principles of parallel programming on shared-memory systems, including how to recognize key areas of code that are good candidates for parallelism.

Topics include:

  • Recognizing opportunities for parallelism
  • Preventing potential synchronization issues
  • Domain decomposition
  • Task decomposition
  • Pipelining
  • Analyzing and improving performance
  • Choosing the appropriate threading model

Recommended Audience:

Undergraduate students

Language:

English

Keywords:

Threading Strategies