To All -
Below please find a beta version of the Course Design Document for a short module that introduces different implementation models and methods for parallel programming. ALL FEEDBACK IS FAIR GAME! Please read and comment freely.
1. Module Name: Implementation Models of Parallel Programming
2. Writers: [Intel Confidential List]
3. Targeted availability: 29 AUG 2008
4. Brief Module Description
Proposed duration: one hour.
The best methods for programming multi-core processors will be in debate and in flux for the foreseeable future. Giving students the chance to be exposed to the merits and drawbacks of many different possible models equips them with a better understanding of how to write concurrent software and how to choose the implementation strategies that best suit their own programming requirements.
This module will offer a brief survey, comparison, and contrast of various models/methods (past, present, and current research) for writing parallel/concurrent applications. Some attention will be devoted to how these fit into programming language concepts, the types of problems that different models were designed to address, and the viability of the models on manycore architectures.
There are more parallel and concurrent programming methods than just threads and message passing. This module will present a review of many of the well-known methods from the past, the present, and current research. The module will focus on principles, so that students better understand the advantages and risks of each approach to implementing parallel algorithms. Practitioners of parallel programming, especially now, should not get too attached to the implementation model in front of them; they need to be aware that there are choices, and that these choices will evolve over time.
6. Subject Matter Experts (SMEs)
[Intel Confidential List]
The ideal student for this module is an adult learner at a university who, in addition to exhibiting the learning characteristics of adult learners, also has the following traits:
- Has programmed in C/C++ or another compiled programming language, with at least one year of programming experience (or the equivalent)
- Could be a freshman-, sophomore-, or junior-level programmer (1st-, 2nd-, or 3rd-year college student), or an advanced younger student
- Is able to routinely write simple sorting or computation programs (between 10 and 100 lines) from scratch in a day or less, with no difficulty whatsoever
- Is not required to have in-depth knowledge of any particular operating system or programming environment beyond being able to edit, compile, and run applications
- Has an understanding of the issues of parallel programming and is familiar with at least one concurrent programming method.
- Has the ability to describe at least a high-level algorithmic solution to a given computational problem.
- May be able to devise a parallel equivalent or determine if parallelism would be applicable to portions of the algorithms used.
- Is currently, or will soon be, an application developer.
- Is able to learn in a lecture/discussion-only environment.
- Is able to generalize from examples.
- Demonstrates a willingness to tackle a difficult concept and deal with complexity.
a. Special notes for Faculty Training learners/attendees
Faculty Training (FT) attendees are special cases: they likely have more experience than the usual target audience for this class, and they have the immediate goal of teaching this class in a live classroom environment with targeted students.
Ideal FT candidates for this material have the following traits:
- Have an understanding of the issues of parallel programming and are familiar with at least one concurrent programming method
- Currently instruct, or plan to instruct, adult students who fit the learner description earlier in this section
- Currently use a successful programming curriculum, or intend to soon create or teach one
The purpose of a Context Analysis is to identify and describe the environmental factors that inform the design of this module. Environmental factors include:
- Media Selection: The lecture presentation will be in Microsoft* PowerPoint* format, including speaker notes. Since there are no labs involved with this module, no lab guide or document will be provided.
- Learning Activities: Lecture-only presentation; discussion of similarities and differences between models presented is encouraged between students and between students and instructor.
- Participant Materials and Instructor/Leader Guides: Minimal instructor notes are included in the PowerPoint* notes sections. A recorded presentation of the slides, narrated by the course author, and lecture notes will be made available to internal Intel instructor candidates (and may be made available to external academics through the Intel Academic Community website).
- Packaging and production of training materials: Materials are posted to the Intel Academic Community website for worldwide use and alteration.
- Training Schedule: The module is 1 hour of lecture.
The relevant Job/Task Analysis for this material is defined by the Software Engineering Body of Knowledge (SWEBOK) and can be viewed in detail here: http://www.swebok.org
The primary Bodies of Knowledge (BKs) used include, but are not limited to:
- Software Requirements BK
- Software Design BK
  - Key issues in Software Design (Concurrency)
  - Data persistence, etc.
- Software Construction BK
  - Software Construction Fundamentals
  - Managing Construction
  - Practical Considerations (Coding, Construction Testing, etc.)
Relevant IEEE standards for relevant job activities include but are not limited to:
- Standards in Construction, Coding, and Construction Quality: IEEE12207-95
- (IEEE829-98) IEEE Std 829-1998, IEEE Standard for Software Test Documentation, IEEE, 1998.
- (IEEE1008-87) IEEE Std 1008-1987 (R2003), IEEE Standard for Software Unit Testing, IEEE, 1987.
- (IEEE1028-97) IEEE Std 1028-1997 (R2002), IEEE Standard for Software Reviews, IEEE, 1997.
- (IEEE1517-99) IEEE Std 1517-1999, IEEE Standard for Information Technology-Software Life Cycle Processes-Reuse Processes, IEEE, 1999.
- (IEEE12207.0-96) IEEE/EIA 12207.0-1996//ISO/IEC12207:1995, Industry Implementation of Int. Std. ISO/IEC 12207:95, Standard for Information Technology-Software Life Cycle Processes, IEEE, 1996.
10. Concept Analysis
- Shared-any v. shared-none
- Major differences
- Major programming methods
- Shared-any: threads - creating and managing; explicit vs. implicit (OpenMP, TBB)
- Managed code considerations (Java, C#)
- Synchronization with shared memory; signaling (events, condition variables)
- Shared-any algorithm design and locality considerations
- Shared-none: message passing - MPI
- Other message passing models (e.g., occam, Erlang)
- Data parallel programming (C*, CM-Fortran, CUDA)
- Linda and tuple space
- Partitioned Global Address Space [PGAS] (e.g., HPF, Co-array Fortran, Unified Parallel C)
- Deterministic programming (from UIUC)
- Proposed DoD languages: Fortress, Chapel, X10 (Sun, Cray, IBM languages)
- Question: New language or add features to existing?
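To make the explicit end of the "explicit vs. implicit" threading item concrete, here is a minimal C++ sketch, assuming C++11 std::thread as the explicit threading API; the function name parallel_sum is hypothetical and not part of the module materials. The code creates, partitions, and joins its threads by hand. With OpenMP or TBB, a pragma or a call such as tbb::parallel_reduce would generate comparable thread management implicitly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Shared-any sketch: every thread can see `data`; each thread writes only
// its own slot in `partial`, so no lock is needed for the partial sums.
long long parallel_sum(const std::vector<int>& data, unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

    // Explicit creation: the programmer decides how many threads to start
    // and which index range each one handles.
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = std::min(data.size(), t * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            for (std::size_t i = begin; i < end; ++i) partial[t] += data[i];
        });
    }
    // Explicit management: wait for every thread before reading results.
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Each thread writes only its own element of partial, so the threads never race. Those elements sit next to each other in memory, however, and may share a cache line (false sharing), which is one of the locality considerations listed above.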
11. Learning Objectives
- Describe, outline, or diagram the major programming differences between shared-memory (shared-any) and distributed-memory (shared-none) parallel architectures, as presented in lecture. These differences include the creation of processes/threads, the sharing of data, and the synchronization of processes/threads.
- Classify other programming methods described within the presentation as either shared-any or shared-none, and provide cogent arguments to support the classification in classroom discussion or on a short written exam.
- Identify and describe a computational algorithm that would be well-solved by a given programming method covered in the lecture.
12. Criterion Items
Q: How is data shared between threads within a multithreaded application?
A: Data is written into an agreed upon memory location by the generating thread and then read out by the receiving thread. The order of these operations must be controlled to ensure that the write occurs before the read.
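The ordering requirement in this answer can be illustrated with a minimal C++ sketch, assuming C++11 std::mutex and std::condition_variable as the signaling mechanism; SharedSlot, write_value, read_value, and exchange are hypothetical names. The reader waits on a flag that the writer sets only after storing the value, so the write is guaranteed to happen before the read.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical "agreed-upon memory location" shared by both threads.
struct SharedSlot {
    std::mutex m;
    std::condition_variable ready_cv;
    bool ready = false;  // set only after `value` has been written
    int value = 0;
};

// Generating thread: write the value, then signal that it is ready.
void write_value(SharedSlot& slot, int v) {
    {
        std::lock_guard<std::mutex> lock(slot.m);
        slot.value = v;
        slot.ready = true;
    }
    slot.ready_cv.notify_one();
}

// Receiving thread: block until the write has happened, then read.
int read_value(SharedSlot& slot) {
    std::unique_lock<std::mutex> lock(slot.m);
    slot.ready_cv.wait(lock, [&] { return slot.ready; });
    assert(slot.ready);  // the wait predicate guarantees the write came first
    return slot.value;
}

// Starts the reader first on purpose; it still sees the written value.
int exchange(int v) {
    SharedSlot slot;
    int received = -1;
    std::thread consumer([&] { received = read_value(slot); });
    std::thread producer([&] { write_value(slot, v); });
    producer.join();
    consumer.join();
    return received;
}
```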
Q: How is data shared between processes within an MPI application?
A: The generating process encapsulates the data into a message that is sent to the receiving process through an API function call; the receiving process calls a receive API function to retrieve the data and store it in local memory. Blocking versions of the receive functions ensure that the data has arrived before the receiving process proceeds to the next statement.
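A real MPI program would use calls such as MPI_Send and MPI_Recv. Because MPI may not be installed everywhere, the sketch below only imitates the semantics described in this answer: threads stand in for processes, and a hypothetical Mailbox class stands in for the MPI runtime. The data is copied into a message, and receive() blocks until a message has arrived.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical mailbox standing in for the message-passing runtime.
// The "processes" share nothing except the ability to send to it.
class Mailbox {
public:
    // Like a send: the data travels as a copy inside a message.
    void send(std::vector<double> msg) {
        {
            std::lock_guard<std::mutex> lock(m_);
            queue_.push_back(std::move(msg));
        }
        cv_.notify_one();
    }
    // Like a blocking receive: does not return until a message has arrived.
    std::vector<double> receive() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        assert(!queue_.empty());  // guaranteed by the wait predicate
        std::vector<double> msg = std::move(queue_.front());
        queue_.pop_front();
        return msg;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::vector<double>> queue_;
};

// "Rank 0" sends its local array to "rank 1", which sums what it received.
double send_and_sum(const std::vector<double>& local_data) {
    Mailbox rank1_inbox;
    double result = 0.0;
    std::thread rank1([&] {
        std::vector<double> buf = rank1_inbox.receive();  // stored in rank 1's own memory
        for (double x : buf) result += x;
    });
    std::thread rank0([&] { rank1_inbox.send(local_data); });
    rank0.join();
    rank1.join();
    return result;
}
```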
Q: Should Linda be classified as a shared-any or shared-none programming paradigm? Why?
A: Shared-none. All data transfer is done through the tuple space, which is similar to a message-passing mechanism even though tuples are not addressed to specific threads.
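The point that tuples are not addressed to a specific thread can be seen in this minimal C++ sketch of a tuple space. The operation names out, in, and rd follow Linda's primitives, but the TupleSpace class and sum_of_squares are hypothetical and heavily simplified: tuples are (tag, value) pairs matched on the tag alone, and eval is not modeled.

```cpp
#include <cassert>
#include <condition_variable>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Simplified tuple space: real Linda matches on arity, types, and
// actual/formal fields; here a tuple is (tag, value) matched by tag only.
class TupleSpace {
public:
    // out: deposit a tuple; no receiver is named.
    void out(const std::string& tag, int value) {
        {
            std::lock_guard<std::mutex> lock(m_);
            tuples_.emplace(tag, value);
        }
        cv_.notify_all();  // waiters may be looking for different tags
    }
    // in: block until a matching tuple exists, then remove and return it.
    int in(const std::string& tag) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] { return tuples_.count(tag) > 0; });
        auto it = tuples_.find(tag);
        int value = it->second;
        tuples_.erase(it);
        return value;
    }
    // rd: like in, but leaves the tuple in the space.
    int rd(const std::string& tag) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] { return tuples_.count(tag) > 0; });
        return tuples_.find(tag)->second;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::multimap<std::string, int> tuples_;
};

// Workers deposit result tuples; the collector withdraws them without
// knowing which worker produced which tuple.
int sum_of_squares(int n) {
    assert(n >= 0);
    TupleSpace ts;
    std::vector<std::thread> workers;
    for (int i = 1; i <= n; ++i)
        workers.emplace_back([&ts, i] { ts.out("result", i * i); });
    int total = 0;
    for (int i = 0; i < n; ++i) total += ts.in("result");
    for (auto& w : workers) w.join();
    return total;
}
```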
Q: Should PGAS languages be classified as a shared-any or shared-none programming paradigm? Why?
A: Shared-any. Even though the data is distributed to threads, the data space is still shared and accessible to any thread within the application.
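One way to picture a partitioned but shared data space is the C++ sketch below, which emulates a PGAS layout with ordinary threads; in Unified Parallel C the same idea would be a blocked shared array with a upc_barrier between phases. The function neighbor_exchange is hypothetical. Each thread writes only the partition it owns, yet afterward reads its neighbor's partition directly, without sending a message.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// PGAS-style sketch: one global array, logically partitioned so that thread t
// "owns" (has affinity to) elements [t*block, (t+1)*block). Ownership only
// affects locality; any thread may still read any element.
std::vector<int> neighbor_exchange(unsigned num_threads, std::size_t block) {
    assert(num_threads > 0 && block > 0);
    std::vector<int> global(num_threads * block);
    std::vector<int> seen_from_right(num_threads);

    // Phase 1: each thread initializes only the partition it owns.
    std::vector<std::thread> phase1;
    for (unsigned t = 0; t < num_threads; ++t)
        phase1.emplace_back([&, t] {
            for (std::size_t i = t * block; i < (t + 1) * block; ++i)
                global[i] = static_cast<int>(t);
        });
    for (auto& th : phase1) th.join();  // joining plays the role of a barrier

    // Phase 2: each thread reads the first element of its right neighbor's
    // partition, a "remote" access in PGAS terms, with no explicit message.
    std::vector<std::thread> phase2;
    for (unsigned t = 0; t < num_threads; ++t)
        phase2.emplace_back([&, t] {
            const unsigned right = (t + 1) % num_threads;
            seen_from_right[t] = global[right * block];
        });
    for (auto& th : phase2) th.join();
    return seen_from_right;
}
```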
Q: Given an application that computes frames for a computer-animated film, what programming model would be best to implement the parallelism within the application? Justify the choice over some other method.
A: Answers will vary, but should include valid reasons for the choices given. Details about number of frames, drawing algorithms used, size of frames, etc. may need to be given by questioner for students to make a valid case for one method over another.
Q: Given an application that searches through a library catalog to determine the number of books that have the word "loop" in the title, what programming model would be best to implement the parallelism within the application? Justify the choice over some other method.
A: Answers will vary, but should include valid reasons for the choices given. Details about layout of the catalog may need to be given by questioner for students to make a valid case for one method over another. [Details of example computational problems that could be posed will be provided in instructor or student guide.]
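For the catalog question, one possible shared-any answer (not the only valid one; a catalog spread across machines might favor message passing) is sketched in C++ below. It assumes the titles are already in memory as a std::vector<std::string> and counts titles containing "loop" as a case-insensitive substring, which would also match words such as "loophole"; mentions_loop and count_loop_titles are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// True if `title` contains "loop", ignoring case (substring match, an assumption).
bool mentions_loop(std::string title) {
    std::transform(title.begin(), title.end(), title.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return title.find("loop") != std::string::npos;
}

// Shared-any answer: the catalog stays in shared memory and each task scans
// its own contiguous slice, returning a private count (no locks needed).
std::size_t count_loop_titles(const std::vector<std::string>& titles, unsigned tasks) {
    assert(tasks > 0);
    const std::size_t chunk = (titles.size() + tasks - 1) / tasks;
    std::vector<std::future<std::size_t>> parts;
    for (unsigned t = 0; t < tasks; ++t) {
        const std::size_t begin = std::min(titles.size(), t * chunk);
        const std::size_t end = std::min(titles.size(), begin + chunk);
        parts.push_back(std::async(std::launch::async, [&titles, begin, end] {
            return static_cast<std::size_t>(std::count_if(
                titles.begin() + begin, titles.begin() + end, mentions_loop));
        }));
    }
    // Combine the private counts once every task has finished.
    std::size_t total = 0;
    for (auto& f : parts) total += f.get();
    return total;
}
```

Because each task only reads the shared catalog and keeps its own count, the only coordination needed is collecting the futures at the end.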
13. Expert Appraisal
This Course Design Document will be posted to the Intel Academic Community forum with an invitation for forum readers to comment. Additionally, focused reviews by SMEs from outside the company will be solicited and used.
14. Developmental Testing
Alpha and beta materials are planned to be posted to the ISC WIKI by 22 AUG 08.
Upon successful completion of the Product Readiness Approval in the PDT, the materials produced for this module will be posted to the Intel Academic Community (IAC) website, where they will be available for download by registered IAC participants. This module is expected to be taught as part of the ISC Parallel Programming course.