English | 中文 | Русский | Français
2,856 Posts served
8,606 Conversations started
Since the kickoff of the High School Parallelism bootcamp this summer, I've received several requests for a write up of the five role playing activities we used.
The activities put students in the place of procesor cores and had them perform tasks in parallel. These activities proved to be popular among many of the students at the camp, however, some of the more advanced students did express that they felt the exercsies could seem childish.
My personal observation is that these exercises laid an excellent foundation that was built upon later with actual computer lab activites using OpenMP and Threading Building Blocks.
The activities are best done in groups of 4 or 5 individuals but even two in a single group can.
Without further ado - here is my promised write up:
Thinking Parallel
These role playing activities are designed to get you start thinking in parallel. They will also expose you to some terminology & concepts that we will use throughout the rest of the boot-camp. Many of these exercises were inspired from Chapter two if James Reinders’ book “Intel Threading Building Blocks”.
Objectives
The objectives for the following five exercises are to:
1) Explore domain & task decomposition using a mailer example.
2) Learn about race conditions and two ways to mitigate them.
3) Learn what a critical section does
4) How a reduction can eliminate a race condition.
In this activity, each member of your team or group will play the role of a processor core, or more accurately, the role of a thread executing code on a core. Your team will explore how to use domain decomposition to accomplish the job of folding, stuffing, sealing, addressing, stamping & mailing multiple envelopes.
Time Required
fifteen minutes
Objective
Use scrap paper, some empty envelopes, and a pencil to explore the concept of parallelism
through domain decomposition.
Materials
1) 16 envelopes per table of 4 to 5 “processors”
2) 32 colored post-it notes (16 to represent stamps, 16 to represent address labels)
3) Pens or pencils
Setup
1) Divvy up the 16 envelopes & colored post-its so that each process gets a roughly equal share
2) Each “processor” (think student) must read over and become familiar with the “code” or instructions she is to execute so that the “code” is committed to memory.
Here is your “code” of instructions – For Domain decomposition – each processor does the same task but uses different data (represented here by different addresses)
a) Fold scrap paper
b) Stuff paper into envelop
c) Pretend to seal envelope (so we can re-use them later)
d) Address an “address label” and fix it to the middle of the envelope
You may use these fictitious addresses if you cannot come up with 16 of your own:
|
1600 Daily Planet Ave Metropolis, NY, 12345
|
Gotham City College Gotham, IL, 60506
|
Kwikspell University Briton, UK
|
Know-It-All university Bullwinkle, AK
|
|
Waco University Waco, Texas, 12987
|
Acme Looniverisity Toonville, CA, 10023
|
Bronto Crane Academy Rock Vegas, NV, 010101
|
Ferris Bueller Burbank, CA
|
|
CXVI Caesar Av. Rome, Italy, XLIV BC
|
Central High School Central, IN, 17766
|
Bullworth Academy Missions Blvd, Bermuda
|
Boston Bay College Dawsons Creek, IL, 10982
|
|
1600+1/2 Pennsylvania Washington, DC 20500
|
Bedrock High School Little Rock, AR, 56005
|
Watsamata University Rocky Road, AK
|
Hill Valley Schoolhouse Hill Valley, CA, 33773
|
e) Fix another post-it (representing a stamp) to the stamp area of the envelope
f) Place envelope in table area designated “the mail box”
Monitor Time for completion of mailer exercise
1) Have someone on the team write down the time just prior to saying go
2) Each “processor” should complete his assigned set of envelopes as quickly as possible.
3) Record the time to complete the assigned mailer job. Time: _____________________
4) Record any observations about the nature of the tasks you do – which ones take longest, which one are fast? Were any processors idle for periods of time?
In this activity, your team will explore how to use a task decomposition to accomplish the job of folding, stuffing, sealing, addressing, stamping & mailing multiple envelopes.
Time Required
fifteen minutes
Objective
Use scrap paper, some empty envelopes, and a pencil to explore the concept of parallelism
through task decomposition.
Materials
1) 16 envelopes per table of 4 to 5 “processors”
2) 32 colored post-it notes (16 to represent stamps, 16 to represent address labels)
3) Pens or pencils
Setup
1) Divvy up the 16 envelopes & colored post-its so that each process gets a roughly equal share
a) Each “processor” will agree with team ahead of time which task or tasks he/she is committed
to accomplishing. Perhaps, one processor is assigned the tasks of Folding, Stuffing, Pretend
Sealing all the envelopes.
b) Another processor, or perhaps even two, are assigned the task of addressing the envelopes
Use same addresses as before
c) Another processor is assigned the role of stamping and mailing the envelopes
Monitor Time for completion of mailer exercise
1) Have someone on the team write down the time just prior to saying go
2) Each “processor” should complete his assigned tasks as quickly as possible.
3) Record the time to complete the assigned mailer job. Time: _____________________
4) Record any observations about the nature of the tasks you do – which ones take longest, which one are fast? Were any processors idle for periods of time?
In this activity, your team will explore how to use a domain decomposition to accomplish the job of (mis)adding a set of numbers which we will call a vector. The activity exposes the problem that occurs when writes to a shared memory variable are not protected by a synchronization construct.
Time Required
fifteen minutes
Objective
Use some index cards, and pencils to explore the concept of a race condition.
Materials
1) 16 numbered index cards.
2) One index card labeled “Shared Sum”
3) 16 extra index cards are labeled “local memory”
4) Pencils
Setup
1) The 16 numbered index cards represent a vector of length 16 elements. These are divvied up roughly equally among all 4 to 5 “processors”
2) One extra index card is labeled “Shared Sum” and has the value 0 written on it and is placed in the middle of the table accessible to all processors
3) Several extra index cards are labeled “local memory” and are given to each processor as a scratch pad to add number on.
Execution
1) This is a domain decomposition exercise, where each processor “reads” the value of the “shared sum” and writes a new value to this shared sum – ignoring all the other processors previously written values – this is called a race condition.
Each processor should do these steps as quickly as possible:
a) “read” the value of the “shared sum” and writes that number on your own scratch pad.
b) add one of your “vector” card’s values to the sum on your scratch pad.
c) Immediately cross off the current value on the “shared sum” card and write your own value on the card (probably stomping over someone else’s value).
d) Repeat steps 5a – 5c until you are out of index cards
2) Compare the final value written on the “shared sum” with the known total (46)
3) Did your team compute the correct grand total for the vector sum?
In this activity, your team will explore how a critical section can be used to guarantee that access to a shared memory region are protected – in other words, that writes to a shared variable are done in an orderly and synchronized fashion. It will also expose the performance penalty that can be taken as a result of critical sections.
Time Required
Fifteen minutes
Objective
Use some index cards, a magic marker & pencils to explore the concept of a critical section
Materials
1) 16 numbered index cards (we call it our vector)
2) 1 index card labeled “Shared Sum”
3) 16 extra index cards are labeled “local memory”
4) Magic marker
5) Pencils
Setup
1) The 16 numbered index cards represent a vector of length 16 elements. These are divvied up roughly equally among all 4 to 5 “processors”
2) One extra index cards labeled “Shared Sum” and has the value 0 written on it and is placed in the middle of the table accessible to all processors.
3) Several extra index cards are labeled “local memory” and are given to each processor as a scratch pad to add number on.
4) A Magic maker, known as “critical section”, is placed on the table next to the “shared sum”
Execution
1) We are now trying to synchronize access to shared memory by implementing a critical section. Our goal is to get rid of the race condition we encountered earlier.
2) New rule: “Each processor can only use the magic marker, named critical section, to write values to the “shared sum” index card. And – No processor can do any computations without first acquiring the magic marker, called critical section. As soon as a processor writes a new value to “shared sum” the processor should expeditiously return the capped critical section to the middle of the table”.
3) Each processor should do these steps as quickly as possible:
a) Acquire the critical section! If you failed to acquire the critical section you must wait for the marker to be placed back in the middle of the table.
b) Immediately cross off the current value on the “shared sum” card
c) Add the value of one of your index cards to the “shared sum” use a scratch pad if needed
d) Write the new value for sum on the globally shared “shared sum” card
e) Return the cap to the marker
f) Return the marker to the middle of the table
g) Repeat steps 5a(i) – 5a(v) until you are out of index cards
4) Compare the final value written on the “shared sum” with the known total (46)
5) Did your team compute the correct grand total for the vector sum?
6) What did you observe about how much time you spent idling versus time spent writing or calculating?
In this activity, your team will explore how a reduction (or a partial sums approach) can be used to guarantee that access to a shared memory region are protected – in other words, that writes to a shared variable are done in an orderly and synchronized fashion. It will also demonstrate the benefit of replacing a critical section with a reduction wherever possible.
Time Required
Fifteen minutes
Objective
Use some index cards, & pencils to explore the concept of a critical section
Materials
1) 16 numbered index cards (we call it our vector)
2) 1 index card labeled “Shared Sum”
3) 4 index card labeled “Partial Sum”
4) 16 extra index cards are labeled “local memory”
5) Pencils
Setup
1) The 16 numbered index cards represent a vector of length 16 elements. These are divvied up roughly equally among all 4 to 5 “processors”
2) One extra index cards labeled “Shared Sum” and has the value 0 written on it and is placed in the middle of the table accessible to all processors.
3) The remaining “Partial sum” cards are divvied up among the processors
4) Several extra index cards are labeled “local memory” and are given to each processor as a scratch pad to add number on.
Execution
We are now trying to synchronize access to shared memory by implementing a reduction – which amounts to a collection of partial sums computed by each processor, followed by a grand total computed by a master thread. Our goal is to get rid of the race condition we encountered earlier, and do the parallel tasks in a more efficient manner.
Every processor will add up his/her own partial sum that represents the total of all the vector
elements assigned to him/her.
One processor will also be named to execute the “master thread” that adds up all the partial sum cards to create a grand total , that the master thread writes this grand total to the “shared sum” card.
1) Each processor should:
a) Add the values from all their assigned vector cards
b) Write the value to a “partial Sum” card
c) Give the “Partial Sum” card to the Processor who will execute the master thread
2) The Master Thread only should:
a) Compute his own partial sum
b) Wait for all other partial sums to arrive
c) Compute the grand total for all partial sums
d) Write the grand total on the “Shared Sum” card
3) Did your team compute the correct grand total for the vector sum?
What did you observe about how much time you spent idling versus time spent writing or calculating?
4) How effective was the reduction strategy at computing the sum in parallel versus the other methods tried?
Review Questions
Question 1: Describe a race condition
Question 2: What does a critical section do?
| November 7, 2009 3:33 AM PST
ninhngt
| Nice exercises ! It is good to see parallelism in action in education. |

Paul Steinberg (Intel)
7,225
Status Points:
7,225
Thanks for posting this. The BTHS project was a bold (and successful) experiment - I am proud that we helped lead the way. I think that these exercises should be quite helpful to folks and I am anxious to read feedback from members of the Academic Community.
For those interested in further conversations regarding parallelism, computational thinking & secondary education, I invite them to watch our Teach Parallel broadcasts on the subject: “Are High School Whiz Kids Ready to Think Parallel?", “Teaching High School students to Think Parallel: Brooklyn Technical High School Follow-up” and “Developing Computational Thinking in High School Science and Mathematics.” All of these broadcasts are available archived at http://www.intel.com/software/teachparallel.
Paul