Five role playing exercises to introduce parallelism concepts

Since the kickoff of the High School Parallelism bootcamp this summer, I've received several requests for a write up of the five role playing activities we used.


The activities put students in the place of procesor cores and had them perform tasks in parallel. These activities proved to be popular among many of the students at the camp, however, some of the more advanced students did express that they felt the exercsies could seem childish.


My personal observation is that these exercises laid an excellent foundation that was built upon later with actual computer lab activites using OpenMP and Threading Building Blocks.


The activities are best done in groups of 4 or 5 individuals but even two in a single group can.


Without further ado - here is my promised write up:



Thinking Parallel


These role playing activities are designed to get you start thinking in parallel. They will also expose you to some terminology & concepts that we will use throughout the rest of the boot-camp. Many of these exercises were inspired from Chapter two if James Reinders’ book “Intel Threading Building Blocks”.


Objectives

The objectives for the following five exercises are to:

1) Explore domain & task decomposition using a mailer example.

2) Learn about race conditions and two ways to mitigate them.

3) Learn what a critical section does

4) How a reduction can eliminate a race condition.


Activity 1 – Explore Domain Decomposition

In this activity, each member of your team or group will play the role of a processor core, or more accurately, the role of a thread executing code on a core. Your team will explore how to use domain decomposition to accomplish the job of folding, stuffing, sealing, addressing, stamping & mailing multiple envelopes.


Time Required

fifteen minutes


Objective

Use scrap paper, some empty envelopes, and a pencil to explore the concept of parallelism

through domain decomposition.


Materials

1) 16 envelopes per table of 4 to 5 “processors”

2) 32 colored post-it notes (16 to represent stamps, 16 to represent address labels)

3) Pens or pencils


Setup

1) Divvy up the 16 envelopes & colored post-its so that each process gets a roughly equal share

2) Each “processor” (think student) must read over and become familiar with the “code” or instructions she is to execute so that the “code” is committed to memory.

Here is your “code” of instructions – For Domain decomposition – each processor does the same task but uses different data (represented here by different addresses)

a) Fold scrap paper

b) Stuff paper into envelop

c) Pretend to seal envelope (so we can re-use them later)

d) Address an “address label” and fix it to the middle of the envelope

You may use these fictitious addresses if you cannot come up with 16 of your own:




























1600 Daily Planet Ave


Metropolis, NY, 12345


 



Gotham City College


Gotham, IL, 60506


 



Kwikspell University


Briton, UK


 



Know-It-All university


Bullwinkle, AK


 



Waco University


Waco, Texas, 12987


 



Acme Looniverisity


Toonville, CA, 10023


 



Bronto Crane Academy


Rock Vegas, NV, 010101


 



Ferris Bueller

School


Burbank, CA


 



CXVI Caesar Av.


Rome, Italy, XLIV BC


 



Central High School


Central, IN, 17766


 



Bullworth Academy


Missions Blvd, Bermuda


 



Boston Bay College


Dawsons Creek, IL, 10982


 



1600+1/2 Pennsylvania

Avenue NW


Washington, DC 20500


 



Bedrock High School


Little Rock, AR, 56005


 



Watsamata University


Rocky Road, AK


 



Hill Valley Schoolhouse


Hill Valley, CA, 33773


 



e) Fix another post-it (representing a stamp) to the stamp area of the envelope

f) Place envelope in table area designated “the mail box”


Monitor Time for completion of mailer exercise

1) Have someone on the team write down the time just prior to saying go

2) Each “processor” should complete his assigned set of envelopes as quickly as possible.

3) Record the time to complete the assigned mailer job. Time: _____________________

4) Record any observations about the nature of the tasks you do – which ones take longest, which one are fast? Were any processors idle for periods of time?


Activity 2 – Explore Task Decomposition

In this activity, your team will explore how to use a task decomposition to accomplish the job of folding, stuffing, sealing, addressing, stamping & mailing multiple envelopes.


Time Required

fifteen minutes


Objective

Use scrap paper, some empty envelopes, and a pencil to explore the concept of parallelism

through task decomposition.


Materials

1) 16 envelopes per table of 4 to 5 “processors”

2) 32 colored post-it notes (16 to represent stamps, 16 to represent address labels)

3) Pens or pencils


Setup

1) Divvy up the 16 envelopes & colored post-its so that each process gets a roughly equal share

a) Each “processor” will agree with team ahead of time which task or tasks he/she is committed

to accomplishing. Perhaps, one processor is assigned the tasks of Folding, Stuffing, Pretend

Sealing all the envelopes.

b) Another processor, or perhaps even two, are assigned the task of addressing the envelopes

Use same addresses as before

c) Another processor is assigned the role of stamping and mailing the envelopes


Monitor Time for completion of mailer exercise

1) Have someone on the team write down the time just prior to saying go

2) Each “processor” should complete his assigned tasks as quickly as possible.

3) Record the time to complete the assigned mailer job. Time: _____________________

4) Record any observations about the nature of the tasks you do – which ones take longest, which one are fast? Were any processors idle for periods of time?


Activity 3 –Vector Addition exposes race conditions

In this activity, your team will explore how to use a domain decomposition to accomplish the job of (mis)adding a set of numbers which we will call a vector. The activity exposes the problem that occurs when writes to a shared memory variable are not protected by a synchronization construct.


Time Required

fifteen minutes


Objective

Use some index cards, and pencils to explore the concept of a race condition.


Materials

1) 16 numbered index cards.

2) One index card labeled “Shared Sum”

3) 16 extra index cards are labeled “local memory”

4) Pencils


Setup

1) The 16 numbered index cards represent a vector of length 16 elements. These are divvied up roughly equally among all 4 to 5 “processors”

2) One extra index card is labeled “Shared Sum” and has the value 0 written on it and is placed in the middle of the table accessible to all processors

3) Several extra index cards are labeled “local memory” and are given to each processor as a scratch pad to add number on.


Execution

1) This is a domain decomposition exercise, where each processor “reads” the value of the “shared sum” and writes a new value to this shared sum – ignoring all the other processors previously written values – this is called a race condition.

Each processor should do these steps as quickly as possible:

a) “read” the value of the “shared sum” and writes that number on your own scratch pad.

b) add one of your “vector” card’s values to the sum on your scratch pad.

c) Immediately cross off the current value on the “shared sum” card and write your own value on the card (probably stomping over someone else’s value).

d) Repeat steps 5a – 5c until you are out of index cards

2) Compare the final value written on the “shared sum” with the known total (46)

3) Did your team compute the correct grand total for the vector sum?


Activity 4 –Vector Addition fixed with critical section

In this activity, your team will explore how a critical section can be used to guarantee that access to a shared memory region are protected – in other words, that writes to a shared variable are done in an orderly and synchronized fashion. It will also expose the performance penalty that can be taken as a result of critical sections.


Time Required

Fifteen minutes


Objective

Use some index cards, a magic marker & pencils to explore the concept of a critical section


Materials

1) 16 numbered index cards (we call it our vector)

2) 1 index card labeled “Shared Sum”

3) 16 extra index cards are labeled “local memory”

4) Magic marker

5) Pencils


Setup

1) The 16 numbered index cards represent a vector of length 16 elements. These are divvied up roughly equally among all 4 to 5 “processors”

2) One extra index cards labeled “Shared Sum” and has the value 0 written on it and is placed in the middle of the table accessible to all processors.

3) Several extra index cards are labeled “local memory” and are given to each processor as a scratch pad to add number on.

4) A Magic maker, known as “critical section”, is placed on the table next to the “shared sum”


Execution

1) We are now trying to synchronize access to shared memory by implementing a critical section. Our goal is to get rid of the race condition we encountered earlier.

2) New rule: “Each processor can only use the magic marker, named critical section, to write values to the “shared sum” index card. And – No processor can do any computations without first acquiring the magic marker, called critical section. As soon as a processor writes a new value to “shared sum” the processor should expeditiously return the capped critical section to the middle of the table”.

3) Each processor should do these steps as quickly as possible:

a) Acquire the critical section! If you failed to acquire the critical section you must wait for the marker to be placed back in the middle of the table.

b) Immediately cross off the current value on the “shared sum” card

c) Add the value of one of your index cards to the “shared sum” use a scratch pad if needed

d) Write the new value for sum on the globally shared “shared sum” card

e) Return the cap to the marker

f) Return the marker to the middle of the table

g) Repeat steps 5a(i) – 5a(v) until you are out of index cards

4) Compare the final value written on the “shared sum” with the known total (46)

5) Did your team compute the correct grand total for the vector sum?

6) What did you observe about how much time you spent idling versus time spent writing or calculating?


Activity 5 – Vector Addition fixed with reduction

In this activity, your team will explore how a reduction (or a partial sums approach) can be used to guarantee that access to a shared memory region are protected – in other words, that writes to a shared variable are done in an orderly and synchronized fashion. It will also demonstrate the benefit of replacing a critical section with a reduction wherever possible.


Time Required

Fifteen minutes


Objective

Use some index cards, & pencils to explore the concept of a critical section


Materials

1) 16 numbered index cards (we call it our vector)

2) 1 index card labeled “Shared Sum”

3) 4 index card labeled “Partial Sum”

4) 16 extra index cards are labeled “local memory”

5) Pencils


Setup

1) The 16 numbered index cards represent a vector of length 16 elements. These are divvied up roughly equally among all 4 to 5 “processors”

2) One extra index cards labeled “Shared Sum” and has the value 0 written on it and is placed in the middle of the table accessible to all processors.

3) The remaining “Partial sum” cards are divvied up among the processors

4) Several extra index cards are labeled “local memory” and are given to each processor as a scratch pad to add number on.


Execution

We are now trying to synchronize access to shared memory by implementing a reduction – which amounts to a collection of partial sums computed by each processor, followed by a grand total computed by a master thread. Our goal is to get rid of the race condition we encountered earlier, and do the parallel tasks in a more efficient manner.


Every processor will add up his/her own partial sum that represents the total of all the vector

elements assigned to him/her.


One processor will also be named to execute the “master thread” that adds up all the partial sum cards to create a grand total , that the master thread writes this grand total to the “shared sum” card.


1) Each processor should:

a) Add the values from all their assigned vector cards

b) Write the value to a “partial Sum” card

c) Give the “Partial Sum” card to the Processor who will execute the master thread

2) The Master Thread only should:

a) Compute his own partial sum

b) Wait for all other partial sums to arrive

c) Compute the grand total for all partial sums

d) Write the grand total on the “Shared Sum” card


3) Did your team compute the correct grand total for the vector sum?

What did you observe about how much time you spent idling versus time spent writing or calculating?


4) How effective was the reduction strategy at computing the sum in parallel versus the other methods tried?


Review Questions


Question 1: Describe a race condition

Question 2: What does a critical section do?


For more complete information about compiler optimizations, see our Optimization Notice.

Comments

Paul Steinberg (Intel)'s picture

Hi Bob
Thanks for posting this. The BTHS project was a bold (and successful) experiment - I am proud that we helped lead the way. I think that these exercises should be quite helpful to folks and I am anxious to read feedback from members of the Academic Community.
For those interested in further conversations regarding parallelism, computational thinking & secondary education, I invite them to watch our Teach Parallel broadcasts on the subject: “Are High School Whiz Kids Ready to Think Parallel?", “Teaching High School students to Think Parallel: Brooklyn Technical High School Follow-up” and “Developing Computational Thinking in High School Science and Mathematics.” All of these broadcasts are available archived at http://www.intel.com/software/teachparallel.

Paul

ninhngt's picture

Nice exercises ! It is good to see parallelism in action in education.

Raveh Gonen's picture

Parallelism as seen from children viewpoint

Raveh Gonen, Feb 2010

Research Background

This research aims at the target of gathering more information about how children at the ages between six and up understand the basic ideas behind parallelism, parallel processing and the advantages of making things in parallel. The author assume that children know what parallelism is even if they do not know how to spell it or explain it verbally. From a little verbal questionnaire about parallel thinking that was conducted by the author with his own children, at the ages of six, eight and ten, he found out that for the given example question:”What is more beneficial to do when mum and dad has to prepare lunch for the family?”, the six year old girl named Alma was very definite in saying that “First, mum and dad should prepare the first dish, then move together to work on the second dish”. Gabby, aged ten, understood that in order to be able to prepare the lunch on time. it is better that mum will prepare the first dish at the same time dad is preparing the second dish. This was only a taste of how children think parallel.

What the researchers have to say?

During the last three decades, several research papers where published in the subject. From technological and computer science aspects, researchers developed software ideas and software projects, such as: Alice, StarLogo, NetLogo, Scratch and more. All of these experimental environments implicitly allow children to experiment with parallel ideas and emergent behavior. We prefer to differentiate between synchronized parallel activity/thinking (SPA/T) and unsynchronized one.

Research Methodology

We carefully designed a set of social and interactive games that children can play with. Using this set and during the actual play, we assume that we will be able to record, using observation, audio/video recording and interviews, the results of the experiments and some meta information about how children think about parallel activities.

What can we do with such information?

We assume that once we will have more insight about how children think parallel, other researchers will be able to design technological education devices and software that takes in account our results. For example: if the results will show that children below the age six, can not actually enjoy a computer screen with lots of activities going on in parallel, new software for children at these ages may be designed to have less ‘moving’ objects in parallel or be more centered spatially on screen layout.

What games can come to our help?

We propose a set of five ore more games which can be played by a single child or a group of children. Single child game (SCG) may be beneficial when we try to explore how a single child thinks and acts in terms of Parallel Thinking. Group of children game (GCG) may be beneficial when we try to explore how a group thinks about a given task. We chose game because we believe that our audience may feel more comfortable with games instead of dry test, questionnaire or interview.

So, what are the games?

[1] Fill buckets with beach sand: in this game we take a group of four children, four medium size buckets and a pile of beach sand or soil. We then read aloud the game play to the children: “Your task is to fill the four buckets in a minimum time”. Of course, we will put this task more gently, in children language. The children, as a group, may decide to work in parallel where each one fills his own bucket. Or go mix, where all of them fill all four buckets in unison. Another experiment may introduce two buckets to four children or even unevenly dividable set of three buckets to four children. Different groups of children may introduce new ideas and new methods for accomplishing this task.

[2] Copying a Line Drawing Picture: in this game/activity each participant receives a colored pen and a single A4 transparent sheet of paper. The group is introduced to a single line drawing colored image. They, as a group, have the task of imitating this picture. This task can be very surprising: they can select the solution that each one of them draws the whole picture with a single color; they can decide that everyone paints his own lines according to the pen color he owns; they can also understand that transparent A4 sheets can be placed one on top of each other in order to produce the final picture as a composition of four sub-pictures, each colored with a different pen color.

[3] Making a meal together: in this game/activity we take a group of children and introduce to them the task of preparing a meal with three courses: aperitif, main and digestif. They can make the meal using a large set of plastic food ingredients and number of small kitchens. Using this game we can investigate collaboration and planning of a complicated meal; experiment with different ingredients, different number of people eating and different number of children-chef’s.

[4] Earth moving game (optimization): in this game/activity a group of children is asked to plan and execute a plan for moving a soil/earth pile using a set of toy Caterpillar. We limit the game using the following set of rules: a. a big vehicle moves at slow speed and can lift high volumes; b. medium vehicle moves at medium speed and can lift medium volumes; c. small vehicle moves at fast speed and can lift small volumes. Suppose the children have [x,y,z] number of vehicles of each type. What is the best usage of the whole set of three vehicles in order to finish the task at the minimum time. We can also change the optimization function to cost of fuel and et cetera.

[5] Divide and conquer, Binary search in yellow pages book:
[6] Solve arithmetic expression: The group of children is being introduced to a problem of the form: (5-4*2) + (9-8/1) + (3-4*6) + etcetera. In this example we can test whether the children team up to divide this problem into sub problems, solve them independently and then, sum the sub-values into the final solution.

[7] Solving an image puzzle: in this game, the children are being asked to solve a puzzle. They know how to do this since the age of two-three. But, this time, we make a hard limit on time. In this game, we divide the children into 4 sub groups and divide the puzzle sub images into four connectible sub-images. In this game we seek to understand if the team will understand the split-and-join concept for parallel processing.

's picture

for the "1) 16 numbered index cards", what are the number that sum up to 46