Five role playing exercises to introduce parallelism concepts

Since the kickoff of the High School Parallelism bootcamp this summer, I've received several requests for a write up of the five role playing activities we used.


The activities put students in the place of procesor cores and had them perform tasks in parallel. These activities proved to be popular among many of the students at the camp, however, some of the more advanced students did express that they felt the exercsies could seem childish.


My personal observation is that these exercises laid an excellent foundation that was built upon later with actual computer lab activites using OpenMP and Threading Building Blocks.


The activities are best done in groups of 4 or 5 individuals but even two in a single group can.


Without further ado - here is my promised write up:



Thinking Parallel


These role playing activities are designed to get you start thinking in parallel. They will also expose you to some terminology & concepts that we will use throughout the rest of the boot-camp. Many of these exercises were inspired from Chapter two if James Reinders’ book “Intel Threading Building Blocks”.


Objectives

The objectives for the following five exercises are to:

1) Explore domain & task decomposition using a mailer example.

2) Learn about race conditions and two ways to mitigate them.

3) Learn what a critical section does

4) How a reduction can eliminate a race condition.


Activity 1 – Explore Domain Decomposition

In this activity, each member of your team or group will play the role of a processor core, or more accurately, the role of a thread executing code on a core. Your team will explore how to use domain decomposition to accomplish the job of folding, stuffing, sealing, addressing, stamping & mailing multiple envelopes.


Time Required

fifteen minutes


Objective

Use scrap paper, some empty envelopes, and a pencil to explore the concept of parallelism

through domain decomposition.


Materials

1) 16 envelopes per table of 4 to 5 “processors”

2) 32 colored post-it notes (16 to represent stamps, 16 to represent address labels)

3) Pens or pencils


Setup

1) Divvy up the 16 envelopes & colored post-its so that each process gets a roughly equal share

2) Each “processor” (think student) must read over and become familiar with the “code” or instructions she is to execute so that the “code” is committed to memory.

Here is your “code” of instructions – For Domain decomposition – each processor does the same task but uses different data (represented here by different addresses)

a) Fold scrap paper

b) Stuff paper into envelop

c) Pretend to seal envelope (so we can re-use them later)

d) Address an “address label” and fix it to the middle of the envelope

You may use these fictitious addresses if you cannot come up with 16 of your own:




























1600 Daily Planet Ave


Metropolis, NY, 12345


 



Gotham City College


Gotham, IL, 60506


 



Kwikspell University


Briton, UK


 



Know-It-All university


Bullwinkle, AK


 



Waco University


Waco, Texas, 12987


 



Acme Looniverisity


Toonville, CA, 10023


 



Bronto Crane Academy


Rock Vegas, NV, 010101


 



Ferris Bueller

School


Burbank, CA


 



CXVI Caesar Av.


Rome, Italy, XLIV BC


 



Central High School


Central, IN, 17766


 



Bullworth Academy


Missions Blvd, Bermuda


 



Boston Bay College


Dawsons Creek, IL, 10982


 



1600+1/2 Pennsylvania

Avenue NW


Washington, DC 20500


 



Bedrock High School


Little Rock, AR, 56005


 



Watsamata University


Rocky Road, AK


 



Hill Valley Schoolhouse


Hill Valley, CA, 33773


 



e) Fix another post-it (representing a stamp) to the stamp area of the envelope

f) Place envelope in table area designated “the mail box”


Monitor Time for completion of mailer exercise

1) Have someone on the team write down the time just prior to saying go

2) Each “processor” should complete his assigned set of envelopes as quickly as possible.

3) Record the time to complete the assigned mailer job. Time: _____________________

4) Record any observations about the nature of the tasks you do – which ones take longest, which one are fast? Were any processors idle for periods of time?


Activity 2 – Explore Task Decomposition

In this activity, your team will explore how to use a task decomposition to accomplish the job of folding, stuffing, sealing, addressing, stamping & mailing multiple envelopes.


Time Required

fifteen minutes


Objective

Use scrap paper, some empty envelopes, and a pencil to explore the concept of parallelism

through task decomposition.


Materials

1) 16 envelopes per table of 4 to 5 “processors”

2) 32 colored post-it notes (16 to represent stamps, 16 to represent address labels)

3) Pens or pencils


Setup

1) Divvy up the 16 envelopes & colored post-its so that each process gets a roughly equal share

a) Each “processor” will agree with team ahead of time which task or tasks he/she is committed

to accomplishing. Perhaps, one processor is assigned the tasks of Folding, Stuffing, Pretend

Sealing all the envelopes.

b) Another processor, or perhaps even two, are assigned the task of addressing the envelopes

Use same addresses as before

c) Another processor is assigned the role of stamping and mailing the envelopes


Monitor Time for completion of mailer exercise

1) Have someone on the team write down the time just prior to saying go

2) Each “processor” should complete his assigned tasks as quickly as possible.

3) Record the time to complete the assigned mailer job. Time: _____________________

4) Record any observations about the nature of the tasks you do – which ones take longest, which one are fast? Were any processors idle for periods of time?


Activity 3 –Vector Addition exposes race conditions

In this activity, your team will explore how to use a domain decomposition to accomplish the job of (mis)adding a set of numbers which we will call a vector. The activity exposes the problem that occurs when writes to a shared memory variable are not protected by a synchronization construct.


Time Required

fifteen minutes


Objective

Use some index cards, and pencils to explore the concept of a race condition.


Materials

1) 16 numbered index cards.

2) One index card labeled “Shared Sum”

3) 16 extra index cards are labeled “local memory”

4) Pencils


Setup

1) The 16 numbered index cards represent a vector of length 16 elements. These are divvied up roughly equally among all 4 to 5 “processors”

2) One extra index card is labeled “Shared Sum” and has the value 0 written on it and is placed in the middle of the table accessible to all processors

3) Several extra index cards are labeled “local memory” and are given to each processor as a scratch pad to add number on.


Execution

1) This is a domain decomposition exercise, where each processor “reads” the value of the “shared sum” and writes a new value to this shared sum – ignoring all the other processors previously written values – this is called a race condition.

Each processor should do these steps as quickly as possible:

a) “read” the value of the “shared sum” and writes that number on your own scratch pad.

b) add one of your “vector” card’s values to the sum on your scratch pad.

c) Immediately cross off the current value on the “shared sum” card and write your own value on the card (probably stomping over someone else’s value).

d) Repeat steps 5a – 5c until you are out of index cards

2) Compare the final value written on the “shared sum” with the known total (46)

3) Did your team compute the correct grand total for the vector sum?


Activity 4 –Vector Addition fixed with critical section

In this activity, your team will explore how a critical section can be used to guarantee that access to a shared memory region are protected – in other words, that writes to a shared variable are done in an orderly and synchronized fashion. It will also expose the performance penalty that can be taken as a result of critical sections.


Time Required

Fifteen minutes


Objective

Use some index cards, a magic marker & pencils to explore the concept of a critical section


Materials

1) 16 numbered index cards (we call it our vector)

2) 1 index card labeled “Shared Sum”

3) 16 extra index cards are labeled “local memory”

4) Magic marker

5) Pencils


Setup

1) The 16 numbered index cards represent a vector of length 16 elements. These are divvied up roughly equally among all 4 to 5 “processors”

2) One extra index cards labeled “Shared Sum” and has the value 0 written on it and is placed in the middle of the table accessible to all processors.

3) Several extra index cards are labeled “local memory” and are given to each processor as a scratch pad to add number on.

4) A Magic maker, known as “critical section”, is placed on the table next to the “shared sum”


Execution

1) We are now trying to synchronize access to shared memory by implementing a critical section. Our goal is to get rid of the race condition we encountered earlier.

2) New rule: “Each processor can only use the magic marker, named critical section, to write values to the “shared sum” index card. And – No processor can do any computations without first acquiring the magic marker, called critical section. As soon as a processor writes a new value to “shared sum” the processor should expeditiously return the capped critical section to the middle of the table”.

3) Each processor should do these steps as quickly as possible:

a) Acquire the critical section! If you failed to acquire the critical section you must wait for the marker to be placed back in the middle of the table.

b) Immediately cross off the current value on the “shared sum” card

c) Add the value of one of your index cards to the “shared sum” use a scratch pad if needed

d) Write the new value for sum on the globally shared “shared sum” card

e) Return the cap to the marker

f) Return the marker to the middle of the table

g) Repeat steps 5a(i) – 5a(v) until you are out of index cards

4) Compare the final value written on the “shared sum” with the known total (46)

5) Did your team compute the correct grand total for the vector sum?

6) What did you observe about how much time you spent idling versus time spent writing or calculating?


Activity 5 – Vector Addition fixed with reduction

In this activity, your team will explore how a reduction (or a partial sums approach) can be used to guarantee that access to a shared memory region are protected – in other words, that writes to a shared variable are done in an orderly and synchronized fashion. It will also demonstrate the benefit of replacing a critical section with a reduction wherever possible.


Time Required

Fifteen minutes


Objective

Use some index cards, & pencils to explore the concept of a critical section


Materials

1) 16 numbered index cards (we call it our vector)

2) 1 index card labeled “Shared Sum”

3) 4 index card labeled “Partial Sum”

4) 16 extra index cards are labeled “local memory”

5) Pencils


Setup

1) The 16 numbered index cards represent a vector of length 16 elements. These are divvied up roughly equally among all 4 to 5 “processors”

2) One extra index cards labeled “Shared Sum” and has the value 0 written on it and is placed in the middle of the table accessible to all processors.

3) The remaining “Partial sum” cards are divvied up among the processors

4) Several extra index cards are labeled “local memory” and are given to each processor as a scratch pad to add number on.


Execution

We are now trying to synchronize access to shared memory by implementing a reduction – which amounts to a collection of partial sums computed by each processor, followed by a grand total computed by a master thread. Our goal is to get rid of the race condition we encountered earlier, and do the parallel tasks in a more efficient manner.


Every processor will add up his/her own partial sum that represents the total of all the vector

elements assigned to him/her.


One processor will also be named to execute the “master thread” that adds up all the partial sum cards to create a grand total , that the master thread writes this grand total to the “shared sum” card.


1) Each processor should:

a) Add the values from all their assigned vector cards

b) Write the value to a “partial Sum” card

c) Give the “Partial Sum” card to the Processor who will execute the master thread

2) The Master Thread only should:

a) Compute his own partial sum

b) Wait for all other partial sums to arrive

c) Compute the grand total for all partial sums

d) Write the grand total on the “Shared Sum” card


3) Did your team compute the correct grand total for the vector sum?

What did you observe about how much time you spent idling versus time spent writing or calculating?


4) How effective was the reduction strategy at computing the sum in parallel versus the other methods tried?


Review Questions


Question 1: Describe a race condition

Question 2: What does a critical section do?


Reportez-vous à notre Notice d'optimisation pour plus d'informations sur les choix et l'optimisation des performances dans les produits logiciels Intel.