If you had 4 days to teach parallel programming… to high school students …

Let’s say you were invited to lead a summer camp for high school students: a 4-day day camp with students coming in from various schools, whose only common trait is that they are top-notch, technically savvy kids who are avid programmers. The camp should attract the best and brightest and be an elite opportunity for a select group of advanced students. You know that these students may lose interest if you sit them down for 4 days of solid lectures, so you will want a lot of hands-on activities to hold their attention. You want them to come out of the camp with practical parallel programming skills, demonstrated by actually writing and testing a successful parallel program. How do you spend the 4 days? What is your recommended scope and sequence of topics? Should I call this a bootcamp?



My Challenge
These are the questions I am grappling with as I lay out a plan for a pilot project on parallelism at the high school level with a small group of very savvy high school programmers.


First of all, while it’s fine to start off the camp in a purely conceptual way, talking about ideas and concepts for parallelism and even walking through examples of parallelism we see every day in real life (waiting in line at a store and wishing for more checkers, enlisting the help of friends to paint a room), at some point we have to get around to programming. So what language will I choose? Some high school students have programmed in Java, some in C/C++, and some in Java-like languages such as Alice from Carnegie Mellon University. So what do I choose, and why?


I have chosen to teach the camp using C/C++. The reason for this choice is that there are some excellent tools for diagnosing threading issues in C/C++. I feel strongly about having solid analysis tools available for students. Students should know the rudiments of debugging and have access to a good debugger when they first learn to program, because it gives them a feel for the dynamics of a program. Similarly, when learning to program using parallel techniques, it is essential that a good set of parallel diagnostic tools is available. Students especially need tools to help them spot and avoid race conditions and deadlocks, and “printf” statements are just NOT the answer.

Also, for students familiar only with C# or Java, the syntax differences are few, and the C/C++ code examples are readily understood by students whose only background is C# or Java.


What is my approach? First let me say that it is not set in stone yet. I have based the first two days of the bootcamp on some excellent material provided to us by Professor Michael Quinn, which I regard as an introduction to parallel programming. The third day is based on materials provided by Intel engineers and covers OpenMP, Intel® Threading Building Blocks, Intel® Thread Checker & Intel® Thread Profiler. These materials will give students a couple of tangible ways to implement parallelism in a C/C++ application. The last day is tentatively earmarked to teach students how to approach parallelizing a demo game written in C/C++. Below is how my planning is shaping up so far.

Bootcamp  - Day 1:
Parallel Puzzle du Jour
Use puzzles to introduce students to some of the real-world problems that CE majors are typically called on to solve.




Introducing Parallel Programming
Here I will define parallel computing, explain why parallel computing is becoming mainstream and explain why explicit parallel programming is necessary.


Recognizing Potential Parallelism
This is where students recall opportunities for parallelism in the real world:

- working at a restaurant
- adding more checkers to alleviate long lines while shopping
- ways of tackling class assignments in parallel sub-teams

Do a “students as threads” activity to carry out a parallel computation (possibilities: compute a parallel sum, perform a sorting operation, find a maximum value of an array of data)
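
As a rough illustration of how the “students as threads” parallel sum could later be turned into code, here is a minimal sketch using an OpenMP pragma. The function name students_sum and its parameters are hypothetical, chosen only for this example: each "student" adds up one slice of the data, and the partial sums are combined at the end.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: each "student" (thread) adds up one slice of the data,
// then the partial sums are added together in a final sequential step.
long long students_sum(const std::vector<int>& data, int num_students) {
    assert(num_students >= 1);  // we need at least one student
    std::vector<long long> partial(static_cast<std::size_t>(num_students), 0LL);
    const std::size_t n = data.size();
    // Slice size, rounded up so that every element belongs to some student.
    const std::size_t chunk = (n + num_students - 1) / num_students;

    // Each iteration is one student's job. Iterations write to different
    // slots of `partial`, so no two threads write the same element.
    #pragma omp parallel for
    for (int s = 0; s < num_students; ++s) {
        const std::size_t begin = std::min(n, s * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        long long sum = 0;
        for (std::size_t i = begin; i < end; ++i) sum += data[i];
        partial[s] = sum;
    }

    // Combine step: one person adds up the students' partial results.
    long long total = 0;
    for (long long p : partial) total += p;
    return total;
}
```

With GCC or Clang the pragma takes effect when compiling with -fopenmp; without that flag it is ignored and the loop simply runs serially, which is handy for comparing the two versions in class.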

Identify opportunities for parallelism in code segments and applications


Shared Memory and Threads
Describe the shared-memory model of parallel programming. Explore the differences between the fork/join and the general threads models. Demonstrate how to implement domain and functional decompositions using threads. Investigate whether a variable in a multithreaded program should be shared or private.


Short Introduction to OpenMP AKA Implementing Domain Decompositions
Identify “for loops” that can be executed in parallel. Identify blocks of code suitable for parallel execution. Add OpenMP pragmas to programs that have suitable blocks of code or “for loops”. Explore the new OpenMP task directive. Demonstrate the proper use of the “single” directive and the “nowait” clause.
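
To make the idea of a parallelizable “for loop” concrete, here is a small sketch of the kind of loop students might be asked to spot. The function saxpy and its parameter names are illustrative assumptions, not taken from the course materials. Every iteration reads and writes only its own index, so the iterations are independent and can safely be divided among threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative example: y[i] = a * x[i] + y[i] for every element.
// No iteration depends on the result of another, so the loop is a
// candidate for "#pragma omp parallel for".
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());  // both vectors must have the same length
    const long n = static_cast<long>(x.size());

    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

A contrasting loop such as y[i] = y[i - 1] + x[i] would not qualify as written, because each iteration needs the value produced by the previous one.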



Bootcamp  - Day 2:
Parallel Puzzle du Jour
Review any solutions or ideas for tackling yesterday's puzzle and introduce a new puzzle to be solved by the end of the week.

Confronting Race Conditions

Give practical examples of ways that threads may contend for shared resources. Write an OpenMP program that contains a reduction. Describe what race conditions are and explain how to eliminate them. Define deadlock and explain ways to prevent it.
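
A short sketch of what an OpenMP reduction might look like in this session follows. The function count_even is a made-up example. The comment describes the race that would occur if the shared counter were updated without the reduction clause.

```cpp
#include <vector>

// Counts the even values in `data`.
long count_even(const std::vector<int>& data) {
    long count = 0;
    const long n = static_cast<long>(data.size());

    // Without the reduction clause, "++count" on the shared variable would be
    // a race condition: two threads could read the same old value, and one of
    // the two updates would be lost. reduction(+:count) gives each thread its
    // own private copy and adds the copies together when the loop finishes.
    #pragma omp parallel for reduction(+:count)
    for (long i = 0; i < n; ++i) {
        if (data[i] % 2 == 0) ++count;
    }
    return count;
}
```

Removing the reduction clause and running the loop on a large array with several threads is one way to let students observe a wrong, and usually varying, answer for themselves.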


Implementing Task Decompositions
Describe how threads can be used to implement parallel programs using task decomposition. Implement a task decomposition based on work pools.   Implement a task decomposition in which different threads execute different functions. Examine two case studies: 1) The N Queens Problem, 2) Fancy Web Browser
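
As one possible illustration of a decomposition in which different threads execute different functions, the sketch below uses OpenMP sections. The Stats struct and compute_stats function are hypothetical names for this example; the two sections do unrelated work on the same read-only data and write to different fields of the result.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical result type for this example.
struct Stats {
    int smallest;
    long long total;
};

// Finds the minimum and the sum of `data`, letting two different
// pieces of work run on different threads.
Stats compute_stats(const std::vector<int>& data) {
    assert(!data.empty());  // a minimum is only defined for non-empty input
    Stats s{0, 0};

    #pragma omp parallel sections
    {
        #pragma omp section
        {
            // Task 1: find the smallest element.
            s.smallest = *std::min_element(data.begin(), data.end());
        }
        #pragma omp section
        {
            // Task 2: add up all the elements.
            s.total = std::accumulate(data.begin(), data.end(), 0LL);
        }
    }
    return s;
}
```

Because each section writes a different member of s and only reads data, the two tasks do not contend for any shared variable.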


Improving Parallel Performance
Give reasons why one sequential algorithm may be more suitable than another for parallelization. Use loop fusion, loop fission, and loop inversion to create or improve opportunities for parallel execution. Explain the pros and cons of static versus dynamic loop scheduling. Explain load balancing. Explain locality. Explain why it can be difficult both to optimize load balancing and to maximize locality.
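
Two of these ideas can be sketched briefly; the function names below are invented for illustration. The first pair shows loop fusion: two passes over the data become one, which touches each element once and gives a single loop to parallelize. The last function shows a loop whose iterations cost very different amounts, which is the situation where dynamic scheduling may balance the load better than static scheduling.

```cpp
#include <cstddef>
#include <vector>

// Before fusion: two separate passes over the data.
std::vector<long> scale_then_shift_unfused(std::vector<long> v, long k, long b) {
    for (std::size_t i = 0; i < v.size(); ++i) v[i] *= k;
    for (std::size_t i = 0; i < v.size(); ++i) v[i] += b;
    return v;
}

// After fusion: one pass. Every iteration costs about the same, so a
// static schedule (equal-sized blocks per thread) is a reasonable choice.
std::vector<long> scale_then_shift_fused(std::vector<long> v, long k, long b) {
    const long n = static_cast<long>(v.size());
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) v[i] = v[i] * k + b;
    return v;
}

// Uneven work: testing a large number takes longer than a small one.
// schedule(dynamic, 64) hands out chunks of 64 iterations on demand, so
// threads that finish early pick up more work.
long count_primes(long limit) {
    long count = 0;
    #pragma omp parallel for schedule(dynamic, 64) reduction(+:count)
    for (long n = 2; n <= limit; ++n) {
        bool prime = true;
        for (long d = 2; d * d <= n; ++d) {
            if (n % d == 0) { prime = false; break; }
        }
        if (prime) ++count;
    }
    return count;
}
```

Dynamic scheduling adds some bookkeeping overhead and can scatter neighboring iterations across threads, which is one reason load balancing and locality can pull in opposite directions.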



Bootcamp  - Day 3:
Parallel Puzzle du Jour
Review any solutions or ideas for tackling yesterday's puzzle and introduce a new puzzle to be solved by the end of the week.


Introduction to Threading Building Blocks
Give an overview of Threading Building Blocks. Describe Generic Parallel Algorithms such as parallel_for, parallel_reduce, parallel_sort. Explain the Task Scheduler. Discuss Generic Highly Concurrent Containers. Explore Scalable Memory Allocation. Discuss Low-level Synchronization Primitives.


Correcting Threading Errors with Intel® Thread Checker
Discuss the Intel® Thread Checker. Examine race conditions. Explore the Intel® Thread Checker. Discuss some types of threading errors. Examine library thread-safety.


Tuning Threading Code with Intel® Thread Profiler
Discuss Intel® Thread Profiler features. Define Critical Path Analysis. Examine Thread Profiler “data views”. Review common performance issues of multithreaded applications. Examine Load imbalance. Examine Synchronization contention. Describe general optimizations to gain better performance 

Bootcamp  - Day 4:
Parallel Puzzle du Jour
Review any solutions or ideas for tackling yesterday's puzzle and introduce a new puzzle to be solved by the end of the week.


Threading Games for Performance 
Case Studies - Destroy the Castle Demo game. Give an overview of Multi-Threading in Games. Introduce the Destroy the Castle Demo. Examine Functional Decomposition for this game. Examine Data Decomposition.


Parallel Puzzle Review of Solutions
Student-led discussion of the approaches they took to the parallel puzzle du jour.

My Challenge to You
So, you’ve seen my proposal. You know my target audience. You may not agree with the selection of topics, the selection of language, or perhaps the scope or the sequence. So, tell me, how would you design a 4-day bootcamp for technically savvy high school students?



Gina Bovara (Intel)'s picture

View the Clubhouse Parallel Universe page at http://www.intel.com/software/nyc2009 to see more details. :)

robert c.'s picture

Mandy & Zoresvit

Thank you both for posting !

Mandy - I like your suggestion of hooking the students early with the game threading. The challenge is that the technical knowledge required for this piece - the knowledge to actually thread this code - will take at least a couple of days of introduction to concepts such as race conditions, what tasks are, how to find independent units of work to parallelize, how to implement parallelism, etc. So from a pedagogical perspective I feel compelled to keep the more in-depth technical training aspects of the game threading piece for the last day.

However, the spirit of what you are conveying - to hook them early with the motivator - is very compelling.

Perhaps, and I will confer with some others on this point, I could demo a before and after on the first day of bootcamp that shows what they'll be able to do after successful completion of the class: show how we can speed up this game X times by applying the principles, techniques & tools taught in the class.

I want them to leave the bootcamp with a "wow" story that embodies what the student was able to accomplish in 3 days.

Thanks again for your feedback!

Bob C

anonymous's picture

I am currently learning parallel programming myself. I must admit it is something fascinating. I am Jewish by nationality, and I came by parcel post from Michigan (the FLEX program), if you have heard of it. But there was nothing even close to this kind of IT there. The life is bad there. But this article is excellently written, I must say. Superb. If the author were a teacher, I would come up to him after classes. Definitely.
P.S. And in the breaks between half-classes too.

Mandy Mock (Intel)'s picture

I like the proposed flow, but I'd suggest you change the order - introduce the gaming material in the beginning to capture their interests, and build labs in the game code that support each of the topics you laid out.

anonymous's picture


Thank you so much for your feedback. I am most interested in following up to hear more about your program. Unfortunately - the link you provided gave me an access error, so I have not been able to look at the syllabus you have provided more formally for your program.

From what you describe - our programs look very similar. The big difference is the choice between teaching MPI vs Threading Building Blocks. I think your agenda sounds exciting. Given the student mix I am describing and my funding constraints, I am choosing not to focus on clusters or on MPI but strictly on shared-memory implementations like OpenMP & TBB. I have taken your recommendation to add something on cc-NUMA architecture and am now wrestling with the relative time allotments for all subjects.

I would enjoy picking up more details of the conversation via email if you are interested. Please e-mail at robert.a.chesebrough@intel.com and mark "High School Parallelism" in the subject line.

Thanks again for your comments.

Robert Chesebrough

anonymous's picture

Hi there,

It was just by accident that I stumbled across this blog post. Actually we are conducting a very similar workshop right now at RWTH Aachen University: PPCES 2009 (Parallel Programming for Computational Engineering and Science: http://www.rz.rwth-aachen.de/go/id/sms/lang/en). Although we are targeting a slightly different audience, we have come up with a pretty similar agenda. We have introductions and talks on advanced topics on OpenMP and MPI, but we skip TBB. I totally agree that tools are a must-have topic, so we spent a lot of time on that.

From my point of view you are missing the architecture aspect. When it comes to shared-memory parallelization, you have to be aware of what a cc-NUMA architecture is and how to do things right on such a machine. In 2004 the AMD Opteron introduced cc-NUMA machines for the x86-based mass market, in 2009 Intel is doing the same with the Nehalem architecture. cc-NUMA offers a lot of potential for scalability if your program is aware of it, but also can prevent you from achieving any speedup at all quite easily if you ignore it. I think this is a very important topic!

Kind regards,

robert c.'s picture


Thanks for your feedback!

I'd like to introduce you to the community offerings that are easily accessible. There is a “New to Parallel Programming” section on the community - go here: http://software.intel.com/en-us/multi-core/

Much of the material I mentioned has been posted to our academic community website. Of course we refresh the materials on occasion, and these materials may not be identical to what I deliver in the bootcamp - I hope to pare these existing materials down, in fact, in order to fit within the context of a 4-day bootcamp. The site requires registration and is targeted mainly at teachers - however, I have provided the link to the content I was speaking of so that once you are registered you have ready access. http://software.intel.com/en-us/academic/

After you have successfully registered, then you can look at the variety of materials we have available:

What I would recommend to those starting out is the Introduction to Parallel Programming

followed by Multi-core Programming for Academia
These materials cover different ways to implement parallelism using threads. They cover OpenMP, Threading Building Blocks, and Windows* Threads, and also give some guidance on how to use the Intel threading tools to identify & troubleshoot any threading problems early in your application lifecycle.

Feel free to post questions about the content to our forum area - we have people monitoring the forums who can answer technical questions.

I hope this answers your questions.

Bob C

anonymous's picture

I am really interested in this since I'm in your target audience (in high school and highly interested in programming, especially distributed). I already know C++; at school we train for olympiads on various ACM sites.
I have only done some multithreaded programming in Ruby, but mostly used multiple processes with a worker-queue system.

Can you please send me or post the materials used in this course? Or if you have other recommendations, I would be interested.

I think your proposal is pretty good, covering both the basics and more advanced stuff while attracting interest (at least you got mine just from reading the overview). I think it's great to make them work on a real project; this is how I learned most of the things I know: aiming for something and researching/learning what I needed to accomplish it.
