4,391 Posts served
10,712 Conversations started
- Academic

- Android

- Art, Music, & Animation

- Embedded Computing

- Events

- Game Development

- Graphics & Media

- Intel SW Partner Program

- Intel® AppUp Developer Program

- Manageability & Security

- Mobility

- Open Source

- Parallel Programming

- Performance and Optimization

- Power Efficiency

- Site News & Announcements

- Software Tools

- Association for Computing Machinery TechNews (ACM)
- Go Parallel! (Dr. Dobbs)
- HPCwire (Tabor Communications, Inc.)
- insideHPC (John West)
- Joe Duffy's Weblog (Microsoft)
- Microsoft Parallel Programming Development Center (Microsoft Germany)
- MultiCoreInfo.com
- scalability.org (Scalable Informatics)
- Software Dev Blog (Intel Germany)
- Soft Talk Blog (Intel United Kingdom)
- The Moth (Microsoft)
n-bodies: a parallel TBB solution: parallel code with parallel_invoke: will it run faster?
By Robert Reed (Intel) (33 posts) on July 23, 2010 at 7:19 pm
Earlier in the month I fleshed out a spatially arranged subdivision method I learned from Matteo Frigo but didn’t have time to actually run it and compare against my baselines. And in the meantime my test machine has been regrooved into a Windows 7 box, so my first order of business is to retest my baselines. I reran the serial algorithm a few times and averaged the results. While the raw numbers (using the same compiler and machine but just a different OS) are slightly larger under Windows 7, their magnitude cannot be distinguished in a graph:
Moving forward with the Windows 7 serial numbers, next I did a run with the recursive parallel technique I described last time:
I included the simple parallel run for comparison. Looks like there’s more work to do. My “fast” technique is still not measuring up against either the serial version or the buggy simple parallel version. Time for another hot spot analysis:
Curious. Where most of the work is done is in accAcc(), yet most of time is being spent in rect_interact(), which just splits up rectangles. Maybe I’m being too aggressive at subdividing. Yup, it turns out this is another load balance problem. And the test program comes equipped with a grain size control in the form of a parameter, fastgrain. You can experiment with various thresholds, but using a grain of 16 makes a substantial shift in the hot spots:
Body_interact() is now taking less time than addAcc() (where it had been taking more than four times as much before). And if I collect a set of runs with fastgrain 16, there’s a big change to the performance graph:
These runs start out like the previous parallel algorithms, but by 8 bodies, the curve has already fallen below that of the unsafe, simple parallel method, and by 64 bodies it’s beating the baseline serial numbers. At 4096 bodies this algorithm is now running over 4x faster than the serial version. Success! Could we do even better with a larger grain size? Maybe marginally, but note in the latter hot spot result that TBB internal code including the idle spinner are already taking more time than the code doing real work. It’s possible that a higher threshold will just result in more idle time. Something to look into another day.
Categories: Parallel Programming, Software Tools
Tags: Intel Parallel Amplifier, n-bodies, Open Source, parallel programming, parallelism, parallel_invoke, TBB, Threading Building Blocks
For more complete information about compiler optimizations, see our Optimization Notice.
Comments (2)
| August 9, 2010 10:05 AM PDT
Robert Reed (Intel)
| As a matter of fact, http://software.intel.com/en-us/blogs/2009/08/19/n-bodies-ex..... eek-ahead/ is a previous blog post of mine that contains an attachment with a version of the code that's not too different from what I was experimenting with here. I just checked to be sure it's still there (it is), so you should be able to pull a reasonable copy from there. |
Trackbacks (4)
- n-bodies: a parallel TBB solution: parallel code: balanced recursive parallelism with parallel_invoke – Intel Software Network Blogs
July 23, 2010 9:22 PM PDT -
Twitter Trackbacks for
n-bodies: a parallel TBB solution: parallel code with parallel_invoke: will it run faster? – Intel Software Network Blogs
[intel.com]
on Topsy.com
July 24, 2010 1:24 AM PDT - New ISN Parallel Programming Manager – Lots to Learn, Many Community Members to Meet – Intel Software Network Blogs
July 30, 2010 6:18 PM PDT - New ISN Parallel Programming Manager – Lots to Learn, Many Community Members to Meet - News IT&C
July 30, 2010 9:21 PM PDT







thkarcher
5
could you offer a download link to your code? I'd like to run some experiments myself ...
Thanks!
Thomas