| Last Modified On : | November 2, 2009 12:18 PM PST |
Rate |
|
Barnes-Hut Algorithm Implementation in Parallel Programming World
Authors : Ravindra M, V Chaithanya
Faculty mentor : J Sangeetha
Name of the Institution : Peoples Education Society Institute of Technology
Abstract:
This paper presents a parallel processing algorithm whose main goal is to load balancing while reducing the overhead to compute such a partition. In the proposed approach we will take the Barnes-Hut algorithm as a case study.
Our approach partitions the area in large quantity and even if the load imbalance occurred, it is compensated by switching of processor which is already finished its work. Our algorithm works even if a single processor is heavily loaded due to extreme density at one of the areas. Some of the existing methods have balance the load efficiently, but they turn out to be extensively computing. The algorithm presented uses the main strategy of having centre of mass for large areas and hence reducing computation.
Keywords: ORB, costzones, work, depth, Barnes-Hut algorithm, load balancing
Background:
The Barnes-Hut algorithm use to simulate the galaxy and the molecular interaction of n-bodies, etc. If the randomly distributed mass is far enough from the reference object, rather than finding each object to object interaction we will go for object to the cell interaction. Hence the simulation of N-body problem has been reduced from n2 to (n log n).
The basic simulation algorithm:
while (t < tFinal) do
forall 1<= i<= n do
( compute force f(i) on body i )
( numerical integration step ≤ t )
( update velocity and position of body i )
t = t + dt
Sequential Barnes-Hut :
Tree construction:
stepSystem():
T := makeTree(P(1:n))
forall 1<= i<= n do
f(i) := gravCalc(P(i),T)
(update velocities and positions)
Calculating Gravity :
function gravCalc(p,q)
if (“q is a leaf”) then
(return body-body interaction)
else
if (“p is distant enough from q”) then
(return body-cell interaction)
else
forall q’e nonemptyChildren(q) do
accumulate gravCalc(p,q’)
(return accumulated interaction)
end if
end if
The below figure shows the reference object ‘O’ at a distance ‘r’ from the randomly distributed mass ‘M’(cell). We will be first find the centre of mass ‘M’ and then calculate the body-cell interaction.
Body-body interaction : use masses of particles and distance between them. Body-cell interaction : use mass of particle and cell and distance between particle and center of mass of cell.
figure_1:
The main challenge of this algorithm in parallel computing is to reduce the load imbalance in each processor. We have several techniques such as ORB, costzone method to overcome load imbalance.
The ORB method split a initial configuration of the system(graph) into two by associating a scalar quantity Se with each graph node e, which we may call a separator field. By evaluating the median S of the Se, according to whether Se is greater or less than S. The median is chosen as the division so that the number of nodes in each half are automatically equal; the problem is now reduced to that of choosing the field Se so that the communication is minimized.
Problem Statement:
“A load balancing algorithm for the randomly distributed N-body problem.”
Methodology:
While discussing the algorithm we will accept the notion of global memory, accessible by all the processors, that will hold the initial configuration and the global variables that need to be unique and shared by all the processors.
The main idea to increase the efficiency is to reduce body to body interactions. In ORB the recursive bisection will increase the body to body interaction even though load is equally distributed on each processor. It also needs extra calculation of finding Se.
Algorithm:
Input :
Just like ORB we will assume that the initial configuration of N-body system is represented by a quad-tree or oct-tree depending on number of processors.
Method:
1. Each processor is assigned to a subtree.
2. For each existing subtree we have a flag to represent whether it is assigned or not.(We may have oct-tree but only five processors after any processor has finished its work, it has to set unallocated flag so that it can work on other subtree.).If it is assigned and level is not equal to 1
Share the work with current processor.
3. For each subtree we will have an array, representing the levels which may set to three values
0 – unallocated, do the computation at that level
1 – Allocated
2 – Work completed
4. Untill level=1
Perform Barnes-hut computation for the corresponding cell
If level=1
Goto step 2
5. Find the whole systems cell to cell interactions after every dt interval.
Output :
Simulation of N-body systems for each dt interval.
The initial distribution of space and mass as shown in quad-tree representation and the assignment of work to the processors as shown below.
figure_2:
Even though initially P0 is burden with the work, according to our algorithm P3 finishes its computation earlier and shares the work of the P0 .The main advantage is that without having numerous cells(as in the case of ORB),the cell to cell interaction will takes place in large areas (in this case A0 A1 A2 A3).The area A0 corresponds to area allocated to processor P0, A1 to area allocated to the processor P1 and so on as shown in the below figure.L1, L2, L3… corresponds to the levels in the quad tree.
figure_3:
The following figure shows the switching of processors between area and levels of the tree.
figure_4:
By applying work-depth principle analytically we got the following graph
by plotting number of processors versus input size.
figure_5:
Discussion:
From our analysis we found out that the extra calculation done by ORB can be eliminated by using simple polling technique without loosing efficiency which is carried by maximum cell to cell interaction. We have noticed that algorithm efficiency lies in between ORB and costzone but the key difference is costzone takes more communication between processor.
Conclusion:
Ø Our algorithm works for both the conditions one in which equally loaded and the other in which heavily loaded.
Ø It is free from computation of partitions.
Ø Algorithm efficiency lies between the ORB and costzone.
Ø It has less communication between the processor as compared to costzone
References:
.
Acknowledgements:
We would like to thank Asst. Prof. J Sangeetha for her assistance and guidance during our paper presentation work.

| vchaitanya | ||
| ravindrapai34 | ||
jsangeetha
|