Will OpenMP directives, or the compiler's automatic parallelization, coexist with MPI within a single application?
We have a cluster of dual-processor P4 nodes. When we run our application on it, we get near-linear speedup using one CPU per node, but less benefit using two CPUs per node. If we enable hyperthreading and run four processes per node, performance is worse than with a single process per node.
In broad outline, the app has a large number of cells. At each timestep, all the cells are processed and update their internal state. Relatively infrequently, a cell reaches a state in which it sends a message to some other cells, which may be on the same node or a different one.
The message passing is handled quite well by MPI. However, I'm wondering if I could increase throughput by splitting the repetitive cell processing between virtual and/or real CPUs on a node, without affecting the existing MPI code. Does anyone have experience with similar situations?