I would like to ask for some guidelines on how to improve the performance of a software project I am working on. Let me start by saying that I am part of a very small development team (only 4 people) building an academic finite element code to simulate sheet metal forming processes. We use the Intel Fortran Compiler for Windows, and since we moved from the serial version of the code to exploring optimization techniques (such as SIMD instructions, the MKL Direct Sparse Solver and OpenMP) we have seen great performance gains on single computers (with dual- or quad-core architectures).
The next step (I guess) is to extend the code to multiple computers (parallel computing). To that end, I have read about and done some experiments with the Intel MPI tools, but the results (so far) were totally different from what I expected. I thought I could cluster 3 or 4 Windows XP boxes (each with an Intel dual-core processor) by connecting them through their Ethernet cards and a switch, and that once I compiled the code (unchanged from the OpenMP version) with the Intel MPI tools, the application would behave as if the 3 or 4 boxes were working as a single machine with 4 dual-core CPUs. Instead, although the Intel MPI tools seem to be configured and working on each computer (or node), when I run the code the same job is executed in full on every box, as if I had started the same job individually on each computer at the same time.
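To make the behaviour concrete, here is a minimal sketch in plain Python (not our Fortran code, and with no real MPI calls). As far as I understand it, an MPI launcher starts one full copy of the program per process, and the work is only divided if the code itself splits it according to each process's rank. The function names and the element count below are hypothetical, chosen only for illustration.

```python
def block_range(n_items, n_procs, rank):
    """Return the half-open range [start, stop) of items owned by `rank`.

    Items are split into contiguous blocks; the first `extra` ranks
    receive one additional item when the division is not exact.
    """
    base, extra = divmod(n_items, n_procs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def unpartitioned_job(n_items, n_procs):
    # What I observe: every process loops over all elements,
    # so the whole job is repeated once per process.
    return [list(range(n_items)) for _ in range(n_procs)]


def partitioned_job(n_items, n_procs):
    # What I expected: each process handles only its own block,
    # selected explicitly from its rank (assumed to be 0..n_procs-1).
    return [
        list(range(*block_range(n_items, n_procs, rank)))
        for rank in range(n_procs)
    ]


if __name__ == "__main__":
    # Hypothetical case: 10 elements spread over 4 processes.
    for rank, items in enumerate(partitioned_job(10, 4)):
        print("rank", rank, "handles elements", items)
```

In this sketch the unpartitioned version is what a code compiled without any rank-based logic would do on every node, which matches what I see on my boxes.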
Thank you in advance.
After the OpenMP: what is the next step?



