After OpenMP: what is the next step?


Alexandre Correia

I would like to ask for some guidelines on how to improve the performance of the software project I'm working on. Let me start by saying that I'm in a very small development team (only 4 people) building an academic finite element code to simulate sheet metal forming processes. We use the Intel Fortran Compiler for Windows, and since we moved from the serial version of the code to exploring optimization techniques (such as SIMD instructions, the MKL Direct Sparse Solver and OpenMP) we have had great performance gains on single computers (with dual- or quad-core architectures).
The next step (I guess) is to extend the code to multiple computers (parallel computing). To that end, I have read about and done some experiments with the Intel MPI tools, but the results (so far) were totally different from what I expected. I thought I could cluster 3 or 4 Windows XP boxes (each with an Intel dual-core processor) by joining them through their Ethernet cards and a switch, and that after compiling the code (unchanged from the OpenMP version) with the Intel MPI tools, the application would behave as if the 3 or 4 boxes were a single machine with 4 dual-core CPUs. However, although the Intel MPI tools seem to be configured and working on each computer (or node), when I run the code the same job is executed on every box, as if I had started the same job individually on each computer at the same time.
Thank you in advance.

san
Quoting Alexandre Correia
I thought I could cluster 3 or 4 Windows XP boxes (each with an Intel dual-core processor) by joining them through their Ethernet cards and a switch, and that after compiling the code (unchanged from the OpenMP version) with the Intel MPI tools, the application would behave as if the 3 or 4 boxes were a single machine with 4 dual-core CPUs.

Did you modify your code for MPI?

Gergana Slavova (Intel)

Hi Alexandre,

Indeed, going from OpenMP to MPI is not an automatic process (certainly not as simple as adding "MPI pragmas" to your application and recompiling). One of the main ideas behind MPI is that you run the same application for each process in your MPI communication domain, but each process works on a different chunk of the data (this is called domain decomposition). Those processes then transfer their partial results to the other processes via send/receive calls, since, unlike OpenMP threads, they do not share an address space. That communication is done via MPI routines, which are provided to you with the Intel tools (e.g. the Intel MPI Library).
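To make the "same program, different chunk of data" idea concrete, here is a minimal sketch in plain Python. It does not use MPI at all: a serial loop stands in for the separate processes, and the function names (`block_bounds`, `local_work`, `simulate`) are made up for this illustration. In a real MPI code the rank and process count would come from calls such as `MPI_Comm_rank` and `MPI_Comm_size`, and the partial results would be combined through a communication call such as `MPI_Reduce` rather than an ordinary list.

```python
def block_bounds(n_items, n_procs, rank):
    """Return the half-open index range [start, stop) owned by `rank`.

    Items are split into contiguous blocks; when n_items is not a
    multiple of n_procs, the first `extra` ranks get one more item.
    """
    base, extra = divmod(n_items, n_procs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def local_work(data, n_procs, rank):
    """What a single process would do: sum only its own chunk."""
    start, stop = block_bounds(len(data), n_procs, rank)
    return sum(data[start:stop])


def simulate(data, n_procs):
    """Serial stand-in for an MPI run (assumption: no real MPI here).

    Every "process" executes the same function; only `rank` differs.
    The final sum plays the role that a reduction over all processes
    would play in an actual MPI program.
    """
    partials = [local_work(data, n_procs, rank) for rank in range(n_procs)]
    return sum(partials)


if __name__ == "__main__":
    data = list(range(1, 11))  # ten items shared among four "processes"
    for rank in range(4):
        print(rank, block_bounds(len(data), 4, rank))
    print("total:", simulate(data, 4))
```

If the code never looks at its rank to pick a chunk, every process simply repeats the whole job, which matches the behaviour described in the original question.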

I grossly oversimplified a lot of the ideas behind MPI, so I would recommend you take a look at Argonne's An Introduction to MPI tutorial. It also has some nice references to a few books which will help you get started.

Good luck in your MPI endeavors! We're certainly all here to support you :)

Regards,
~Gergana

Gergana Slavova
Technical Consulting Engineer
Intel® Cluster Tools
E-mail: gergana.s.slavova_at_intel.com
Alexandre Correia

No, I did not. I had thought that just recompiling would make everything work. Now I know I was wrong.
