I'm using the Intel Cluster Toolkit on our cluster, and therefore ifort (mpif90). I'm trying to optimize our program: I have tested gprof, oprofile and VTune (compiled with "-g -O2" or "-O3 -xT -Qdyncom"dummyblock" "). All 3 tools show me the same result: one of our subroutines takes 20% of the total run time. That looks like a problem, because this subroutine is not that important in the code.
When I look at the sources in VTune, I see something surprising:
takes a lot of time (but icst and icen are not that big)! All the other initialization "do loops" take a lot of time too...
So I decided to rewrite this as 2 "do loops":
I reran VTune and saw that these 2 "do loops" no longer take any time... But a new routine appears in the results, intel_new_memset, and it takes a lot of time.
How can I interpret these results? Could someone help me understand why these "do loops" are a hotspot in this subroutine?
Thanks a lot,