I am running an application known as CESM.
I have tried various profilers, both Intel (ITAC and VTune) and non-Intel (TAU and others).
However, I have not found a profiler that can suggest which loops are good candidates for offload or vectorization.
The --profile-loops options do not work with parallel applications, and CESM takes an eternity to complete if I try to run it as MPI-serial.
Thanks in advance.