Where should I put ANNOTATE_ITERATION_TASK?

I'm using Intel Advisor to analyze my parallel application. The code below is the main loop of my program, where most of the run time is spent:

       ANNOTATE_SITE_BEGIN(solve);
       for(size_t i=0; i<wrapperIndexes.size(); i++){
           const int r = wrapperIndexes[i].r;
           const int c = wrapperIndexes[i].c;
           const float val = localWrappers[wrapperIndexes[i].i].cur.at<float>(wrapperIndexes[i].r,wrapperIndexes[i].c);
           if ( (val > positiveThreshold && (isMax(val, localWrappers[wrapperIndexes[i].i].cur, r, c) && isMax(val, localWrappers[wrapperIndexes[i].i].low, r, c) && isMax(val, localWrappers[wrapperIndexes[i].i].high, r, c))) ||
                (val < negativeThreshold && (isMin(val, localWrappers[wrapperIndexes[i].i].cur, r, c) && isMin(val, localWrappers[wrapperIndexes[i].i].low, r, c) && isMin(val, localWrappers[wrapperIndexes[i].i].high, r, c))) )
              // either positive -> local max. or negative -> local min.
                ANNOTATE_ITERATION_TASK(localizeKeypoint);
                localizeKeypoint(r, c, localCurSigma[wrapperIndexes[i].i], localPixelDistances[wrapperIndexes[i].i], localWrappers[wrapperIndexes[i].i]);
       }
       ANNOTATE_SITE_END();

As you can see, `localizeKeypoint` is where the loop spends most of its time (not counting the `if` condition). I want to generate a Suitability Report to estimate the gain from parallelizing the loop above, so I've written this:

       ANNOTATE_SITE_BEGIN(solve);
       for(size_t i=0; i<wrapperIndexes.size(); i++){
           const int r = wrapperIndexes[i].r;
           const int c = wrapperIndexes[i].c;
           const float val = localWrappers[wrapperIndexes[i].i].cur.at<float>(wrapperIndexes[i].r,wrapperIndexes[i].c);
           if ( (val > positiveThreshold && (isMax(val, localWrappers[wrapperIndexes[i].i].cur, r, c) && isMax(val, localWrappers[wrapperIndexes[i].i].low, r, c) && isMax(val, localWrappers[wrapperIndexes[i].i].high, r, c))) ||
                (val < negativeThreshold && (isMin(val, localWrappers[wrapperIndexes[i].i].cur, r, c) && isMin(val, localWrappers[wrapperIndexes[i].i].low, r, c) && isMin(val, localWrappers[wrapperIndexes[i].i].high, r, c))) )
              // either positive -> local max. or negative -> local min.
                ANNOTATE_ITERATION_TASK(localizeKeypoint);
                localizeKeypoint(r, c, localCurSigma[wrapperIndexes[i].i], localPixelDistances[wrapperIndexes[i].i], localWrappers[wrapperIndexes[i].i]);
       }
       ANNOTATE_SITE_END();

The Suitability Report gave an excellent 6.69x gain.

However, when I ran the Dependencies check, it reported a problem; see in particular the "Missing start task" message.
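
One detail of the loop above may be relevant, sketched here under assumptions. The `if` has no braces, so in C++ only the statement immediately after it (the annotation) is conditional, while the `localizeKeypoint` call runs on every iteration regardless of the indentation. The block below illustrates only that language rule. `DEMO_ITERATION_TASK`, `demoLocalizeKeypoint`, `Counts`, `runWithoutBraces`, `runWithBraces`, and `demo` are hypothetical stand-ins, not part of Intel Advisor, and whether this is what triggers "Missing start task" is not confirmed here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ANNOTATE_ITERATION_TASK: it only counts how many
// times a task start would be recorded. The real macro comes from Intel Advisor.
int g_taskStarts = 0;
#define DEMO_ITERATION_TASK(name) do { ++g_taskStarts; } while (0)

// Hypothetical stand-in for localizeKeypoint: it only counts calls.
int g_calls = 0;
void demoLocalizeKeypoint() { ++g_calls; }

struct Counts {
    int taskStarts;
    int calls;
};

// Same statement order as in the post: an `if` without braces.
Counts runWithoutBraces(const std::vector<bool>& isExtremum) {
    g_taskStarts = 0;
    g_calls = 0;
    for (std::size_t i = 0; i < isExtremum.size(); i++) {
        if (isExtremum[i])
            DEMO_ITERATION_TASK(localizeKeypoint);  // governed by the if
        demoLocalizeKeypoint();  // not governed by the if: runs every iteration
    }
    return Counts{g_taskStarts, g_calls};
}

// Variant with braces: the annotation and the call are both conditional.
Counts runWithBraces(const std::vector<bool>& isExtremum) {
    g_taskStarts = 0;
    g_calls = 0;
    for (std::size_t i = 0; i < isExtremum.size(); i++) {
        if (isExtremum[i]) {
            DEMO_ITERATION_TASK(localizeKeypoint);
            demoLocalizeKeypoint();
        }
    }
    return Counts{g_taskStarts, g_calls};
}

// Usage: four iterations, two of which pass the condition.
void demo() {
    const std::vector<bool> flags = {true, false, false, true};
    const Counts a = runWithoutBraces(flags);
    assert(a.taskStarts == 2 && a.calls == 4);  // work on iterations with no task start
    const Counts b = runWithBraces(flags);
    assert(b.taskStarts == 2 && b.calls == 2);  // every call is preceded by a task start
}
```

In the version without braces, two of the four calls happen on iterations where no task start was recorded, which is the pattern worth ruling out before comparing the two annotation placements.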

In addition, if I place `ANNOTATE_ITERATION_TASK` at the beginning of the loop, like this:

       ANNOTATE_SITE_BEGIN(solve);
       for(size_t i=0; i<wrapperIndexes.size(); i++){
            ANNOTATE_ITERATION_TASK(localizeKeypoint);
           const int r = wrapperIndexes[i].r;
           const int c = wrapperIndexes[i].c;
           const float val = localWrappers[wrapperIndexes[i].i].cur.at<float>(wrapperIndexes[i].r,wrapperIndexes[i].c);
           if ( (val > positiveThreshold && (isMax(val, localWrappers[wrapperIndexes[i].i].cur, r, c) && isMax(val, localWrappers[wrapperIndexes[i].i].low, r, c) && isMax(val, localWrappers[wrapperIndexes[i].i].high, r, c))) ||
                (val < negativeThreshold && (isMin(val, localWrappers[wrapperIndexes[i].i].cur, r, c) && isMin(val, localWrappers[wrapperIndexes[i].i].low, r, c) && isMin(val, localWrappers[wrapperIndexes[i].i].high, r, c))) )
              // either positive -> local max. or negative -> local min.
                localizeKeypoint(r, c, localCurSigma[wrapperIndexes[i].i], localPixelDistances[wrapperIndexes[i].i], localWrappers[wrapperIndexes[i].i]);
       }
       ANNOTATE_SITE_END();

    
With this placement, the estimated gain is horrible.

Am I doing something wrong?

Please do not create multiple threads on the same topic. It doesn't make you more likely to get a response, it just makes things disorganized on our end.

I'm going to close this thread and post a link to it from your other thread so the conversation can continue in a single place.

The other thread is here: https://software.intel.com/en-us/forums/intel-advisor-xe/topic/731505
